[R] exporting clustering results to table

2007-11-27 Thread Martin Tomko
Hello list,

the following approach did not work:

clustersA <- pam(distances, nkA, diss=TRUE);
gc();
filenameclu = paste("filenameclu", ".txt");
write.table(clustersA , file=filenameclu,sep=",");

although it worked with
clustersA <- hclust(distances, method="ward");
and a consecutive
kclassA <- cutree(clustersA, k=nkA);
filename = paste("clusters", ".txt");
write.table(kclassA,file=filename,sep=",",col.names=TRUE,row.names=TRUE);

Is there a generic method to export a cluster object? I know that
pam is different (it returns a cluster object with some more data). How can I
extract and export the clustering into a table with two columns, ID =
dissimilarity matrix row and cluster = number of the cluster?

I was using sink() to get the data, but for large matrices it involves a
huge amount of manual formatting afterwards, let's say in Excel.

Thanks many times
Martin

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
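For reference, here is a minimal sketch. It uses the cluster package and a toy dissimilarity matrix in place of the poster's `distances` and `nkA`, so the data and cluster count are hypothetical. pam() returns a list-like object of class "pam", which is why write.table() cannot export it directly. Its `clustering` component, however, is a plain vector with one cluster number per observation, in the row order of the dissimilarity matrix. Wrapping that vector in a two-column data frame gives the ID/cluster table asked for.

```r
library(cluster)

# Toy stand-in for the poster's dissimilarity matrix (hypothetical data)
set.seed(1)
x <- matrix(rnorm(20), nrow = 10,
            dimnames = list(paste0("obs", 1:10), NULL))
distances <- dist(x)
nkA <- 3   # hypothetical number of clusters

clustersA <- pam(distances, nkA, diss = TRUE)

# The "pam" object is a list; `clustering` holds one cluster number per
# observation, in the row order of the dissimilarity matrix
clu <- data.frame(ID = seq_along(clustersA$clustering),
                  cluster = clustersA$clustering)

# paste0() avoids the space that paste() inserts by default
filenameclu <- paste0("filenameclu", ".txt")
write.table(clu, file = filenameclu, sep = ",", row.names = FALSE)
```

The same data.frame() step should also work for the vector that cutree() returns in the hclust route.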


Re: [R] exporting clustering results to table

2007-11-27 Thread Martin Tomko
Hello Haris,
no, that is not the problem, but thank you anyway. I had figured out that
paste behaves that way.

But the object resulting from pam is complex, and cannot be cast into a
data frame easily...

Charilaos Skiadas wrote:
> On Nov 27, 2007, at 7:41 AM, Martin Tomko wrote:
> 
>> filename = paste("clusters", ".txt");
> 
> Don't know if this relates to your problem, but because "paste" adds 
> spaces by default (since sep=" ") this would result in a file named 
> "clusters .txt", not "clusters.txt".
> 
> Haris Skiadas
> Department of Mathematics and Computer Science
> Hanover College
> 
> 
> 
> 
>
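A quick way to see the behaviour Haris describes. paste0() requires R 2.15.0 or later; on older versions, use sep = "" instead.

```r
# paste() joins its arguments with sep = " " by default
f1 <- paste("clusters", ".txt")              # "clusters .txt"
# Either set sep = "" or use paste0(), which inserts no separator
f2 <- paste("clusters", ".txt", sep = "")    # "clusters.txt"
f3 <- paste0("clusters", ".txt")             # "clusters.txt"
```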



[R] median of binned values

2007-12-19 Thread Martin Tomko
Dear list,
I have a vector (array, table row, whatever is best) of frequency values
for categories (or bins), and I need to find the median category.
It is trivial to do by hand, but I was wondering if there is a way to do it
in R in an elegant way.

The obvious median(vector) returns the median frequency of the bins,
and that is not what I want, e.g.:

     freq
cat1    1
cat2   10
cat3  100
cat4 1000
cat5    1

I want it to return cat5, instead of cat3.

Thanks a lot
Martin



Re: [R] median of binned values

2007-12-19 Thread Martin Tomko
Thank you, Chuck,
would you mind commenting a bit on the code? It is not all clear... How
would you go about retrieving only the numeric value (not the category name)?
I am just starting with R, and the functionality of rep() and levels()
is not quite clear. I tried the documentation, but am not any wiser.
What if I had a vector v <- c(1,10,100,1000,1) and wanted to
perform it on that?

Thanks a lot
Martin


Chuck Cleland wrote:
> df <- data.frame(binname = as.factor(paste("cat", 1:5, sep="")),
>  freq = c(1,10,100,1000,1))
> 
> df
>   binname freq
> 1    cat1    1
> 2    cat2   10
> 3    cat3  100
> 4    cat4 1000
> 5    cat5    1
> 
> with(df, levels(binname)[median(rep(as.numeric(binname), freq))])
> [1] "cat5"
> 
> 

-- 
Martin Tomko
Postdoctoral Research Assistant

Geographic Information Systems Division
Department of Geography
University of Zurich - Irchel
Winterthurerstr. 190
CH-8057 Zurich, Switzerland

email:  [EMAIL PROTECTED]
site:   http://www.geo.uzh.ch/~mtomko
mob:+41-788 629 558
tel:+41-44-6355256
fax:+41-44-6356848
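Below is a commented sketch of the approach in Chuck's reply, applied to a plain numeric vector as asked. The bin labels are hypothetical.

rep(seq_along(v), v) repeats each bin's position as many times as its frequency. The ordinary median() of that expanded vector is therefore the position of the median bin. In Chuck's version, as.numeric() turns the factor into those positions, and levels()[...] maps a position back to its label. The cumsum() variant finds the same bin without building the expanded vector.

With the frequencies as they appear in this archive (total 1112), the running total first reaches half in the fourth bin, so both lines give 4, i.e. "cat4".

```r
# Frequencies per bin as listed in the thread; labels are hypothetical
v <- c(1, 10, 100, 1000, 1)
bins <- paste0("cat", seq_along(v))

# Expand: position 1 once, position 2 ten times, ..., then take the median
expanded <- rep(seq_along(v), times = v)
idx1 <- median(expanded)                    # 4

# Same bin without expanding: first bin whose running total reaches half
idx2 <- which(cumsum(v) >= sum(v) / 2)[1]   # 4

idx2          # numeric position of the median bin
bins[idx2]    # "cat4"
```

If the median falls between two bins, median() returns a value ending in .5, which indexing truncates to the lower bin. The cumsum() line also picks the lower bin in that case.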



[R] plotting activity time intervals

2009-01-21 Thread Martin Tomko

Dear All,
I have interval data (for Mon-Sun, 00-24h) of an activity and would like
to plot them in a matrix-like plot, where color A is assigned to the
activity and color X to unspecified time. Note that the activities are
not in standardised units (hours or so), but run from startTime to
endTime (in hrs:mins).
In principle it is a bar plot where multiple bars are stacked one on
top of another without gaps, with, say, the x axis representing the time
of day and the y axis the day of the week.

Can anyone please suggest a way to plot these?
Thanks
Martin
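One possible base-graphics sketch follows. It assumes the intervals are stored one row per activity, with a day label and "HH:MM" start/end strings; the data frame and its column names are hypothetical. The code draws an empty canvas (0-24 h by Mon-Sun), fills every day row with color X, and then overlays one rect() per activity in color A. Adjacent rows touch because each spans its row position ± 0.5.

```r
# Hypothetical interval data: one row per activity, times as "HH:MM"
acts <- data.frame(
  day   = c("Mon", "Mon", "Wed", "Sun"),
  start = c("08:15", "13:00", "09:30", "22:00"),
  end   = c("12:00", "17:45", "11:10", "23:59"),
  stringsAsFactors = FALSE
)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Convert "HH:MM" strings to fractional hours
to_hours <- function(hhmm) {
  parts <- strsplit(hhmm, ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) + as.numeric(p[2]) / 60,
         numeric(1))
}
acts$from <- to_hours(acts$start)
acts$to   <- to_hours(acts$end)
acts$row  <- match(acts$day, days)   # y position of each day

# Empty canvas: x = hour of day, y = day of week (reversed so Mon is on top)
plot(NA, xlim = c(0, 24), ylim = c(7.5, 0.5), xaxs = "i", yaxs = "i",
     xlab = "Time of day (h)", ylab = "", axes = FALSE)
axis(1, at = seq(0, 24, by = 2))
axis(2, at = seq_along(days), labels = days, las = 1)

# Color X: unspecified time fills each whole day row, no gaps between rows
rect(0, seq_along(days) - 0.5, 24, seq_along(days) + 0.5,
     col = "grey90", border = NA)
# Color A: one rectangle per activity interval
rect(acts$from, acts$row - 0.5, acts$to, acts$row + 0.5,
     col = "steelblue", border = NA)
box()
```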



[R] "pattern matching" accross multiple matrices

2007-11-08 Thread Martin Tomko
Hi all,

I have a set of patterns which can occur in a series of (3) matrices. I 
want to identify those and create a fourth one with the identifiers of 
the cases.

Something like:

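# Res is the fourth (result) matrix; l and w are its dimensions
for (i in 1:l) {
  for (j in 1:w) {
    if (A[i,j] == 1 && D[i,j] == 1 && P[i,j] == 1) Res[i,j] <- Case1
    if (A[i,j] == -1 && D[i,j] == -1 && P[i,j] == -1) Res[i,j] <- Case2
    # etc.
  }
}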

The code seems to run, but is very slow. Could anyone please suggest
a better approach? I was thinking that the 3 matrices could be stacked
into a cube (a 3-D array) and each column of the cube searched for a
pattern, but I am not sure how to do that...

Thanks
Martin
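Here is a vectorised sketch using hypothetical 2 x 2 matrices. Comparisons such as A == 1 work element-wise on whole matrices, so the double loop collapses into one logical mask per case.

The second half sketches the "cube" idea:

- array() stacks the three matrices into a 3-D array.
- Each cell's depth column is turned into a pattern string such as "1,1,1".
- That string is looked up in a small table.

The case ids 1 and 2 stand in for Case1 and Case2.

```r
# Hypothetical 2 x 2 example matrices with entries in {-1, 0, 1}
A <- matrix(c(1, -1, 0,  1), nrow = 2)
D <- matrix(c(1, -1, 1,  1), nrow = 2)
P <- matrix(c(1, -1, 1, -1), nrow = 2)

# Vectorised version of the loop: each comparison covers the whole matrix,
# and the combined logical matrix selects the cells to set
cases <- matrix(NA_integer_, nrow = nrow(A), ncol = ncol(A))
cases[A ==  1 & D ==  1 & P ==  1] <- 1L   # stands in for Case1
cases[A == -1 & D == -1 & P == -1] <- 2L   # stands in for Case2

# The "cube": stack the matrices and read each cube[i, j, ] as a string
cube <- array(c(A, D, P), dim = c(dim(A), 3))
key  <- apply(cube, c(1, 2), paste, collapse = ",")
patterns <- c("1,1,1" = 1L, "-1,-1,-1" = 2L)   # pattern -> case id
cases2 <- matrix(unname(patterns)[match(key, names(patterns))],
                 nrow = nrow(A))

cases
```

The lookup-table version scales more easily when there are many patterns. The mask version is simpler for a handful of cases.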

[R] install.package("TinnR") - there is no package called 'TinnR'

2009-03-24 Thread Martin Tomko
I am having trouble making TinnR 2.2.0.2 work: it seems to depend on
the package TinnR, which cannot be found (I also tried manual downloads,
but I cannot find the package on any CRAN mirror).


I even set a default CRAN mirror in the Rprofile.site file, so that the
following commands can find it:

# check necessary packages
necessary = c('TinnR', 'svSocket')
installed = necessary %in% installed.packages()[, 'Package']
if (length(necessary[!installed]) >=1)
  install.packages(necessary[!installed], dep=T)

No luck. Even issuing the command manually in Rterm fails: package
‘TinnR’ is not available.
Any idea how I could make my TinnR work? I googled extensively, but
without luck...


Thanks
Martin
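For reference, here is a sketch of how a default CRAN mirror is commonly set in Rprofile.site through the repos option. It also includes a version of the package check that only lists what is missing. The mirror URL is an example choice.

This does not help when CRAN builds no binary for the installed R version. As Duncan explains later in the thread, that turned out to be the actual problem here.

```r
# Sketch of an Rprofile.site fragment: set a default CRAN mirror so that
# install.packages() does not prompt for one (URL is an example choice)
local({
  r <- getOption("repos")
  r["CRAN"] <- "https://cloud.r-project.org"
  options(repos = r)
})

# List which of the packages Tinn-R checks for are not installed yet
necessary <- c("TinnR", "svSocket")
missing_pkgs <- necessary[!(necessary %in% rownames(installed.packages()))]
missing_pkgs
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs, dependencies = TRUE)
```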



Re: [R] install.package("TinnR") - there is no package called 'TinnR'

2009-03-24 Thread Martin Tomko

David,
that is not helpful. I KNOW that TinnR is a standalone editor. If you
had a look at the Rprofile.site required by TinnR, you would notice the
part of the code I sent earlier:

# check necessary packages
necessary = c('TinnR', 'svSocket')
installed = necessary %in% installed.packages()[, 'Package']
if (length(necessary[!installed]) >=1)
  install.packages(necessary[!installed], dep=T)

These are executed by the TinnR editor upon start. The TinnR package 
MUST therefore exist, and is required.


Hope that someone else can REALLY help.

Martin

David Winsemius wrote:

Tinn-R is not an R package. It is a standalone text editor:

http://www.lmgtfy.com/?q=tinn-r


-- David Winsemius

David Winsemius, MD
Heritage Laboratories
West Hartford, CT







Re: [R] install.package("TinnR") - there is no package called 'TinnR'

2009-03-24 Thread Martin Tomko

Patrick, Romain,
thank you very much for your help!
I have found the site of the package at

http://r-forge.r-project.org/projects/tinnr/ , as Romain suggests, but when you try to download it, you find that the package has not actually been contributed to the repository; the project is just a placeholder for it! Patrick's
http://cran.r-project.org/web/packages/TinnR/index.html

is better, and I will give it a try... I was just not sure about the versions,
but it seems to be a relatively recent one, so hopefully it will work.
Thanks again!
Martin




Richardson, Patrick wrote:

Hi Martin,

If all else fails, you could download the package from 
http://cran.r-project.org/web/packages/TinnR/index.html
and install as a .zip file from within R.

That is bizarre that R cannot find the package. I've had no problems 
downloading and installing.

Best regards,

Patrick







Re: [R] install.package("TinnR") - there is no package called 'TinnR' (RESOLVED)

2009-03-24 Thread Martin Tomko

Hi Duncan,
it is Windows XP with R version 2.6.0. Sorry, I should have
included this before.
Installing the TinnR package manually from a local zip file downloaded
from CRAN helped. I am still not sure why the package was not picked up
from the repositories. Could anyone please check whether the package is
visible to them under "Install packages" in any repository?

Thanks
Martin


Duncan Murdoch wrote:

On 3/24/2009 7:05 AM, Martin Tomko wrote:
I have troubles make TinnR 2.2.0.2 work, it seems that the dependency 
on the package TinnR that cannot be found (I tried also manual 
downloads, but I cannot find the package anywhere on any CRAN mirror).


What R version are you using, on what platform?  I have no trouble 
with an automatic install of the TinnR package into 2.8.1 on Windows.


Duncan Murdoch





Re: [R] install.package("TinnR") - there is no package called 'TinnR' (RESOLVED)

2009-03-24 Thread Martin Tomko
Yes, will do. I just had no need to upgrade, as everything worked...
And R is not my primary development environment; I only need it a
couple of times a year.

But you are right, it is good to be on an updated version.
Cheers
Martin

Duncan Murdoch wrote:

On 3/24/2009 8:55 AM, Martin Tomko wrote:

Hi Duncan,
it is Windows XP with R version 2.6.0. Sorry, I should have
included this before.
Installing the TinnR package manually from a local zip file
downloaded from CRAN helped. I am still not sure why the package was
not picked up from the repositories. Could anyone please check whether the
package is visible to them under "Install packages" in any repository?


Your version of R is too old.  TinnR was last updated in February this 
year and claims to support 2.6.0, but CRAN no longer builds binaries 
for 2.6.x.  (Version 2.6.0 became obsolete in November 2007 when 2.6.1 
was released, and binaries for the 2.6.x series stopped being built 
sometime last year.)


If you are set up for installing from source, you could try 
downloading the source package


http://cran.r-project.org/src/contrib/TinnR_1.0.3.tar.gz

and running

Rcmd INSTALL TinnR_1.0.3.tar.gz

but it is probably easier to update your R to the current release.

Duncan Murdoch


Thanks
Martin




[R] strange behavior when reading csv - line wraps

2009-05-29 Thread Martin Tomko

Dear All,
I am observing a strange behavior and searching the archives and help 
pages didn't help much.

I have a csv with a variable number of fields in each line.

I use
dataPoints <- read.csv(inputFile, header=FALSE, sep=";", fill=TRUE);

to read it in, and it works. But some lines are long and 'wrap', i.e.
they split and continue on the next line. So when I check the dimensions
of the data frame, they are not correct, and a printout shows that such a
line is split into two rows of the frame. I checked the input file and
all is good.


an example of the input is:
37;2175168475;13;8.522729;47.19537;16366...@n00;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;

where the last value ends up on the next row of the data frame.

It does not have to be the last value: in the following example, the
word "kempten" starts the next row:

39;167757703;12;10.309295;47.724545;21903...@n00;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio;

What could be the reason?

I was thinking about solving the issue by using a different separator
for the first 7 fields and concatenating all of the remaining values
into a single string value, but could not figure out how to do such a
substitution in R. Unfortunately, on my system I cannot specify a range
for sed...


Thanks for any help/pointers
Martin
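Here is a sketch of the "fixed fields plus one string" idea done entirely in R. It assumes 7 fixed leading fields and ';' as the separator. The sample lines are hypothetical; readLines(inputFile) would supply the real ones.

The code runs strsplit() on each line, keeps the first 7 pieces as columns, and paste()s the rest back together. Every line is split independently, so the result does not depend on how many fields the first lines happen to have. strsplit() also drops the empty piece after a trailing ';'.

```r
# Two hypothetical lines in the thread's format (';'-separated, variable length)
lines <- c("1;2;3;4;5;6;7;tagA;tagB;tagC;",
           "8;9;10;11;12;13;14;tagD;")
# In practice: lines <- readLines(inputFile)

nfix  <- 7                                   # number of fixed leading fields
parts <- strsplit(lines, ";", fixed = TRUE)  # one character vector per line

# First nfix pieces become columns (NA if a line is shorter)
fixed <- t(vapply(parts, function(p) p[seq_len(nfix)], character(nfix)))
colnames(fixed) <- paste0("V", seq_len(nfix))

# Everything after the fixed fields is joined back into one string
tags <- vapply(parts, function(p) paste(p[-seq_len(nfix)], collapse = ";"),
               character(1))

dataPoints <- data.frame(fixed, tags = tags, stringsAsFactors = FALSE)
# Convert the numeric-looking columns; the tags column stays character
dataPoints[] <- lapply(dataPoints, type.convert, as.is = TRUE)
dataPoints
```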



Re: [R] strange behavior when reading csv - line wraps

2009-05-30 Thread Martin Tomko

Jim,
the two lines I put in are the actual problematic input lines.
In these examples there are neither quotes nor # signs, although I have
no means to make sure they do not occur in the inputs (any hints on how
I could deal with that?).
I am trying to avoid as much pre-processing outside R as possible, and I
have to process about 500 files with up to 3000 records each, so I need
a more or less automated/batch solution; any string substitution will
have to happen in R. But for the moment I do not see a reason for
substitution, and the wrapping still occurs.


Cheers
Martin
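On the quotes and '#' question, here is a small sketch with hypothetical lines. The two readers have different defaults:

- read.csv() already uses comment.char = "" (so '#' is not special) and quote = "\"".
- read.table() uses quote = "\"'" and comment.char = "#".

Setting quote = "" and comment.char = "" explicitly makes either function read every character literally. A stray quote or '#' inside a tag then cannot change how a line is split.

```r
# Hypothetical input containing an apostrophe, a '#' and an unmatched '"'
tmp <- tempfile(fileext = ".csv")
writeLines(c("1;it's;#tag;x", "2;say \"hi;y"), tmp)

# Disable quote and comment handling so every character is read literally
d <- read.table(tmp, sep = ";", quote = "", comment.char = "",
                fill = TRUE, header = FALSE, stringsAsFactors = FALSE)
d
```

The number of columns is still guessed from the first five lines, so this does not by itself stop the wrapping.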



jim holtman wrote:
You need to supply the actual input line so we can see what is 
happening.  Are you sure you do not have unbalanced quotes in your 
input (try quote='') or do you have comment characters ("#") in your 
input?




--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?




Re: [R] strange behavior when reading csv - line wraps

2009-05-31 Thread Martin Tomko

Dear Jim,
with the help of Ted, we diagnosed that the cause is the extreme
variability in line length. As the number of table columns is apparently
determined from the first five lines, whatever exceeds this length is
automatically wrapped onto the next row.
I am now trying to find a way to read in the data despite this. I have
no control over the table extent; the only thing that would make sense
for my data would be to read in a fixed number of columns and merge all
remaining columns into a long string in the last one. No idea how to do
this, though.


Thanks
Martin
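Here is a small reproduction of the diagnosis, using a hypothetical file whose sixth line is longer than the first five. read.csv() sizes the table from the first five lines, so the long line may be wrapped onto extra rows. Passing col.names with as many names as the longest line has fields (found with count.fields()) gives one row per input line.

```r
# Hypothetical file: five 2-field lines followed by one 6-field line
tmp <- tempfile(fileext = ".csv")
writeLines(c("1;a", "2;b", "3;c", "4;d", "5;e", "6;f;g;h;i;j"), tmp)

# Columns are guessed from the first five lines, so line 6 may wrap
wrapped <- read.csv(tmp, header = FALSE, sep = ";", fill = TRUE)
nrow(wrapped)

# count.fields() reports the true number of fields on every line;
# a longer col.names makes read.csv() allocate that many columns
nf <- max(count.fields(tmp, sep = ";"))
fixed <- read.csv(tmp, header = FALSE, sep = ";", fill = TRUE,
                  col.names = paste0("V", seq_len(nf)))
dim(fixed)   # 6 rows, 6 columns
```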


jim holtman wrote:
It is still not clear to me exactly how you want to read the lines
in. If the lines have a variable number of fields, and some of the
lines might be wrapped, is there some way to determine where each line
starts?

If you are reading them in with read.csv, then the system is assuming
that each line starts a new row. If this is not the case, then you
will have to state the rules that determine where the lines start.
You can always read the data in with 'scan' to separate each line and
then do whatever processing is required to put together the rows in a
data frame that you want.

In one of your examples, you indicated that the line was split
starting at the word "kempten"; if this is in the middle of the line,
then you would have to create the break after reading the line in with
'scan' and then create the rows in the data frame. All of this can
be done in R if you can state what the criteria are.

Re: [R] strange behavior when reading csv - line wraps

2009-05-31 Thread Martin Tomko

Big thanks to Ted and Jim for all the help.
Martin

(Ted Harding) wrote:

Ah!!! It was count.fields() which we had overlooked! We discovered
a work-round which involved using

  Data0 <- readLines(file)

to create a vector of strings, one for each line of the input file,
and then using

  NF <- unlist(lapply(Data0, function(x)
          length(unlist(gregexpr(";", x, fixed=TRUE, useBytes=TRUE)))))

to count the number of occurrences of ";" (the separator) in each line
(for lines containing at least one ";"). (NF+1) produces the same result
as count.fields(file,sep=";").


Thanks for pointing out the existence of count.fields()!
Ted.
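A side note on the readLines() work-round: gregexpr() returns -1 when there is no match, so length() of its result is 1 rather than 0 for a line without any ';'. A simpler count avoids this. The sketch below (with hypothetical lines) deletes every character except the separator and takes nchar(). Adding 1 gives the same numbers as count.fields() for these lines.

```r
# Hypothetical input lines written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("37;2175168475;13", "a;b;c;d;e", "no-separator"), tmp)

Data0 <- readLines(tmp)

# Remove everything except ';' and count what is left: 0 for no separator
nsep <- nchar(gsub("[^;]", "", Data0))
nsep                           # 2 4 0

# One more field than separators, compared with count.fields()
nsep + 1L
count.fields(tmp, sep = ";")
```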

On 31-May-09 15:04:23, jim holtman wrote:
  

You can do something like this: count the number of fields in each line
of the file and use the max to determine the number of columns for
read.table:

file <- '/tempxx.txt'
maxFields <- max(count.fields(file, sep=';'))  # max
# now set up read.table for max number
input <- read.table(file, sep=';', colClasses=rep(NA, maxFields), fill=TRUE,
    col.names=paste("V", seq(maxFields), sep=''))


   On Fri, May 29, 2009 at 3:15 PM, Martin Tomko <martin.to...@geo.uzh.ch> wrote:

  Dear All,
  I am observing a strange behavior and searching the archives and
  help pages didn't help much.
  I have a csv with a variable number of fields in each line.

  I use
  dataPoints <- read.csv(inputFile, head=FALSE, sep=";",fill =TRUE);

  to read it in, and it works. But - some lines are long and 'wrap',
  or split and continue on the next line. So when I check the dim of
  the frame, they are not correct and I can see when I do a printout
  that the lines is split into two in the frame. I checked the input
  file and all is good.

  an example of the input is:
37;2175168475;13;8.522729;47.19537;16366...@n00;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;

  where the last values occurs on