Re: [R] Converting english words to numeric equivalents

2008-07-28 Thread Hans-Joerg Bibiko

How about this?

unletter <- function(word) {
  # a..z -> 01..26; a space (32 - 96 = -64) is turned back into a space
  gsub('-64', ' ', paste(sprintf("%02d", utf8ToInt(tolower(word)) - 96), collapse = ''))
}

unletter("abc")
[1] "010203"

unletter("Aw")
[1] "0123"

unletter("I walk to school")
[1] "09 23011211 2015 190308151512"
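The string code is reversible as long as the input contains only the letters a-z separated by single spaces. A minimal sketch of the inverse under that assumption (reletter() is a hypothetical name, not from the original post); capitalization is lost because unletter() lowercases its input:

```r
unletter <- function(word) {
  # a..z -> 01..26; a space (32 - 96 = -64) is turned back into a space
  gsub('-64', ' ', paste(sprintf("%02d", utf8ToInt(tolower(word)) - 96), collapse = ''))
}

# Hypothetical inverse: cut each code word into two-digit chunks, map 1..26 back to a..z
reletter <- function(code) {
  code_words <- strsplit(code, " ")[[1]]
  decoded <- sapply(code_words, function(w) {
    starts <- seq(1, nchar(w), by = 2)
    n <- as.integer(substring(w, starts, starts + 1))
    intToUtf8(n + 96)
  })
  paste(decoded, collapse = " ")
}

reletter(unletter("I walk to school"))  # "i walk to school"
```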

--Hans

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Converting english words to numeric equivalents

2008-07-28 Thread Hans-Joerg Bibiko

On 28 Jul 2008, at 12:23, Hans-Joerg Bibiko wrote:

[...]

I do not know precisely what you want to do.

With:
as.double(unlist(strsplit(unletter("I walk to school"), " ")))

you will get a numeric vector out of the string.
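But this leads to a problem with long words, because a double keeps only about 15 significant digits, so the number can no longer be mapped back to the word:

as.double(unlist(strsplit(unletter("schoolschool"), " ")))
[1] 1.903082e+23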

Thus, if you need to map words to numeric values and the numbers themselves carry no meaning, I would suggest parsing your text beforehand to build a hash (a list) of all distinct words in your text and assigning a number to each word.

This would end up in a list à la:
words <- list(abc = 1, I = 2, go = 3)  # etc.

After that you can access these numeric values via:
words['go']
$go
[1] 3
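A minimal sketch of building that lookup automatically, assuming words are separated by single spaces (the sentence is made up). match() against the vector of distinct words gives the number of every token in one step:

```r
text <- "I go to school and I go home"       # hypothetical input text
tokens <- strsplit(text, " ")[[1]]
distinct <- unique(tokens)                   # "I" "go" "to" "school" "and" "home"

# Named list: word -> number, as suggested above
words <- as.list(seq_along(distinct))
names(words) <- distinct

words[["go"]]                       # 2
codes <- match(tokens, distinct)    # 1 2 3 4 5 1 2 6
```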

--Hans


Re: [R] matching problem

2008-06-27 Thread Hans-Joerg Bibiko


On 27 Jun 2008, at 12:23, Tom.O wrote:



Hi R gurus
I have a matching problem that I can't solve. I have tried multiple solutions and searched various help sites, but I can't get it to work.

This is the problem
myexstrings = c("*AAA.AA", "BBB BB", "*.CCC.", "**dd- d")

What I want to do is to remove any non-word characters at the beginning, and everything from the first non-word character after the first run of word characters, so that the strings become:

c("AAA", "BBB", "CCC", "dd")


I can figure out the start: sub("^\\W*", "", myexstrings, perl=T) removes the unwanted beginnings, but then it's the rest.


Try

gsub("\\W*", "", myexstrings, perl=T)

Cheers,

--Hans



Re: [R] matching problem

2008-06-27 Thread Hans-Joerg Bibiko


On 27 Jun 2008, at 13:56, Tom.O wrote:



Well, I have tried that and unfortunately it's not the solution.
This returns all the characters in the string, but I don't want the characters after the ending non-character symbol. Only the starting characters are of interest.


gsub("\\W*", "", myexstrings, perl=T)

[1] A B CCC   ddd




Oops,

try this one:

gsub("^\\W*(\\w+)\\W.*", "\\1", myexstrings, perl=T)
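One caveat, as a sketch not discussed in the thread: the pattern above requires a non-word character after the first word, so a string such as "*EEE" (a made-up extra case) comes back unchanged. Dropping that requirement keeps just the first run of word characters in every case:

```r
myexstrings <- c("*AAA.AA", "BBB BB", "*.CCC.", "**dd- d", "*EEE")  # "*EEE" is hypothetical

# Pattern from the thread: needs a non-word character after the first word
orig_result <- gsub("^\\W*(\\w+)\\W.*", "\\1", myexstrings, perl = TRUE)

# Variant: skip leading non-word characters, keep the first word, drop the rest
first_word <- sub("^\\W*(\\w+).*$", "\\1", myexstrings, perl = TRUE)

print(orig_result)  # "AAA" "BBB" "CCC" "dd" "*EEE"
print(first_word)   # "AAA" "BBB" "CCC" "dd" "EEE"
```

Keep in mind that \w also matches digits and the underscore.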

--Hans



Re: [R] Similarity matching with probabilities

2008-06-27 Thread Hans-Joerg Bibiko


On 27 Jun 2008, at 14:30, francogrex wrote:



Hello,
It's just a strange coincidence that someone posted a question about matching very recently. I know there are several matching functions in the base package (such as match, pmatch, charmatch, gsub, etc.), but I can't seem to use them wisely enough to get what I need.
Suppose I have the following strings:
tets
estt
rtes7
gstes
tes5t

Is there an R procedure to determine how related each string is to the reference string "test", for example to say that "tets" is similar to "test" with a probability of 0.9 or something of that sort?


Have a look at ?agrep.
One could loop over different values of max.distance to get the relation.

Another way is to calculate the edit distance by Levenshtein (or Damerau-Levenshtein). A starting point could be:


http://wiki.r-project.org/rwiki/doku.php?id=tips:data-strings:levenshtein
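A sketch of the edit-distance route in plain R, so no extra package is needed (newer R releases also ship utils::adist()). The score below is an ad-hoc similarity, one minus the distance divided by the length of the longer string; it is not a probability in any statistical sense:

```r
# Classic dynamic-programming Levenshtein distance (insertions, deletions, substitutions)
levenshtein <- function(a, b) {
  s <- strsplit(a, "")[[1]]
  u <- strsplit(b, "")[[1]]
  d <- matrix(0L, length(s) + 1, length(u) + 1)
  d[, 1] <- 0:length(s)   # cost of deleting a prefix of a
  d[1, ] <- 0:length(u)   # cost of inserting a prefix of b
  for (i in seq_along(s)) {
    for (j in seq_along(u)) {
      cost <- if (s[i] == u[j]) 0L else 1L
      d[i + 1, j + 1] <- min(d[i, j + 1] + 1L,  # deletion
                             d[i + 1, j] + 1L,  # insertion
                             d[i, j] + cost)    # substitution
    }
  }
  d[length(s) + 1, length(u) + 1]
}

# Ad-hoc similarity in [0, 1], scaled by the longer of the two strings
similarity <- function(x, ref) {
  sapply(x, function(w) 1 - levenshtein(w, ref) / max(nchar(w), nchar(ref)))
}

candidates <- c("tets", "estt", "rtes7", "gstes", "tes5t")
similarity(candidates, "test")  # 0.5 0.5 0.6 0.4 0.8 (named by candidate)
```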

--Hans



[R] avoid using Dingbats symbols for points()

2008-06-25 Thread Hans-Joerg Bibiko

Hi,

I came across a tiny problem.

E.g.:

pdf()
plot(1:5)
points(2, 3, cex=10, pch=21, bg="grey", lwd=0.3)
points(2, 4, cex=1,  pch=21, bg="grey", lwd=0.3)
dev.off()


If I execute this, I get a nice PDF. Fine.
But I want to edit this PDF with, let's say, Adobe Illustrator.
If I try to open it, Illustrator shows an error message:


Missing Type 1 fonts have been substituted with the default font.
Fonts with foreign encodings have been reencoded.

I can press OK, and I get an image that shows the letter 'l' instead of the points, except for the first points() statement.

Then I read in ?pdf:

[...]
- Circles of any radius are allowed. Opaque circles of less than 10 points radius are rendered using char 108 in the Dingbats font: all semi-transparent and larger circles using a Bézier curve for each quadrant.


OK. But is there a way to avoid replacing circles of less than 10pt radius by a Dingbats glyph? In other words, I want them drawn as Bézier curves as well.


Up to now I have used a very laborious workaround à la:
[draw a pie chart with only one segment]

stars(matrix(data=1, ncol=1), draw.segments = TRUE, scale = FALSE,
      radius = FALSE, locations = c(2, 3.2), col.segments = "grey",
      add = TRUE, labels = NULL, len=0.05, lwd = 0.1)


The only thing I then have to do is to delete an anchor point of the segment. But if the plot has hundreds of points, well ...



I tried it out with R 2.6.2 and R 2.7 on Mac OSX 10.5.3 and Windows  
XP; always the same.


I would appreciate any hints.


Thanks in advance!

--Hans



Re: [R] avoid using Dingbats symbols for points()

2008-06-25 Thread Hans-Joerg Bibiko


On 25 Jun 2008, at 15:22, Prof Brian Ripley wrote:

Please look at the NEWS for R-devel, which has an option to work around this known bug in Adobe Illustrator.


Thanks a lot for the hint.


(Of course, the R posting guide suggests doing this before posting.)


I looked at several mailing lists, docs, etc., but I didn't check the new features of R 2.8.

I promise to do better.
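For later readers, a sketch of that workaround: as far as I know, the option in question is the useDingbats argument of pdf(), available from R 2.8.0 (please check ?pdf for your version). Here the original example is written to a temporary file (the file name is arbitrary) with Dingbats glyphs switched off, so small circles are drawn as curves as well:

```r
out <- file.path(tempdir(), "points-no-dingbats.pdf")  # hypothetical output file
pdf(out, useDingbats = FALSE)  # do not use the Dingbats glyph for small circles
plot(1:5)
points(2, 3, cex = 10, pch = 21, bg = "grey", lwd = 0.3)
points(2, 4, cex = 1,  pch = 21, bg = "grey", lwd = 0.3)
invisible(dev.off())
```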

--Hans



Re: [R] How to return multiple values in a function

2008-06-23 Thread Hans-Joerg Bibiko


On 23 Jun 2008, at 10:23, Gundala Viswanath wrote:

I apologize for this newbie question, but I can't seem to find it in the R online manual.

1. How can I return two values in a function?
2. How can I capture the values again of this function?

myfunc <- function (array) {

  # do something with array
  # get something assign to foo and bar
  print(foo)
  print(bar)

 # how can I return foo and bar ?
}

# Is this the way to capture it?

(nfoo,nbar) <- myfunc(some_array)



One way would be:

myfunc <- function (array) {

  # do something with array
  # get something assign to foo and bar
  # c() combines foo and bar into one vector
  result <- c(foo, bar)
  return(result)
}

res <- myfunc(some_array)
res[1]
[1] foo.stuff
res[2]
[1] bar.stuff
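Note that c() flattens foo and bar into a single vector (coercing them to a common type), so it works best for two scalars of the same type. A common alternative is to return a named list; a minimal sketch in which sum() and range() are hypothetical stand-ins for the real computations:

```r
myfunc <- function(array) {
  foo <- sum(array)    # stand-in for the real "foo"
  bar <- range(array)  # stand-in for "bar"; may differ in length or type
  list(foo = foo, bar = bar)
}

res <- myfunc(c(3, 1, 4, 1, 5))
res$foo   # 14
res$bar   # 1 5
```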


--Hans



Re: [R] Create a new vector with filled with mean value of original vector

2008-06-23 Thread Hans-Joerg Bibiko


On 23 Jun 2008, at 10:47, Gundala Viswanath wrote:


Hi,

Given this vector:

x <- c(30.9, 60.1, 70.0, 73.0, 75.0, 83.9, 93.1, 97.6, 98.8, 113.9)

[1]  30.9  60.1  70.0  73.0  75.0  83.9  93.1  97.6  98.8 113.9


mean.x <- mean(x)

[1] 79.63

I wish to:

1. Create a new vector (nx) with the same size as x
2. Fill nx with the mean value

thus in the end I hope to get something like:

[1] 79.63 79.63 79.63 79.63 79.63 79.63 79.63 79.63 79.63 79.63


One way would be:

rep(mean(x),length(x))
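Two other base-R idioms give the same result, shown here as a sketch: ave() without grouping variables replaces every element by the overall mean, and assigning to x[] fills a copy of the vector in place while keeping its length (and attributes such as names):

```r
x <- c(30.9, 60.1, 70.0, 73.0, 75.0, 83.9, 93.1, 97.6, 98.8, 113.9)

nx1 <- rep(mean(x), length(x))  # the answer above
nx2 <- ave(x)                   # FUN = mean by default; no groups -> overall mean
nx3 <- x
nx3[] <- mean(x)                # recycle the scalar into every element of the copy

all.equal(nx1, nx2)
identical(nx1, nx3)
```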

--Hans



Re: [R] paste data

2008-06-18 Thread Hans-Joerg Bibiko


On 18 Jun 2008, at 10:36, Sybille Wendel wrote:

I need a command.
I have a lot of data in different data frames (auto.0a, auto.0b, auto.0c, auto.5Na, ...) that have similar names.


I could print all the names at once with a loop using the command paste(), see below:


plot <- c("0a","0b","0c","5Na","5Nb","5Nc","PKa","PKb","PKc","5NPKa","5NPKb",
          "5NPKc","10NPKa","10NPKb","10NPKc","20NPKa","20NPKb","20NPKc")

for (x in 1:length(plot))
{
name <- paste("auto.", plot[x], sep="")
print(name)
}



First of all, it may be better to avoid naming a variable 'plot'. It works, but it can be a bit confusing.


You can do this more easily (paste can handle vectors etc.):
name <- paste("auto.", plot, sep="")


I want to do very similar things with all the data frames, and their structure is also the same.
Is there a way to write a loop (so that I don't have to write the same code 18 times)?

I tried things like that:

for (x in 1:length(plot))
{
  plot(paste("auto.",plot[x],sep="")[,1], paste("auto.",plot[x],sep="")[,2], col=...)
}



paste("auto.",plot[x],sep="")[,1] doesn't work.

paste() only returns the name, a character string. Assuming that 'auto.0a' is a data.frame, you should use

get(paste("auto", ".0a", sep=''))[,1]

instead to get the first column of the data.frame 'auto.0a'.

Maybe try:

plot(get(paste("auto.", plot[x], sep=""))[,1],
     get(paste("auto.", plot[x], sep=""))[,2], col=...)
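A sketch that avoids repeating get() in every call: mget() fetches all the data frames by name into one named list, which can then be looped over. The two small data frames are made up for illustration, and ids replaces the variable called plot as suggested above:

```r
# Hypothetical stand-ins for auto.0a, auto.0b, ... (same structure)
auto.0a <- data.frame(x = 1:3, y = c(2, 4, 6))
auto.0b <- data.frame(x = 1:3, y = c(1, 1, 2))

ids <- c("0a", "0b")                      # does not mask plot()
nms <- paste("auto.", ids, sep = "")      # paste() is vectorised
dfs <- mget(nms, envir = environment())   # named list of the data frames

for (nm in names(dfs)) {
  d <- dfs[[nm]]
  plot(d[, 1], d[, 2], main = nm)         # one plot per data frame
}
```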


--Hans



Re: [R] exit function in R?

2008-06-02 Thread Hans-Joerg Bibiko


On 2 Jun 2008, at 15:18, Federico Abascal wrote:


Hi,
This is likely a stupid question, but I cannot find the solution.
I am searching for an exit function to end the execution of an R script if some condition is not fulfilled.
Any clue?



f <- function() {
  ...
  if (1 == 1) return(WHATEVER)  # leaves f() immediately
  ...
  ...
  return(ONSUCCESS)
}
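A runnable version of the same idea, plus the script-level alternatives (check_input() and the file name are hypothetical). return() only leaves the current function; to abort a whole script, stop() signals an error, and quit() (alias q()) ends the R session, optionally with an exit status:

```r
check_input <- function(x) {
  if (!is.numeric(x)) return(NA)  # early exit: the rest of the function is skipped
  mean(x)
}

check_input("a")   # NA
check_input(1:4)   # 2.5

# At script level (e.g. under Rscript), either of these ends the run:
# if (!file.exists("input.csv")) stop("input.csv not found")
# if (!file.exists("input.csv")) quit(save = "no", status = 1)
```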

--Hans



Re: [R] Modify string-regular expression

2008-05-30 Thread Hans-Joerg Bibiko

On 30 May 2008, at 11:25, Romain wrote:

...

SCAN <- scan("File.txt", sep="\n", what="raw", blank.lines.skip=F)
for (i in 1:Nb_param)
{
   sub('Param[i] = Value_i-1','Param[i] = Value_i-2', SCAN)
}

...

I know how to modify a string with sub when it is a fixed string:
sub("(K =)([0-9]*)", paste("\\1", Value[i,2]), SCAN)
But I would like to know if it is possible to use the function paste or something else in the first argument of the function sub.

For example, the correct syntax of :
' sub((Param[i])([0-9]*),paste(\\1, Value[i,2]),SCAN) '


Have a look at ?gsub
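The first argument of sub() is an ordinary character string, so it can be built with paste(). A minimal sketch in which SCAN, Param and Value are hypothetical stand-ins for the poster's objects. Note also that sub() returns the modified vector instead of changing SCAN in place, so the result has to be assigned back, which the loop above does not do:

```r
SCAN  <- c("alpha = 1", "beta = 20", "gamma = 3")  # stand-in for the scanned file
Param <- c("alpha", "beta")                         # parameters to change
Value <- c(10, 99)                                  # their new values

for (i in seq_along(Param)) {
  # build the pattern from the parameter name; group 1 keeps "name = "
  pattern <- paste("^(", Param[i], " = )[0-9]*", sep = "")
  SCAN <- sub(pattern, paste("\\1", Value[i], sep = ""), SCAN)
}
SCAN  # "alpha = 10" "beta = 99" "gamma = 3"
```

If a parameter name contains regex metacharacters such as "." or "[", they need escaping first. Backreferences in the replacement are single digits (\\1 to \\9), so "\\110" means group 1 followed by "10".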

Cheers,

--Hans



Re: [R] Unicode characters (R 2.7.0 on Windows XP SP3 and Hardy Heron)

2008-05-30 Thread Hans-Joerg Bibiko

Quoting Duncan Murdoch [EMAIL PROTECTED]:


On 5/30/2008 12:58 PM, Hans-Jörg Bibiko wrote:
To put it simply: Windows cannot handle UTF-8 data. There is no UTF-8 locale available.


Code page 65001 is utf-8.  Most text editors (including Notepad)
include an option to save in the UTF-8 encoding.

Some programs don't fully support utf-8 (some don't even support the
native UCS-2), but most don't care.  That's the nice thing about utf-8.

So in what sense can Windows not handle utf-8 data?


Of course, you're right. In that context I only meant R for Windows, not Windows as a whole. Sorry for being imprecise.


--Hans



Re: [R] Creating/Concatenate Strings into another String

2008-05-29 Thread Hans-Joerg Bibiko

On 29 May 2008, at 10:39, Gundala Viswanath wrote:

Is there a way to do it?

For example I tried this:

args <- commandArgs()
fname <- args[6]."-".args[9]

This would work under Perl :)

For details, see ?paste

Try this:

fname <- paste(args[6], ".", args[9], sep="")
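A few equivalent ways to join two arguments with "-" as in the Perl example. The args vector below is a made-up stand-in, since the positions in commandArgs() depend on how R was started (commandArgs(trailingOnly = TRUE) returns only the arguments after --args). paste0() exists only in newer R versions (2.15.0 onward):

```r
# Hypothetical stand-in for commandArgs()
args <- c("R", "--slave", "--no-restore", "--file=job.R", "--args",
          "run42", "x", "y", "sample7")

fname1 <- paste(args[6], args[9], sep = "-")  # sep goes between the pieces
fname2 <- paste0(args[6], "-", args[9])       # paste0() is paste(..., sep = "")
fname3 <- sprintf("%s-%s", args[6], args[9])  # printf-style template

fname1  # "run42-sample7"
```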

--Hans



Re: [R] UTF-8 or Unicode on Windows PC

2008-04-22 Thread Hans-Joerg Bibiko

On 21 Apr 2008, at 12:33, Prof Brian Ripley wrote:

 Is it possible to download a compiled snapshot of 2.7.0 for Windows  
 XP?
 Yes, http://cran.r-project.org/bin/windows/base/rtest.html
 And it is due for release tomorrow.

I played with 2.7.0 on Windows XP. I can do things which couldn't be  
done with 2.6.x. Many many thanks for the effort!!!

But I always came to a point where I couldn't find a solution, because Windows has no UTF-8 locale(s).
Does Windows Vista have UTF-8 locales?
If I'm dealing with a known set of languages, I am able to work around a lot of things.

But my/our problem is that we have to deal with different languages at the same time [in a data.frame]. Furthermore, I/we have to deal with IPA symbols, which have no locale, and grep, strsplit, etc. are built on top of the chosen locale. Thus I'm not able to use strsplit on a string that contains German, Russian, and IPA symbols, because all glyphs that are not part of the chosen locale are displayed [e.g. in the output of strsplit()] literally as U+.

That's why the only solution is to use a UTF-8 environment (OS) or, for hard-liners, to transform each glyph into numbers and do the research on those numbers (which is really annoying ;).
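For the "hard-liner" route, utf8ToInt() and intToUtf8() convert between a UTF-8 string and Unicode code points. A minimal sketch, writing the Cyrillic letter as the escape \u0428 so the script itself stays ASCII; whether this behaves the same in the Windows RGui of that time is an assumption I cannot confirm:

```r
s <- "asd\u0428as"             # "asdШas"; \u escapes give a UTF-8 string

codes <- utf8ToInt(s)          # one integer code point per character
codes                          # 97 115 100 1064 97 115

# Back to single characters without relying on the locale's strsplit()
chars <- intToUtf8(codes, multiple = TRUE)
length(chars)                  # 6
```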

Unfortunately, at this point I have to give up. Maybe there is someone who can give me further advice regarding Windows.
The only other option I have in mind is to use Perl, Python, etc. beforehand to manipulate the data before they are analyzed in R.


--Hans



[R] UTF-8 or Unicode on Windows PC

2008-04-21 Thread Hans-Joerg Bibiko
Dear all,

is it possible to set up RGUI or JGR on Windows PC to UTF-8 encoding?

I looked for it in the mailing lists and in the documentation, but I couldn't figure it out.

My problem is e.g. to split a given string containing German and  
Russian words into characters.
example:

  a <- "asdШas"
  strsplit(a, NULL)
[[1]]
[1] "a" "s" "d" "Ш" "a" "s"

works on every Mac or Linux computer, but I didn't find a way to do it on Windows.

I tried to set options(encoding = "UTF-8"), I tried to use the Perl mode in strsplit, but I had no success. At least with JGR I was able to type Russian and see my text correctly, but strsplit failed.

I set RGUI to a Unicode font, no success.

I tried to save a script file in UTF-8 or UTF-16 and I tried to run  
source(FILE, encoding=***), no success.

Is there really no way to use a Windows PC and R to work with Unicode  
texts?

Many thanks in advance for each hint,

--Hans


Re: [R] UTF-8 or Unicode on Windows PC

2008-04-21 Thread Hans-Joerg Bibiko
On 21 Apr 2008, at 11:33, Prof Brian Ripley wrote:
 You didn't tell us your R version (or your locale).  Windows has no  
 UTF-8 locales, so a lot of work has had to be done to allow Unicode  
 chars to be handled on Windows.
It was more or less a general question on R running on Windows PCs.
Normally I use R on a Mac or Linux, but some of my students asked about Unicode support in the Windows RGui.

 Please look into 2.7.0 RC, and in particular its CHANGES file at

 https://svn.r-project.org/R/branches/R-2-7-branch/src/gnuwin32/CHANGES
This is really good news!
I would like to express my gratitude to everyone who was/is involved in that development!


Is it possible to download a compiled snapshot of 2.7.0 for Windows XP?

Thanks a lot,

--Hans



Re: [R] UTF-8 or Unicode on Windows PC

2008-04-21 Thread Hans-Joerg Bibiko

On 21 Apr 2008, at 12:33, Prof Brian Ripley wrote:

 Is it possible to download a compiled snapshot of 2.7.0 for Windows  
 XP?

 Yes, http://cran.r-project.org/bin/windows/base/rtest.html
 And it is due for release tomorrow.

Many thanks! I can see the progress :)

But please forgive my incompetence. I'm not so familiar with Windows.
If I start e.g. RGUI by using: Rgui.exe LC_CTYPE=ja I can type  
Japanese, Russian, and German. strsplit works perfectly! ;)
But if I type, for instance, a German umlaut 'ü', it comes out as 'u'. OK, that is because I didn't set up Rgui in UTF-8 mode.
But how can I do this? My data are written in many different  
languages, and I want to do some statistics.

R version 2.7.0 RC (2008-04-19 r45391)
i386-pc-mingw32

locales:
all to German_Germany.1252
LC_CTYPE=Japanese_Japan.932

###

There are some minor issues.
I set Rgui's font to Arial Unicode. This works, but I have some trouble positioning my cursor because Arial Unicode is not a monospaced font.

If I start up Rgui in German, I can see the localized menu items, but every non-ASCII character shows up as garbage. It seems to me that the localized strings are written in UTF-8, while Rgui expects ANSI characters.

###
Nevertheless, thanks a lot!

--Hans



Re: [R] a question of alphabetical order

2008-04-16 Thread Hans-Joerg Bibiko
Hi,

as already mentioned, sorting can be a pain.

My solution is to write my own order routine for a given language.
The idea is to transform the UTF-8 string into ASCII in such a way that the built-in order routine outputs the desired result. But this can be a very laborious approach.

Example for Spanish (please correct me if I'm wrong):
- accents are ignored
- ll is one single entity and comes after l (ludar comes before llave)
- ch is one single entity and comes after c

The only thing I do not know is whether an 'll' can ever be two separate letters rather than one entity (e.g. where two nouns are combined). If so, the entire story becomes much more complicated.

Now the big question is how to delete all the accents in åàÿñü etc. to get aaynu (technically speaking: decompose the Unicode string into normalization form NFKD, the compatibility decomposition, and drop the combining marks).
One possible way is to use a scripting language that can handle it. The only language I know that can do it out of the box is Python; for Ruby or Perl one has to install an additional library.

On a Mac, Python is installed by default; on Windows it is not. If this ordering is also an issue for Windows users, they have to install Python beforehand.

Here is the code:

orderES <- function(x) {
 # decompose all accented characters (NFKD, via Python)
 str <- NFKD(x)

 # all combining diacritics (U+0300 to U+036F)
 nonChars <- c(768:879)
 pattern <- paste("[", intToUtf8(as.integer(nonChars)), "]", sep="")

 # delete all combining diacritics
 str <- gsub(pattern, "", str)

 # transform ll and ch to l{ and c{ ({ comes after z in ASCII;
 # note that order() uses the locale's collation, which may not follow ASCII)
 str <- gsub("ll", "l{", gsub("ch", "c{", str))
 order(str)
}

# Python 2 code (print statement, unicode()) piped through the shell
NFKD <- function(x) {
 system(paste("echo -en '# coding=utf-8\nimport unicodedata\nfor i,v in enumerate([\"", paste(x, collapse="\", \""), "\"]):print unicodedata.normalize(\"NFKD\",unicode(v, \"UTF-8\")).encode(\"UTF-8\")'|python -", sep=""), intern=T)
}

Notes on the NFKD routine:
- it only works if R's environment is set to UTF-8!
- for instance, a Danish ø won't be decomposed to o + / (such cases have to be solved manually)
- this routine is not very fast
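A pure-R sketch of the same key-transformation idea that needs neither Python nor NFKD, under clearly limited assumptions: only the lowercase Spanish letters in the made-up table below are mapped, ñ becomes "n{" so that it sorts after n, and order(..., method = "radix") is used because it compares strings in C-locale byte order, where "{" really comes after "z" (the default order() follows the locale's collation). method = "radix" needs R 3.3.0 or later:

```r
# Hypothetical, incomplete mapping of lowercase Spanish letters to sort keys
from <- c("\u00e1", "\u00e9", "\u00ed", "\u00f3", "\u00fa", "\u00fc", "\u00f1")
to   <- c("a", "e", "i", "o", "u", "u", "n{")

orderES2 <- function(x) {
  key <- tolower(x)
  for (k in seq_along(from)) key <- gsub(from[k], to[k], key, fixed = TRUE)
  # treat ll and ch as single letters sorting after l and c
  key <- gsub("ch", "c{", gsub("ll", "l{", key, fixed = TRUE), fixed = TRUE)
  order(key, method = "radix")   # byte order: "{" (123) > "z" (122)
}

words <- c("llave", "ludar", "chico", "cuna", "\u00e1rbol", "azul", "ca\u00f1a", "canto")
words[orderES2(words)]  # árbol azul canto caña cuna chico ludar llave
```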


Cheers,

--Hans



Re: [R] X 11

2008-04-16 Thread Hans-Joerg Bibiko

On 16 Apr 2008, at 12:21, Tommi Viitanen wrote:
 For example, I have x11 devices open with device numbers 1 and 2. I want to make a plot on device 1 without doing anything to device 2 and without opening a new x11. Something like ?:

Do you mean something like dev.set(DEVICENUMBER) ?

Have a look at ?dev.set
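A small sketch of dev.set(), using two off-screen pdf() devices instead of x11() so it also runs without a display (the temporary files are arbitrary). Device 1 is always R's "null device", so the first real device opened is number 2; dev.cur() reports the active one:

```r
f1 <- tempfile(fileext = ".pdf")
f2 <- tempfile(fileext = ".pdf")

pdf(f1); dev_a <- dev.cur()   # first device (number 2 if none was open)
pdf(f2); dev_b <- dev.cur()   # second device, now the active one

dev.set(dev_a)                # switch back to the first device ...
plot(1:10)                    # ... and draw there; dev_b is left untouched
now_active <- dev.cur()

dev.set(dev_b)
hist(c(1, 2, 2, 3, 3, 3))
graphics.off()                # close all devices
```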

--Hans



Re: [R] Number of words in a string

2008-04-10 Thread Hans-Joerg Bibiko

On 10 Apr 2008, at 07:43, Shubha Vishwanath Karanth wrote:
 So powerful, the gsub... But I really don't understand how regular expressions like " *\\S+$" need to be used and how to make the best use of them... Any article/material/links that I can go through?

A good starting point is: type

?regex

in your console.

Furthermore, search the web for "regular expression".

http://en.wikipedia.org/wiki/Regular_expression
http://en.wikipedia.org/wiki/Regular_expression_examples
http://www.regular-expressions.info/

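There are several variants of regexp engines around, but the core syntax is largely the same. In R, the default is POSIX extended regular expressions; perl=TRUE switches to Perl-compatible ones.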
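To make the pattern from the question concrete, a short sketch on made-up strings: " *\\S+$" means "zero or more spaces, then one or more non-space characters, then the end of the string", i.e. the last word together with the spaces before it. The second pattern uses a greedy ".*" that runs up to the last whitespace character, which keeps only the last word:

```r
C <- c("the quick brown fox", "hello world", "single")

# Remove the last word (and the spaces in front of it)
gsub(" *\\S+$", "", C)   # "the quick brown" "hello" ""

# Keep only the last word
sub(".*\\s", "", C)      # "fox" "world" "single"
```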

Regards,

--Hans



Re: [R] How to create a legend without plot, and to use scientific notation for axes label ?

2008-04-10 Thread Hans-Joerg Bibiko

On 10 Apr 2008, at 12:33, Stanley Ng wrote:
 How can I use formatC to convert 600000 to 6e5 and not 6e+05?

 formatC(600000)
 [1] "6e+05"
 formatC(600000, format="e", digit=0)
 [1] "6e+05"


Try this:

gsub("([eE])(\\+?)(\\-?)0+", "\\1\\3", formatC(600000, format="e", digits=0))
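The regex above leaves the "+" in place when the exponent has no leading zero (e.g. 6e+10) and turns an exponent of zero into a bare "6e". A slightly more defensive sketch (compact_e() is a hypothetical helper name) removes the "+" and the leading zeros but always keeps at least one exponent digit:

```r
compact_e <- function(x, digits = 0) {
  s <- formatC(x, format = "e", digits = digits)
  # drop "+" and leading zeros of the exponent, keep the sign and one digit
  sub("e\\+?(-?)0*([0-9])", "e\\1\\2", s)
}

compact_e(c(600000, 0.00006, 6, 6e10))  # "6e5" "6e-5" "6e0" "6e10"
```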

--Hans



Re: [R] distance matrix as text file - how to import?

2008-04-09 Thread Hans-Joerg Bibiko
 On Tue, Apr 8, 2008 at 1:50 PM, Hans-Jörg Bibiko [EMAIL PROTECTED]  
 wrote:
 I was sent a text file containing a distance matrix à la:

 1
 2 3
 4 5 6

Thanks a lot for your hints.

In the end, all hints more or less ended up at my own laborious way of doing it.

Let me summarize it.

The clean way is to initialize a matrix containing my distance matrix  
and generate a dist object by using as.dist(mat).
Fine. But how do I read the (triangular) text data into a matrix?

#1 approach - using 'read.table'

mat = read.table("test.txt", fill=T)

The problem here is that the first line doesn't contain the correct number of columns of my matrix, and 'read.table' determines the number of columns from the first five lines only.
Ergo, I have to know the number of columns (num_cols) beforehand in order to do this:

mat = read.table("test.txt", fill=T, col.names=rep('', num_cols))

By definition, the last line of test.txt contains the correct number of columns.
On a UNIX/Mac you can do the following:

num_cols <- as.numeric(system("tail -n 1 'test.txt' | wc -w", intern=TRUE))

In other words, read the last line of 'test.txt' and count its words, assuming the delimiter is a space. Alternatively, one could use 'readLines' and split the last element of the result to get num_cols.

#2 approach - using 'scan()'

mat = matrix(0, num_cols, num_cols)
mat[row(mat) >= col(mat)] <- scan("test.txt")

But this also leads to my problem:
1
2 4
3 5 6

instead of
1
2 3
4 5 6

One solution:

Approach #2 has two advantages: it's faster than read.table AND I can calculate num_cols. The only problem is the order, which can be fixed by reading the data into the upper triangle and transposing the matrix:

mat <- matrix(0, num_cols, num_cols)
mat[row(mat) <= col(mat)] <- scan("test.txt")
mat <- t(mat)


Next: if I know that my text file really contains a distance matrix (i.e. the diagonal has been removed), then I can do the following:

data <- scan("test.txt")
# n objects give n(n-1)/2 distances; the matrix needs n-1 columns
num_cols <- (1 + sqrt(1 + 8*length(data)))/2 - 1
mat <- matrix(0, num_cols, num_cols)
mat[row(mat) <= col(mat)] <- data
mat <- t(mat)

# Finally, to get a 'dist' object:

mat <- rbind(0, mat)
mat <- cbind(mat, 0)
dobj <- as.dist(mat)
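A self-contained sketch of the last recipe, reading the three example lines from a text connection instead of test.txt. It uses upper.tri() and as.dist(t(m)), which should be equivalent to the rbind()/cbind() steps: filling the strict upper triangle column by column and transposing reproduces the row-by-row order of the file:

```r
txt <- c("1", "2 3", "4 5 6")             # the triangular example from above
con <- textConnection(txt)
data <- scan(con, quiet = TRUE)
close(con)

# n objects have n(n-1)/2 pairwise distances
n <- (1 + sqrt(1 + 8 * length(data))) / 2
stopifnot(n == round(n))                  # otherwise the file is not a full triangle

m <- matrix(0, n, n)
m[upper.tri(m)] <- data                   # column-major fill of the strict upper triangle
d <- as.dist(t(m))                        # as.dist() reads the lower triangle
d
```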


Again, thanks a lot!

--Hans



Re: [R] Number of words in a string

2008-04-09 Thread Hans-Joerg Bibiko

On 9 Apr 2008, at 17:29, Markus Gesmann wrote:
 Would this:

 sapply(strsplit(C, " "), length)

 work for you?

or, for the total number of words over all strings:

length(unlist(strsplit(C, " ")))
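Both splits count empty pieces when words are separated by more than one space. A sketch that counts runs of non-space characters per string instead; the example strings are made up:

```r
C <- c("I walk to school", "  two   spaces ", "")

# gregexpr() returns the start of every match, or -1 if there is none
n_words <- sapply(gregexpr("\\S+", C), function(m) sum(m > 0))
n_words   # 4 2 0
```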

--Hans



Re: [R] Number of words in a string

2008-04-09 Thread Hans-Joerg Bibiko
Something like that?

gsub(" {1,}\\w+$", "", C)



Re: [R] Number of words in a string

2008-04-09 Thread Hans-Joerg Bibiko

On 9 Apr 2008, at 17:44, Shubha Vishwanath Karanth wrote:
 Got all the answers using ?strsplit... Is there any way without  
 using string split?... More specifically... How can I just extract  
 the last word in all the strings without using ?strsplit ?

Oops, sorry.

gsub(" *\\w+$", "", C)

should work.

--Hans
