[R] ggplot2 barplot: extra markers in graph

2010-08-04 Thread Dieter Vanderelst
Dear List,

(self-contained example + version info at the bottom)

I'm having trouble producing a barplot using the functions in ggplot2. When I
use the position = "dodge" option, the bars are plotted, but so are a number of
spurious markers. More specifically, a number of black dots are plotted in the
graph that should not be there. This behaviour is not seen when calling the
same functions without position = "dodge".

Can someone shed some light on this? How can I avoid this?

#self-contained example:
library(ggplot2)
D <- runif(30)
N <- rep(c(1:10), 3)
C <- rep(c(1:3), 10)
DT <- data.frame(D = D, N = N, C = C)
#works ok
qplot(DT$N, DT$D, fill = factor(DT$C)) + geom_bar(stat = "identity")
#in the resulting plot, a number of black markers are added that should not be
#there
qplot(DT$N, DT$D, fill = factor(DT$C)) + geom_bar(stat = "identity",
  position = "dodge")
#end of example

version info:
Windows xp 64
R version 2.11.1 (64 bit)
ggplot2 version 0.8.8

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] ggplot2 barplot: extra markers in graph

2010-08-04 Thread Dieter Vanderelst
Thanks, this indeed solved the problem.

Regards,
Dieter

On 4/08/2010 15:21, Shentu wrote:
 
 The reason you see the extra markers is that the first part of the command,
 qplot(DT$N, DT$D, fill = factor(DT$C)), already plots the individual points.
 You didn't see them with geom_bar(stat = "identity") simply because the
 stacked bars hid the previous layer. To see this, you can use the
 ggplot() function to reproduce your graph (with the points):
 
 p <- ggplot(data = DT, aes(x = N, y = D)) + geom_point() +
   geom_bar(stat = "identity", aes(fill = factor(C)), position = "dodge")
 
 print(p)
 
 It then becomes obvious that once you omit the geom_point(), the points are
 gone.
 
 This is IMO a feature of the ggplot2 system, not necessarily a bug.
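A minimal sketch of the fix, assuming a current ggplot2 release (the post used 0.8.8): build the plot with ggplot() and add only the bar layer, so there is no implicit point layer to show through the dodged bars. The data are generated as in the original post; set.seed() is added only for reproducibility.

```r
library(ggplot2)

set.seed(1)  # added for reproducibility; not in the original post
DT <- data.frame(D = runif(30),
                 N = rep(1:10, 3),
                 C = rep(1:3, 10))

# A single layer: dodged bars whose heights are the D values themselves
p <- ggplot(DT, aes(x = N, y = D, fill = factor(C))) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```

In recent ggplot2 versions, geom_col(position = "dodge") is a shorthand for the same bar layer.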
 




[R] comparing mixture models (mix function in mixdist package)

2009-04-22 Thread Dieter Vanderelst

Dear List,

We are using the mix() function in the mixdist package to fit mixture models to 
some of our data.

The package provides a function to compare the fits of nested models using an 
ANOVA function.

However, we were wondering whether there are methods that can be used to 
compare models that differ in the number of distributions fitted and/or the 
shape of the distributions.

Is there a way, using likelihoods maybe, to compare these kinds of fits in a
statistically meaningful way?
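One common approach for models that are not nested (different numbers of components or different component shapes) is to compare information criteria such as AIC or BIC, computed from each fit's maximised log-likelihood; lower values are preferred. The sketch below does not use mixdist. It fits univariate Gaussian mixtures with a small hand-written EM routine (fit_gmm() is a hypothetical helper, and the data are simulated) purely to show the bookkeeping. Note that the usual chi-square likelihood-ratio test is not valid for choosing the number of components, because the simpler model sits on the boundary of the larger model's parameter space.

```r
# Hypothetical helper: EM fit of a k-component univariate Gaussian mixture.
# Illustration only; this is not the algorithm used by mixdist::mix().
fit_gmm <- function(x, k, iter = 200) {
  n <- length(x)
  xs <- sort(x)
  grp <- ceiling(seq_len(n) * k / n)          # split sorted data into k chunks
  mu <- as.numeric(tapply(xs, grp, mean))     # initial means
  sigma <- rep(sd(x), k)                      # initial SDs
  w <- rep(1 / k, k)                          # initial mixing weights

  # Weighted component densities, one column per component
  comp_dens <- function() {
    matrix(sapply(seq_len(k), function(j) w[j] * dnorm(x, mu[j], sigma[j])),
           nrow = n)
  }

  for (it in seq_len(iter)) {
    dens <- comp_dens()
    r <- dens / rowSums(dens)                 # E-step: responsibilities
    nk <- colSums(r)                          # M-step: update parameters
    w <- nk / n
    mu <- colSums(r * x) / nk
    sigma <- sqrt(colSums(r * outer(x, mu, "-")^2) / nk)
  }

  loglik <- sum(log(rowSums(comp_dens())))
  npar <- 3 * k - 1                           # k means, k SDs, k - 1 free weights
  list(k = k, loglik = loglik,
       AIC = -2 * loglik + 2 * npar,
       BIC = -2 * loglik + log(n) * npar)
}

set.seed(42)
x <- c(rnorm(150, mean = 0, sd = 1), rnorm(150, mean = 5, sd = 1))  # simulated
fits <- lapply(1:3, function(k) fit_gmm(x, k))
sapply(fits, function(f) c(k = f$k, AIC = f$AIC, BIC = f$BIC))
```

With mixdist fits, the same comparison would need each model's maximised log-likelihood and its number of free parameters.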

Regards,
Dieter Vanderelst



Re: [R] Kruskal's MDS results

2009-04-17 Thread Dieter Vanderelst

A few people suggested taking a look at Ripley's book MASS. I know the formula 
listed there.

The point is that the manual for the isoMDS function says its stress output is in
percent. Does this mean that the stress reported by isoMDS is just the stress
value in MASS (which ranges from 0 to 1) multiplied by 100? I haven't been able
to find any resource that expresses stress in values from 0 to 100. So, would this be a
convention introduced by the authors of the package?

In general, I think the R manuals could do with a bit more explanation of the
output of the functions. I understand that some knowledge of statistics is
assumed when working with R, but sometimes the documentation on the returned
values is really sparse. Even when one is familiar with the domain, there are several
different conventions followed by different authors. This should be clear when
reading the manual.

I know a lot of hard work gets into writing software, but it seems sometimes 
people are less keen on documenting their hard work properly.

stephen sefick wrote:

You can look in MASS (4th edition) for this formula, on page 308. Go to the
source and ask the horse; he'll give you an answer that you endorse.

On Thu, Apr 16, 2009 at 8:13 AM, Bob Green bgr...@dyson.brisnet.org.au wrote:

Dieter,

You could always try the Classification, clustering, and phylogeny
estimation list, which often includes posts regarding MDS:
http://lists.sunysb.edu/index.cgi?A0=CLASS-L

regards

Bob










Re: [R] Kruskal's MDS results

2009-04-17 Thread Dieter Vanderelst

Thank you for clearing this up.

Jari Oksanen wrote:

Dieter Vanderelst Dieter.Vanderelst at ua.ac.be writes:



The point is that the manual for the isoMDS function says its stress output
is in percent. Does this mean that the stress reported by isoMDS is just the
stress value in MASS (which ranges from 0 to 1) multiplied by 100? I haven't
been able to find any resource that expresses stress in values from 0 to 100.
So, would this be a convention introduced by the authors of the package?


A comment about the novelty of using percentages. I also had a look at some NMDS
resources, and the first I found were two of Kruskal's papers that happened to be
on my desk (Psychometrika 29, 1-27 and Psychometrika 29, 115-129, both from
1964). Both of these expressed stress in percents. Certainly this is not a
convention introduced by the authors of the package, since they are much too
young to have done that prior to 1964.

Cheers, Jari Oksanen
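To make the convention concrete: Kruskal's stress (often called stress-1) is S = sqrt( sum (d_ij - dhat_ij)^2 / sum d_ij^2 ), where d_ij are the distances between points i and j in the fitted configuration and dhat_ij are the disparities, obtained by monotone (isotonic) regression of those distances on the original dissimilarities. The sketch below recomputes S from an isoMDS() fit on the built-in eurodist data, under the assumption that isoMDS() reports 100 * S; if that assumption holds, the two numbers printed at the end should agree closely.

```r
library(MASS)

d <- eurodist                              # built-in road distances (a "dist" object)
fit <- isoMDS(d, k = 2, trace = FALSE)

delta <- as.vector(d)                      # original dissimilarities
dist_conf <- as.vector(dist(fit$points))   # distances in the fitted 2-D configuration

# Disparities: isotonic (non-decreasing) fit of the configuration distances,
# taken in order of increasing dissimilarity, then put back in pair order
ord <- order(delta)
dhat <- numeric(length(delta))
dhat[ord] <- isoreg(dist_conf[ord])$yf

stress1 <- sqrt(sum((dist_conf - dhat)^2) / sum(dist_conf^2))
c(isoMDS = fit$stress, by_hand = 100 * stress1)
```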



[R] Kruskal's MDS results

2009-04-15 Thread Dieter Vanderelst

Dear List,

I'm trying to interpret the results of Kruskal's Non-metric
Multidimensional Scaling algorithm (isoMDS, MASS package).

The 'goodness of fit' is reported as "The final stress achieved (in percent)".

What does this mean exactly? I've tried to google for an answer but I've not 
come up with a definitive answer.

Regards,
Dieter


--
Dieter Vanderelst
PhD Student

Active Perception Lab
University of Antwerp
http://batbits.webnode.com/

Postal Address:
Prinsstraat 13
B-2000 Antwerp
Belgium



Re: [R] Kruskal's MDS results

2009-04-15 Thread Dieter Vanderelst

Hi Michael,

Thanks for the reply.

I understand that the stress is a measure of how well the algorithm managed to represent the ordinal distances between items. And I also see why it depends on the number of dimensions.

I was hoping someone could tell me exactly what the formula for the percentage
stress is. To me it's not clear how this metric is calculated.

Regards,
Dieter


Michael Denslow wrote:

Hi Dieter,

I'll take a shot at this. As I understand it, the stress is telling
you how the ordination distances compare with original
dissimilarities that you calculated.

It is a measure of how well your ordination has done in representing the
relationship of your sites. Note that the stress will differ
depending on how many dimensions are used. I believe the default is k
= 2 in isoMDS.

Hope this helps, Michael



Dear List,

I'm trying to interpret the results of Kruskal's Non-metric
Multidimensional Scaling algorithm (isoMDS, MASS package).

The 'goodness of fit' is reported as "The final stress achieved (in
percent)".

What does this mean exactly? I've tried to google for an answer but
I've not come up with a definitive answer.

Regards, Dieter


-- Dieter Vanderelst PhD Student

Active Perception Lab University of Antwerp 
http://batbits.webnode.com/


Postal Address: Prinsstraat 13 B-2000 Antwerp Belgium


Michael Denslow

Graduate Student I.W. Carpenter Jr. Herbarium [BOON] Department of
Biology Appalachian State University Boone, North Carolina U.S.A.

-- AND --

Communications Manager Southeast Regional Network of Expertise and
Collections sernec.org







[R] interp() function output not continue

2008-06-17 Thread Dieter Vanderelst

Dear List,

I'm using interp() to prepare 3d data for plotting with the contour() function.

I have x, y and z data, all stored as arrays. X and Y are sampled in an orderly
fashion on a grid (a circular sub-area of a grid; see plot). I'm trying to use
interp() to get x and y arrays and a z matrix that can be fed to contour().

This is the command: interp(x,y,z,extrap=F,linear=FALSE,duplicate='mean')

In the result there are, consistently, some discontinuities. This happens 
always in the 'middle' of the data.

I've uploaded a plot that might clarify the problem: 
http://examples.attic.sent.com/example.png

As you can see, the middle of the plot is discontinuous. When I look at the data, there is no particular reason why this should happen.


The problem seems to be a single row in the z matrix returned by interp(), right
in the middle of the matrix (row 30 of 60). Replacing this row with the mean
of rows 29 and 31 seems to solve the problem. This results in this plot:
http://examples.attic.sent.com/example_fix.png. This works, but it is of course
not an elegant solution.

Is this something that looks familiar to someone? Can I replace the interp() 
function with something else? Could this be due to the particular way my data 
is sampled?
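Calling interp(..., linear = TRUE), which switches from the spline interpolation used when linear = FALSE to linear interpolation, may be worth trying first. As a package-free alternative, the sketch below grids scattered x, y, z data with simple inverse-distance weighting (IDW) and then masks every grid node outside the circular sampling area, roughly mimicking extrap = FALSE. idw_grid() is a hypothetical helper, IDW is a different (smoother, less local) method than interp() uses, and the test surface is made up.

```r
# Hypothetical helper: inverse-distance-weighted gridding of scattered data.
# Not akima::interp(); a simple, package-free alternative for illustration.
idw_grid <- function(x, y, z, nx = 60, ny = 60, power = 2) {
  gx <- seq(min(x), max(x), length.out = nx)
  gy <- seq(min(y), max(y), length.out = ny)
  zm <- matrix(NA_real_, nrow = nx, ncol = ny)   # zm[i, j] belongs to (gx[i], gy[j])
  for (i in seq_len(nx)) {
    for (j in seq_len(ny)) {
      d2 <- (x - gx[i])^2 + (y - gy[j])^2
      if (any(d2 == 0)) {
        zm[i, j] <- mean(z[d2 == 0])             # node falls exactly on data point(s)
      } else {
        wts <- 1 / d2^(power / 2)                # weights 1 / distance^power
        zm[i, j] <- sum(wts * z) / sum(wts)
      }
    }
  }
  list(x = gx, y = gy, z = zm)
}

# Made-up data: a regular grid restricted to a circular sub-area
pts <- expand.grid(x = seq(-1, 1, by = 0.1), y = seq(-1, 1, by = 0.1))
pts <- pts[pts$x^2 + pts$y^2 <= 1, ]
pts$z <- cos(3 * pts$x) * sin(2 * pts$y)

res <- idw_grid(pts$x, pts$y, pts$z)
outside <- outer(res$x^2, res$y^2, "+") > 1      # grid nodes outside the circle
res$z[outside] <- NA                             # leave them empty, like extrap = FALSE
contour(res$x, res$y, res$z)
```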

Regards,
Dieter



[R] spherical plots?

2008-06-05 Thread Dieter Vanderelst

Dear List,

I'm wondering whether it is possible to use R to make spherical plots.

I have 3d data: azimuth, elevation and a certain variable Y. 


I want to plot Y in terms of azimuth and elevation such that it seems to be a 
contour plot overlaid on a sphere (but projected on a plane, of course).

If my explanation is not clear, you can find an example of what I'm after here: 
http://www.mediafire.com/?in0fmnikzmg (created using matlab).
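Base graphics has no dedicated spherical plot, but one way to get this effect is to compute ordinary contour lines on the azimuth-elevation grid with contourLines() and then map each line onto the sphere with an orthographic projection before drawing it. In the sketch below, ortho() is a hypothetical helper implementing the standard orthographic projection (azimuth treated as longitude, elevation as latitude), the viewing direction (azimuth 20, elevation 20) is arbitrary, and Y is a made-up function; contour segments on the far side of the sphere are hidden by replacing them with NA.

```r
# Hypothetical helper: orthographic projection of (azimuth, elevation) in degrees,
# seen from direction (az0, el0). Returns plane coordinates and a visibility flag.
ortho <- function(az, el, az0 = 0, el0 = 0) {
  rad <- pi / 180
  lam <- (az - az0) * rad
  phi <- el * rad
  phi0 <- el0 * rad
  list(x = cos(phi) * sin(lam),
       y = cos(phi0) * sin(phi) - sin(phi0) * cos(phi) * cos(lam),
       visible = sin(phi0) * sin(phi) + cos(phi0) * cos(phi) * cos(lam) >= 0)
}

az <- seq(-180, 180, by = 2)            # degrees
el <- seq(-90, 90, by = 2)
Y <- outer(az, el, function(a, e) cos(a * pi / 180) * cos(e * pi / 180)^2)  # made-up

cl <- contourLines(az, el, Y, nlevels = 10)   # contours in the az-el plane

plot(NA, xlim = c(-1, 1), ylim = c(-1, 1), asp = 1, axes = FALSE,
     xlab = "", ylab = "")
theta <- seq(0, 2 * pi, length.out = 200)
lines(cos(theta), sin(theta), col = "grey")   # outline of the sphere
for (line in cl) {
  p <- ortho(line$x, line$y, az0 = 20, el0 = 20)
  lines(ifelse(p$visible, p$x, NA), ifelse(p$visible, p$y, NA))
}
```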

Regards,
Dieter Vanderelst

--
Dieter Vanderelst
PhD Student

Active Perception Lab
University of Antwerp

Koningstraat 8
B-2000 Antwerp
Belgium



Re: [R] lineplot.CI problem

2008-02-18 Thread Dieter Vanderelst
Hi,

Thank you very much for this very rapid and helpful reply.

I'm giving the code a spin around the block. It looks to be working fine.

There is actually also something else on my lineplot.CI wish list, if I might
be so bold as to ask:

When plotting multiple data traces with relatively high standard deviations, the
whiskers tend to overlap. This hampers the interpretation of the data somewhat.
Is there a way to let the function plot data points with some horizontal
displacement to prevent this?
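Independently of what lineplot.CI offers, the idea can be sketched in base graphics: compute the mean and standard error for each x value and group, shift each group's x positions by a small offset, and draw the whiskers with arrows(). The data and the offset of 0.15 are made up for illustration; this is not the sciplot API.

```r
set.seed(1)
dat <- data.frame(time = rep(1:10, times = 6),
                  group = rep(1:2, each = 30))
dat$y <- dat$time + rnorm(60, mean = 0, sd = 3)   # large SD, so whiskers would overlap

se <- function(v) sd(v) / sqrt(length(v))
m <- tapply(dat$y, list(dat$time, dat$group), mean)  # rows = time, columns = group
s <- tapply(dat$y, list(dat$time, dat$group), se)
xs <- as.numeric(rownames(m))
offset <- c(-0.15, 0.15)                             # horizontal shift per group

matplot(outer(xs, offset, "+"), m, type = "b", pch = c(16, 1), lty = 1:2,
        col = "black", xlab = "time", ylab = "mean y (+/- 1 SE)",
        ylim = range(m - s, m + s))
for (g in seq_len(ncol(m))) {
  # Whiskers at the shifted positions, with flat caps at both ends
  arrows(xs + offset[g], m[, g] - s[, g], xs + offset[g], m[, g] + s[, g],
         angle = 90, code = 3, length = 0.03)
}
```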

Regards,

Dieter

Manuel Morales wrote:
 Here's an updated version of lineplot.CI that will succeed even for
 cases where data are not present in all factor combinations. Also, this
 version has the option x.cont to specify that the x axis represents a
 continuous variable with proportional spacing. A new version of sciplot
 with these changes will be posted soon.
 
 ## Examples:
 source("lineplot.CI.R")
 
 ## Generate data
 time <- c(rep(c(21:30), 3), rep(c(1:10), 3))
 y <- time + rnorm(60, 0, 1)
 factors <- rep(c(1:2), each = 30)
 
 ## Proportional spacing
 lineplot.CI(resp=y, x.factor=time, group=factors, x.cont=TRUE)
 
 ## Factorial spacing
 lineplot.CI(resp=y, x.factor=time, group=factors)
 
 Manuel
 
 On Fri, 2008-02-15 at 15:18 +0100, Dieter Vanderelst wrote:
 Hi List,

 I have a problem plotting data using the lineplot.CI command in the sciplot 
 package.

 I want to plot the data of 2 experimental cases using different lines 
 (traces). Time is on the X-axis. The tricky thing is that the data 
 collection in the second case started later than for the first case. This is 
 to say: the first n data points for the second case are missing.

 So far so good. However, when I plot the data using lineplot.CI, the 
 standard error bars are not aligned correctly with the markers.

 I know that this might be difficult to imagine. Here you can find an 
 example: http://i254.photobucket.com/albums/hh115/MarkerMe/example.png

 So, has anybody experienced this problem and solved it before? I think I 
 could try padding the data of the second case with zeros to eliminate the 
 missing data. But I hope there is a better solution.

 Regards,
 Dieter
 
 Dieter Vanderelst
 dieter dot vanderelst at emailengine dot org



[R] lineplot.CI problem

2008-02-15 Thread Dieter Vanderelst
Hi List,

I have a problem plotting data using the lineplot.CI command in the sciplot 
package.

I want to plot the data of 2 experimental cases using different lines (traces). 
Time is on the X-axis. The tricky thing is that the data collection in the 
second case started later than for the first case. This is to say: the first n 
data points for the second case are missing.

So far so good. However, when I plot the data using lineplot.CI, the standard 
error bars are not aligned correctly with the markers.

I know that this might be difficult to imagine. Here you can find an example: 
http://i254.photobucket.com/albums/hh115/MarkerMe/example.png

So, has anybody experienced this problem and solved it before? I think I could 
try padding the data of the second case with zeros to eliminate the missing 
data. But I hope there is a better solution.

Regards,
Dieter

Dieter Vanderelst
dieter dot vanderelst at emailengine dot org



[R] rearrange data: one line per subject, one column per condition

2007-11-27 Thread Dieter Vanderelst
Dear R-list,

Is there a way to convert the typical long R data-format to a 1-line per 
subject format?

I have data formatted as:

Group   subj   condition   variable
1   1   1   746.36625
2   2   1   1076.152857
1   3   1   1076.152857
2   4   1   657.4263636
1   5   1   854.127
2   6   1   1191.676154
1   7   1   1028.175385
1   1   2   46.36625
2   2   2   76.152857
1   3   2   76.152857
2   4   2   57.4263636
1   5   2   54.127
2   6   2   191.676154
1   7   2   028.175385
...

Here, each line holds the value of one subject's VARIABLE as a function of
GROUP and CONDITION.

However, I would like to rearrange the data so that the columns correspond to the
2 conditions and the rows to the subjects. Something like:

subj   group   condition1   condition2
1   1   746.36625   46.36625
2   2   1076.152857 76.152857
...

I know it's possible the other way around. But that's not what I need (this
time).

Before anyone asks: Yes, I want to do some analysis on my data in SPSS, so I 
need the rearranged format.
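Base R's reshape() can do this long-to-wide conversion. A minimal sketch using the 14 rows listed above: group is carried along because it is constant within each subject, renaming variable.1/variable.2 to condition1/condition2 is purely cosmetic, and the commented write.csv() line (file name hypothetical) is one way to hand the result to SPSS.

```r
long <- data.frame(
  group     = rep(c(1, 2, 1, 2, 1, 2, 1), times = 2),
  subj      = rep(1:7, times = 2),
  condition = rep(1:2, each = 7),
  variable  = c(746.36625, 1076.152857, 1076.152857, 657.4263636, 854.127,
                1191.676154, 1028.175385, 46.36625, 76.152857, 76.152857,
                57.4263636, 54.127, 191.676154, 28.175385))

# One row per subject, one value column per condition
wide <- reshape(long, direction = "wide",
                idvar = "subj", timevar = "condition", v.names = "variable")
names(wide) <- sub("^variable\\.", "condition", names(wide))
wide
# write.csv(wide, "wide_for_spss.csv", row.names = FALSE)
```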

Regards and Thanks,
Dieter 
--
Dieter Vanderelst

dieter _ vanderelst AT emailengine DOT org
d DOT vanderelst AT tue DOT nl

Eindhoven University of Technology
Faculty of Industrial Design
Designed Intelligence Group
Den Dolech 2
5612 AZ Eindhoven
The Netherlands
Tel +31 40 247 91 11



[R] ylim in barplot2 function?

2007-10-31 Thread Dieter Vanderelst
Hi list,

I'm using barplot2 from the gplots package to plot a few numbers (I want to add
SD bars later).

However, I would like the y-axis to start not from 0 but from 500. When I add the
ylim parameter, something goes wrong. The graph is not 'cut off' at 500.
Instead, the bars seem to sink through the bottom of the graph.

Because it's a little hard to explain, here is a self-contained example:


library(gplots)

ABrt <- c(588, 589, 593, 588)
Wrt <- c(580, 583, 592, 612)

RT <- rbind(ABrt, Wrt)
barplot2(RT, beside = TRUE, col = c('black', 'white'), ylim = c(500, 1000))

Does anybody know of a solution?
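In base graphics the bars are drawn up from 0, and barplot() uses xpd = TRUE by default, which lets them extend outside the plot region. With xpd = FALSE they are clipped at the lower ylim instead. The sketch below uses base barplot() on the same numbers; barplot2() appears to take the same xpd argument, but that is worth checking in ?barplot2. box() just redraws the frame around the clipped bars.

```r
ABrt <- c(588, 589, 593, 588)
Wrt  <- c(580, 583, 592, 612)
RT <- rbind(ABrt, Wrt)

# xpd = FALSE clips the bars to the plot region, so the axis can start at 500
bp <- barplot(RT, beside = TRUE, col = c("black", "white"),
              ylim = c(500, 1000), xpd = FALSE)
box()
```

The returned bp holds the bar midpoints (one row per series), which is handy for adding SD bars later with arrows().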

Regards,
Dieter

--
Dieter Vanderelst

dieter _ vanderelst AT emailengine DOT org
d DOT vanderelst AT tue DOT nl

Eindhoven University of Technology
Faculty of Industrial Design
Designed Intelligence Group
Den Dolech 2
5612 AZ Eindhoven
The Netherlands
Tel +31 40 247 91 11



Re: [R] Cluster Analysis

2007-10-29 Thread Dieter Vanderelst
take a look at hclust()
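hclust() expects dissimilarities, so a similarity matrix has to be converted first, for example as 1 - similarity when the similarities lie in [0, 1], and then wrapped with as.dist(). A minimal sketch with a made-up 4 x 4 similarity matrix; average linkage is an arbitrary choice here.

```r
# Made-up similarities in [0, 1] (1 = identical)
sim <- matrix(c(1.0, 0.9, 0.2, 0.1,
                0.9, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.8,
                0.1, 0.2, 0.8, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(LETTERS[1:4], LETTERS[1:4]))

d <- as.dist(1 - sim)                  # convert similarity to dissimilarity
hc <- hclust(d, method = "average")    # any hclust() linkage method can be used
plot(hc)
groups <- cutree(hc, k = 2)            # cluster membership for 2 clusters
groups
```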

Dieter

Katia Freire wrote:
 Dear all,

   I would like to know if I can do a hierarchical cluster analysis in R using 
 my own similarity matrix and how. Thanks. Katia Freire.
 
 


[R] adding aggregate data to data frames

2007-10-19 Thread Dieter Vanderelst
Dear List,

I have a data frame containing reaction times of participants in some 
experiment.

As usual, each line is a single trial in the experiment. Two factors denote the
conditions in the experiment. Each participant completes several trials for
each condition.

Now, the question:

I want to calculate per participant, per condition the mean reaction time and 
its standard deviation.

I can do this using aggregate(). However, I want to merge this info with the
original data frame. That is, I want each line to contain the mean and SD of
the reaction time for the participant and condition on that line.

I have tried to solve this by looping through the data frame. For each line, I
use subset() to select the lines that belong to the same participant and
condition. Then I calculate the mean/SD. But this takes a long time.

BTW: I find it very hard to choose a proper subject for r-help list mails.
So, if anyone knows a set of better keywords...

Any ideas?
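ave() computes a summary within groups and returns it aligned with the original rows, which avoids the loop entirely. A sketch with made-up reaction-time data; the column names subject, factorA, factorB and RT are hypothetical. The aggregate() plus merge() route also works if the summary table is wanted separately.

```r
set.seed(1)
rt <- data.frame(subject = rep(1:3, each = 8),
                 factorA = rep(c("a1", "a2"), times = 12),
                 factorB = rep(c("b1", "b1", "b2", "b2"), times = 6),
                 RT      = round(rnorm(24, mean = 600, sd = 80)))

# Mean and SD per subject x condition, repeated on every matching row
rt$meanRT <- ave(rt$RT, rt$subject, rt$factorA, rt$factorB, FUN = mean)
rt$sdRT   <- ave(rt$RT, rt$subject, rt$factorA, rt$factorB, FUN = sd)
head(rt)
```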

Thanks,
Dieter Vanderelst

--
Dieter Vanderelst

dieter _ vanderelst AT emailengine DOT org
d DOT vanderelst AT tue DOT nl

Eindhoven University of Technology
Faculty of Industrial Design
Designed Intelligence Group
Den Dolech 2
5612 AZ Eindhoven
The Netherlands
Tel +31 40 247 91 11



[R] cumulative frequency plots for factors

2007-10-15 Thread Dieter Vanderelst
Dear list,

I have a data frame with a number of events (factor) and the times at which 
they occurred (continuous variable):

event time
A 10
A 12
B 15
A 17
C 13
...

Is it possible in R to plot, against time, the cumulative frequency of
occurrence of each event? This would give a rising line for each event type.
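One way in base R: sort by time, use ave() with cumsum to get a running count within each event type, and draw one step line per type. A sketch using only the five rows shown above (the real data set is longer); colours and plotting symbols are arbitrary.

```r
# The rows shown in the post
ev <- data.frame(event = c("A", "A", "B", "A", "C"),
                 time  = c(10, 12, 15, 17, 13),
                 stringsAsFactors = FALSE)
ev <- ev[order(ev$time), ]
# Running count of each event type, in time order
ev$cumfreq <- ave(rep(1, nrow(ev)), ev$event, FUN = cumsum)

types <- sort(unique(ev$event))
plot(NA, xlim = range(ev$time), ylim = c(0, max(ev$cumfreq)),
     xlab = "time", ylab = "cumulative frequency")
for (i in seq_along(types)) {
  sub_ev <- ev[ev$event == types[i], ]
  lines(sub_ev$time, sub_ev$cumfreq, type = "s", col = i)  # step line per type
  points(sub_ev$time, sub_ev$cumfreq, col = i, pch = 16)
}
legend("topleft", legend = types, col = seq_along(types), lty = 1)
```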

Regards,
Dieter



Re: [R] Making a table: collapsing across sub-strings

2007-10-04 Thread Dieter Vanderelst
Hi,

A sub string can occur anywhere in the main string.

I think I could use table() and then add the numbers. But I don't know how
to access the numbers in the result of table().

Another problem is that there might be a hierarchy in the strings. That
is, string a might be a subset of b while b might be a subset of c. So,
when checking the strings, I would have to start with the longest string
and find all subsets of that one. And then I should check the second
longest string and so on...

But I cannot find a way of ordering strings by their length.
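A sketch, assuming each string should be credited to the longest string in the set that contains it (fixed-string matching anywhere in the string); collapse_substrings() is a hypothetical helper and the example strings are made up. It also shows the two pieces asked about: order(nchar(...)) sorts strings by length, and names() plus as.vector() give the labels and counts of a table(). If a short string occurs inside two unrelated longer strings, this rule credits only the longest (ties go to whichever comes first), which may or may not be what you want.

```r
collapse_substrings <- function(strings) {
  tb <- table(strings)
  u <- names(tb)                                   # the distinct strings
  u <- u[order(nchar(u), decreasing = TRUE)]       # longest strings first
  counts <- as.vector(tb[u])                       # counts in the same order as u

  # Credit each string to the longest string in the set that contains it
  # (a string always contains itself, so every string gets a target)
  target <- vapply(u, function(s) {
    contains_s <- vapply(u, function(S) grepl(s, S, fixed = TRUE), logical(1))
    u[contains_s][1]
  }, character(1))

  collapsed <- tapply(counts, target, sum)
  setNames(as.vector(collapsed), names(collapsed))
}

my_set <- c("abab", "ababaaa", "xyz", "abab", "xy", "q")
res <- collapse_substrings(my_set)
res
```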

Regards,
Dieter

jim holtman wrote:
 How do you determine if one string is a subset of another?  Does it
 only match at the beginning, or anywhere?  How large is your set of
 strings?  Can you use table as you describe and then determine what
 the groupings of subsets are and then just add the numbers together?
 You can use grep/regexpr to determine if one string is a subset of
 another.
 
 On 10/3/07, Dieter Vanderelst [EMAIL PROTECTED] wrote:
 Hi list,

 I'm currently processing textual data and I would really appreciate some
 help with one of my problems.

 I have a set of strings and I want to count how often each of these
 strings appears in this set.

 This is not very difficult and can be done as:

 TB <- table(my_set)
 plot(TB)

 However, I also want to collapse across sub-strings. That is, I want a
 sub-string ss of string S to be counted as an occurrence of string S.

 So, 'abab' should be included in the count of 'ababaaa' and should not
 be listed as a separate entry in the frequency table.

 Does somebody have a pointer to a way to do this? I have been checking
 out the CRAN packages for handling DNA sequences, but this has not
 really brought me closer to a solution.

 Thanks,
 Dieter Vanderelst

 --
 Dieter Vanderelst
 Eindhoven University of Technology
 Faculty of Industrial Design
 Designed Intelligence Group
 Den Dolech 2
 5612 AZ Eindhoven
 The Netherlands
 Tel +31 40 247 91 11



[R] Making a table: collapsing across sub-strings

2007-10-03 Thread Dieter Vanderelst
Hi list,

I'm currently processing textual data and I would really appreciate some
help with one of my problems.

I have a set of strings and I want to count how often each of these
strings appears in this set.

This is not very difficult and can be done as:

TB <- table(my_set)
plot(TB)

However, I also want to collapse across sub-strings. That is, I want a
sub-string ss of string S to be counted as an occurrence of string S.

So, 'abab' should be included in the count of 'ababaaa' and should not
be listed as a separate entry in the frequency table.

Does somebody have a pointer to a way to do this? I have been checking
out the CRAN packages for handling DNA sequences, but this has not
really brought me closer to a solution.

Thanks,
Dieter Vanderelst

--
Dieter Vanderelst
Eindhoven University of Technology
Faculty of Industrial Design
Designed Intelligence Group
Den Dolech 2
5612 AZ Eindhoven
The Netherlands
Tel +31 40 247 91 11
