[R] compressing/reducing data for plot

2011-10-17 Thread Timo Schneider
Hello,

I have simulation results in the form of

  Time  V   I
  0.e+000  7.218354344368e-001  5.224478627497e-006  
  1.e-009  7.218354344368e-001  5.224477718002e-006  
  2.e-009  7.218354344368e-001  5.224477718002e-006  
  4.000108361244e-009  7.218354344368e-001  5.224478627497e-006  
  8.000325083733e-009  7.218354344368e-001  5.224478627497e-006

As the timesteps are small, each simulation results in a lot of data -
about 1e5 data points per simulation.

Now I want to plot this data. If I do this with a simple

plot(x=data$Time, y=data$V, type="l")

the resulting file (I plot into postscript files) is huge and takes a
long time to render, since R creates a new line segment for each
timestep. Of course it makes no sense to plot more than a few hundred
datapoints in a single plot. However, I don't have a good idea how to
remove the "uninteresting" part of the data, i.e., the datapoints that
lie very close to the lines that would be drawn by R anyway if there
were no datapoint for that time value.

Since the values in my simulation are constant most of the time but
sometimes have interesting "spikes", a simple

data <- data[seq(1, nrow(data), by = 1000), ]

to plot only every 1000th row does not work for me, as it could remove
some "spikes" completely or lead to aliasing problems.

Is there any standard way to do this in R?
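For reference, one common approach - my own sketch, not a standard R
function, and all names here are illustrative - is bucketed min/max
decimation: split the series into a fixed number of buckets and keep each
bucket's minimum and maximum point, so spikes survive the downsampling.

```r
# Bucketed min/max decimation: keep each bucket's extreme points so
# that spikes survive downsampling. Function and column names are
# illustrative, not from the thread.
decimate_minmax <- function(time, value, n_buckets = 500) {
  # assign each index to one of n_buckets consecutive buckets
  bucket <- cut(seq_along(time), breaks = n_buckets, labels = FALSE)
  # within each bucket, keep the indices of the min and max value
  keep <- unlist(lapply(split(seq_along(value), bucket), function(idx) {
    idx[c(which.min(value[idx]), which.max(value[idx]))]
  }))
  keep <- sort(unique(keep))
  data.frame(Time = time[keep], V = value[keep])
}

# Example: a constant signal with one spike; the spike is retained
# while the point count drops to at most 2 * n_buckets.
t <- seq(0, 1, length.out = 1e5)
v <- rep(0.72, 1e5)
v[50000] <- 1
small <- decimate_minmax(t, v, 500)
```

The result can be passed to plot(type = "l") as before; the drawn curve
is visually close to the full data at a fraction of the file size.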


The best thing I came up with so far is a function that judges whether a
row in the dataframe should be kept for plotting based on each point's
difference from its predecessor. However, this function has two problems:

* It is very slow! (It takes about 4 seconds for each 1e5-row
dataframe.)

* It does not work well if the values increase/decrease monotonically
in small steps - it removes them all, since the difference between each
point and its predecessor is minimal.

I included my own function below:

=== cut ===

get_significant_rows_1 <- function(data, threshold) {

# difference between each datapoint and the following one; this list is
# one shorter than the input dataset, which does not matter since the
# first and last datapoints are always included below
diffs <- abs(data[1:(nrow(data) - 1), ] - data[2:nrow(data), ])

# normalize the differences by the value range of their column
col.range <- apply(data, 2, function(d) abs(max(d) - min(d)))
normalized_diffs <- t(apply(diffs, 1, function(d) d / col.range))
rm("col.range")

# the "biggest difference" in each row
biggest_difference <- as.vector(apply(normalized_diffs, 1, max))

# if the "biggest difference" is above the threshold, the row is
# "significant" for the plot
signif <- biggest_difference >= threshold
rm("biggest_difference")

# the last datapoint/row is always significant, otherwise the plot
# could become "shorter"
signif[length(signif)] <- TRUE

# so is the first one - we prepend a TRUE, since the first value has no
# predecessor and therefore no entry in the diffs array
signif <- append(signif, TRUE, 0)

# if a point is significant, the point before it is also "important",
# at least for line plots, otherwise we get angled lines where flat
# ones should be
signif <- signif | append(signif[2:length(signif)], FALSE)

return(data[signif, ])

}

# example application (makes no sense for this kind of data, though)

data <- data.frame(a = rnorm(100), b = rnorm(100))
# dataset, threshold
get_significant_rows_1(data, 0.01)

=== cut ===
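For what it's worth, the row-wise apply() calls are the bottleneck. A
vectorized sketch of the same idea - my own variant, not from the thread -
replaces them with diff(), sweep() and pmax(), and also guards against
division by zero when a column is constant:

```r
# Vectorized variant of the significance filter. diff() on a matrix
# takes row-to-row differences per column; sweep() divides each column
# by its range; do.call(pmax, ...) computes the per-row maximum without
# a row-wise apply(). Name get_significant_rows_2 is made up.
get_significant_rows_2 <- function(data, threshold) {
  m <- as.matrix(data)
  diffs <- abs(diff(m))                    # row i holds |m[i+1,] - m[i,]|
  col.range <- apply(m, 2, function(d) max(d) - min(d))
  col.range[col.range == 0] <- 1           # constant columns: avoid 0/0
  norm <- sweep(diffs, 2, col.range, "/")
  biggest <- do.call(pmax, as.data.frame(norm))
  signif <- c(TRUE, biggest >= threshold)  # first row always kept
  signif[length(signif)] <- TRUE           # last row always kept
  signif <- signif | c(signif[-1], FALSE)  # keep predecessors of kept rows
  data[signif, , drop = FALSE]
}
```

On a 1e5-row dataframe this avoids the per-row function-call overhead
that dominates the original version's runtime.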

Thank you for any helpful advice or comments. :-)

Regards,
Timo

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Plotting lines with equidistant points for identification

2011-04-19 Thread Timo Schneider
Dear R Gurus,

I would like to make a line-plot of a large data-set (>200 data-points)
for a document which will be printed in black and white. The plot will
contain 5 different lines, so I need a way to differentiate between the
lines. Since color is not an option, I tried different line styles with
all sorts of "line-gap" or "dot dot dot" patterns. Since my plots are
rather small, it is hard to get lines that look contiguous enough while
still being identifiable by their line style.

What I would like to have are lines with symbols on top of them,
similar to what I get with plot(x, y, type='o'). The problem is that I
have too many data points for that method; the symbols just "melt
together", as in plot(seq(1,500), seq(1,500), type='o') - just one big
line.

What I want is a configurable number of symbols spread equidistantly
over the whole graph, regardless of the number or position of the data
points.

Does anyone have an idea how to achieve this?
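One way to get this - a sketch assuming base graphics; the function name
is made up - is to draw the line normally and then interpolate a fixed
number of marker positions onto the curve with approx():

```r
# Draw a line and overlay n_markers symbols at equidistant x positions.
# approx() linearly interpolates the curve at the marker positions, so
# the symbols sit on the line regardless of where the data points are.
line_with_markers <- function(x, y, n_markers = 10, pch = 2, ...) {
  plot(x, y, type = "l", ...)
  marks <- approx(x, y, xout = seq(min(x), max(x), length.out = n_markers))
  points(marks$x, marks$y, pch = pch)
}

# Example: 500 data points but only 10 triangles on the line.
line_with_markers(seq(1, 500), sqrt(seq(1, 500)), n_markers = 10)
```

For 5 lines, call it once per series with a different pch value; the
symbols then identify the lines in a black-and-white print.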

Thanks,
Timo



Re: [R] Grouping data in dataframe

2009-07-15 Thread Timo Schneider
On Wednesday, 2009-07-15, at 00:42 -0500, markle...@verizon.net wrote:

Hi!

> Hi: I think aggregate does what you want. you had 34 in one of your
> columns but I think you meant it to be 33.
> 
> DF <- read.table(textConnection("ExpA ExpB ExpC Size
> 1 12 23 33 1
> 2 12 24 29 1
> 3 10 22 34 1
> 4 25 50 60 2
> 5 24 53 62 2
> 6 21 49 61 2"),header=TRUE)
> 
> print(DF)
> print(str(DF))
> 
> aggregate(DF,list(DF$Size),median)

Yes, thanks to you and all the other people who helped! The aggregate
function is exactly what I was looking for. Thanks for the help!

Regards,
Timo



[R] Grouping data in dataframe

2009-07-14 Thread Timo Schneider
Hello,

I have a dataframe (obtained from read.table()) which looks like

   ExpA   ExpB   ExpC   Size
1    12     23     33      1
2    12     24     29      1
3    10     22     34      1
4    25     50     60      2
5    24     53     62      2
6    21     49     61      2

now I want to take all rows that have the same value in the "Size"
column and apply a function to the columns of these rows (for example
median()). The result should be a new dataframe with the medians of the
groups, like this:

   ExpA   ExpB   ExpC   Size
1    12     23     34      1
2    24     50     61      2

I tried to play with the functions by() and tapply() but I didn't get
the results I wanted so far, so any help on this would be great!

The reason why I am having this problem: (I explain this just to make
sure I don't do something against the nature of R.)

I am running 3 similar experiments, A, B, C, and I change a parameter of
the experiment (size). Every experiment is repeated multiple times and I
need the median or average over all repetitions of the same experiment.
Should I preprocess my data files so that they are structured
differently? Or is it fine the way it is and I just overlooked the
simple solution to the problem described above?
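For completeness, in recent versions of R the formula interface of
aggregate() does this grouping in one call - a sketch using the example
data from this question:

```r
# The example data from the question, rebuilt as a data frame.
DF <- data.frame(ExpA = c(12, 12, 10, 25, 24, 21),
                 ExpB = c(23, 24, 22, 50, 53, 49),
                 ExpC = c(33, 29, 34, 60, 62, 61),
                 Size = c(1, 1, 1, 2, 2, 2))

# One median row per Size group; the formula interface keeps Size as a
# regular column in the result.
res <- aggregate(cbind(ExpA, ExpB, ExpC) ~ Size, data = DF, FUN = median)
```

The result has one row per distinct Size value, with each Exp column
replaced by its per-group median.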

Regards,
Timo
