>1) I suggest you try a postscript() device, and convert later if you need
>to. Expect a very large file size.

Dear Dr. Ripley,

Thank you! PostScript was able to finish the job (bitmap killed itself). The
file sizes are indeed large: 1.4 GB, requiring over two hours to display in
gv, but ultimately viewable. I'm new to manipulating ps files, but hopefully I
can find a fast way to convert the files into a smaller format.

I found an archived message of yours that suggested not using pch="." as a
symbol for graphing large datasets, and upon experimentation I found that the
circle symbol, pch=21 (similar in appearance to the default, pch=1), seemed to
produce the smallest files for some sets of test data when compared with some
other symbols. Running "pch=21, cex=0.35" produced a fairly small point but
consumed much less space than pch=".". Is this the best solution for producing
plot symbols that take up little room both on the plot and on the hard drive?

>Sounds like the problem is in your X server and not in R. I've seen this
>with Xfree (and don't use that myself on Linux).

It's possible... however, I wouldn't know how to fix it from that end, either...

>2) Don't plot all the points. You say you have a `very large dataset'. In
>statistics, we give numbers, not vague descriptions. However, with what
>that means to me (many millions of rows) a scatterplot of a very large
>dataset is going to be mainly black at least in places. (We've
>experienced that with 1.4 million points, for example.) That's not a good
>way to display the data. Either use a density plot, or if you are
>interested in outliers, thin the centre. We did this by estimating a
>density phat, then randomly selecting points with probability min(1,
>const/phat(x)) for a suitable `const'

I have a set of text files, each containing a 450,000 x 41 matrix (18.45
million data points) and roughly 300 MB in size. Indeed, the scatterplots are
overprinted, but I am interested in getting a "feel" for the data before
charging ahead.
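
The thinning scheme quoted above can be sketched in R roughly as follows. This is only a sketch under my own assumptions: the function name thin_by_density, its arguments (const, n_grid, n_sub), the use of MASS::kde2d on a random subsample to estimate phat, and the simulated test data are all illustrative choices, not something taken from the quoted message.

```r
library(MASS)  # provides kde2d(); a recommended package shipped with R

# Hypothetical helper: returns a logical vector, TRUE for points to keep.
# Assumes x and y contain no missing values and that const > 0.
thin_by_density <- function(x, y, const, n_grid = 100, n_sub = 20000) {
  stopifnot(length(x) == length(y), const > 0)
  # Estimate phat from a random subsample so the kernel estimate stays cheap
  sub <- sample(length(x), min(length(x), n_sub))
  dens <- kde2d(x[sub], y[sub], n = n_grid, lims = c(range(x), range(y)))
  # Approximate phat at each point by the density at the lower-left
  # corner of the grid cell that contains the point
  ix <- findInterval(x, dens$x, all.inside = TRUE)
  iy <- findInterval(y, dens$y, all.inside = TRUE)
  phat <- dens$z[cbind(ix, iy)]
  # Keep each point with probability min(1, const / phat)
  runif(length(x)) < pmin(1, const / phat)
}

# Made-up test data: a correlated cloud with a dense centre
set.seed(1)
x <- rnorm(50000)
y <- x + rnorm(50000)
keep <- thin_by_density(x, y, const = 0.01)
```

Something like plot(x[keep], y[keep], pch = 21, cex = 0.35) would then draw the thinned set. The value of const has to be tuned by eye: smaller values thin the centre harder, while points in low-density regions (where phat is below const) are always kept.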
The data (measurements on artificial phylogenetic trees) were produced by
simulation, and although I have been running checks all along, I wanted to make
sure that my simulations weren't producing any strange outliers or oddly
shaped distributions. On the other hand, I had no real guess as to what the
data would look like, or even which variables would show strong correlations.
Since many of these data points are repeats, I was in fact able to discern a
lot of pattern, rather than getting all-black plots. Using both a density plot
and a thinned plot may be the way to go if I don't find a way to shrink the
graphs. I had hoped that "pairs" would be a fast, one-line way to take in all
my data at once, but of course nothing has been that easy with this much data.

Jean

______________________________________________
[EMAIL PROTECTED] mailing list
http://www.stat.math.ethz.ch/mailman/listinfo/r-help
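
P.S. On the plot-symbol question, a comparison of PostScript file sizes for different symbols can be sketched along these lines. The helper name ps_size, the simulated data, and the particular symbols compared are assumptions made for illustration; the resulting sizes will depend on the R version and the data, so no particular ordering is claimed here.

```r
# Hypothetical helper: write one scatterplot to a temporary PostScript
# file and return the size of that file in bytes.
ps_size <- function(x, y, ...) {
  f <- tempfile(fileext = ".ps")
  on.exit(unlink(f), add = TRUE)  # remove the temporary file afterwards
  postscript(f)
  plot(x, y, ...)
  dev.off()
  file.size(f)
}

# Made-up test data, much smaller than the real files
set.seed(1)
x <- rnorm(10000)
y <- rnorm(10000)

sizes <- c(
  dot          = ps_size(x, y, pch = "."),
  small_circle = ps_size(x, y, pch = 21, cex = 0.35),
  default      = ps_size(x, y, pch = 1)
)
print(sizes)  # bytes per file, one entry per symbol
```

Running this on a sample of the real data, rather than on simulated points, would show which symbol is cheapest before committing to a multi-hour plot.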
