Re: [R] Removing polygons from shapefile of Scotland and Islands
I believe mapshaper has functionality for removing small 'islands'. There is a web interface for mapshaper, but I see there is also an R package, rmapshaper (see https://search.r-project.org/CRAN/refmans/rmapshaper/html/ms_filter_islands.html for island removal).

If you want to manually select which islands to keep and which to remove, you can split the multipolygons into single polygons. I believe that is possible using st_cast.

But if it is just a matter of getting the relevant portion of the map on screen: with the plot command and st_viewport it is possible to set the part of the map that is drawn.

HTH,
Jan

On 14-05-2024 15:16, Nick Wray wrote:

Hello

I have a shapefile of Scotland, including the islands. The river flow data I am using is only for the mainland, and for a clearer and larger map I would like to not plot Orkney and Shetland to the north of the mainland, as I don't need them. I got the map from https://borders.ukdataservice.ac.uk/easy_download_data.html?data=infuse_ctry_2011 and then put the UK shapefile onto my laptop with no problems (I have sf running):

the_uk<-st_read(dsn="C:/Users/nickm/Desktop/Shapefiles/infuse_ctry_2011.shp")
scotland<-the_uk[2,]
plot(scotland$geometry)

This gives me a nice map of Scotland plus islands, but obviously there are lots of separate polygons, and if I go into the points with something like

scot_pts<-unlist(as.data.frame(scotland$geometry))

it's not at all clear how I can get rid of the points I don't want, as they don't seem to be listed in any way that makes it easy to find where one polygon stops and another starts.

I am wondering whether this approach is right anyway, or whether there is some sf function which would allow me to identify the polygons I want (essentially the big one which is the mainland) without lots of elaborate conversions and manipulations.

Any pointers, thoughts etc much appreciated.

Thanks
Nick Wray

__ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
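A minimal sketch of the st_cast route suggested in the reply, using a made-up multipolygon instead of the real shapefile. The toy geometry and the names `demo`, `parts` and `mainland` are illustrative assumptions, not from the thread, and the sketch assumes that the mainland is simply the polygon with the largest area.

```r
library(sf)

# Toy stand-in for `scotland`: one big square ("mainland") plus one small
# square ("island") stored together as a single MULTIPOLYGON feature.
square <- function(x0, y0, size) {
  rbind(c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
        c(x0, y0 + size), c(x0, y0))
}
mp <- st_multipolygon(list(list(square(0, 0, 10)), list(square(20, 20, 1))))
demo <- st_sf(name = "demo", geometry = st_sfc(mp))

# Split the multipolygon into one row per polygon
# (sf warns that the attributes are repeated for each part).
parts <- suppressWarnings(st_cast(demo, "POLYGON"))

# Keep only the polygon with the largest area.
areas <- as.numeric(st_area(parts))
mainland <- parts[which.max(areas), ]

plot(st_geometry(mainland))
```

With the real data, `scotland` would take the place of `demo`. To keep a few larger islands as well, one could select on an area threshold (`parts[areas > some_cutoff, ]`) rather than on the single largest polygon.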
Re: [R] Inquiry about bandwidth rescaling in Ksmooth
Dear Bert,

Thank you very much for your quick reply. I have tested this, and it indeed appears to be the source of the discrepancy I observed. My apologies for overlooking this in the documentation, and thank you for clarifying.

Cheers,
Jan

From: Bert Gunter
Sent: Thursday, 26 October 2023 20:19
To: Jan Failenschmid
Cc: r-help@r-project.org
Subject: Re: [R] Inquiry about bandwidth rescaling in Ksmooth

Apologies in advance if my comments don't help, in which case, no need to respond, but I noted in ?ksmooth:

"bandwidth: the bandwidth. The kernels are scaled so that their quartiles (viewed as probability densities) are at ± 0.25*bandwidth."

So, could this be a source of the discrepancies you cited?

Given that ?ksmooth explicitly says:

"Note: This function was implemented for compatibility with S, although it is nowhere near as slow as the S function. Better kernel smoothers are available in other packages such as KernSmooth."

One wonders why you bother with it at all? (That was rhetorical -- do not answer).

Cheers,
Bert

On Thu, Oct 26, 2023 at 11:06 AM Jan Failenschmid via R-help wrote:
> Dear Sir, Madam, or to whom this may concern,
>
> my name is Jan Failenschmid and I am a Ph.D. student at Tilburg University. For my project I have been looking into different types of kernel regression estimators and corresponding R functions.
>
> While comparing different functions I noticed that stats::ksmooth returned different estimates for the same bandwidth as other kernel regression estimators that should be equivalent (i.e. the local polynomial estimators KernSmooth::locpoly and locpol::locpol with degree 0). However, when optimizing the bandwidth of ksmooth separately using the same loss function, I find comparable estimates to the other two estimators for a different (larger) bandwidth. To confirm this, I wrote my own Nadaraya-Watson kernel regression estimator, which is consistent with the two local polynomial estimators and shows the same discordance with ksmooth.
>
> This led me to the suspicion that the bandwidth that is passed to ksmooth is rescaled or transformed within the function. Unfortunately, I was not able to confirm this with either the code of the function or the documentation. It would be of great help to me if you could clarify this for me.
>
> Thank you very much for your time and help in advance.
>
> Kind regards,
>
> Jan Failenschmid
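The rescaling quoted from ?ksmooth can be checked numerically. The sketch below uses made-up data and a hand-written Nadaraya-Watson estimator (`nw` is a hypothetical helper, not the poster's code). It assumes that, for the normal kernel, "quartiles at ± 0.25*bandwidth" means qnorm(0.75) * sd = 0.25 * bandwidth, so a Gaussian kernel with standard deviation h should correspond to bandwidth = 4 * qnorm(0.75) * h, roughly 2.7 * h.

```r
# Toy data on [0, 1]; with h = 0.3 every point lies within 4 kernel
# standard deviations of every other, so no truncation of the Gaussian
# tails can affect the comparison.
x <- seq(0, 1, length.out = 50)
y <- sin(2 * pi * x)
h <- 0.3  # standard deviation of the Gaussian kernel

# Hand-written Nadaraya-Watson estimator with a Gaussian kernel of sd h.
nw <- function(x0, x, y, h) {
  vapply(x0, function(p) {
    w <- dnorm((x - p) / h)
    sum(w * y) / sum(w)
  }, numeric(1))
}
fit_nw <- nw(x, x, y, h)

# ksmooth with the same number passed as bandwidth: a much narrower kernel.
fit_raw <- ksmooth(x, y, kernel = "normal", bandwidth = h, x.points = x)$y

# Rescaled bandwidth, solving qnorm(0.75) * h = 0.25 * bandwidth.
bw <- 4 * qnorm(0.75) * h
fit_scaled <- ksmooth(x, y, kernel = "normal", bandwidth = bw, x.points = x)$y

max(abs(fit_nw - fit_raw))     # clearly non-zero
max(abs(fit_nw - fit_scaled))  # close to zero
```

This is consistent with the observation in the original post that ksmooth needed a larger bandwidth to reproduce the other estimators.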
[R] Inquiry about bandwidth rescaling in Ksmooth
Dear Sir, Madam, or to whom this may concern,

my name is Jan Failenschmid and I am a Ph.D. student at Tilburg University. For my project I have been looking into different types of kernel regression estimators and corresponding R functions.

While comparing different functions I noticed that stats::ksmooth returned different estimates for the same bandwidth as other kernel regression estimators that should be equivalent (i.e. the local polynomial estimators KernSmooth::locpoly and locpol::locpol with degree 0). However, when optimizing the bandwidth of ksmooth separately using the same loss function, I find comparable estimates to the other two estimators for a different (larger) bandwidth. To confirm this, I wrote my own Nadaraya-Watson kernel regression estimator, which is consistent with the two local polynomial estimators and shows the same discordance with ksmooth.

This led me to the suspicion that the bandwidth that is passed to ksmooth is rescaled or transformed within the function. Unfortunately, I was not able to confirm this with either the code of the function or the documentation. It would be of great help to me if you could clarify this for me.

Thank you very much for your time and help in advance.

Kind regards,

Jan Failenschmid
Re: [R] Is it possible to get a downward pointing solid triangle plotting symbol in R?
Another thing that I considered, but which doesn't seem to be supported, is rotating the symbols. I noticed that rotation does work with text, so you could use an arrow symbol and then specify the angle aesthetic. But this still relies on text, and unfortunately there are no arrow-like symbols in ASCII, except perhaps 'V'.

I can't say how well non-ASCII text is supported across different OSes and locales. https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Encoding-issues gives some 'hints'.

On 06-10-2023 14:21, Chris Evans via R-help wrote:

Thanks again Jan. That is lovely and clean and I probably should have seen that option. I had anxieties about the portability of using text. (The function will end up in my https://github.com/cpsyctc/CECPfuns package so I'd like it to be fairly immune to character sets and different platforms in different countries.)

I'm morphing this question a lot now but I guess it's still on topic really. I know I need to put in some time to understand the complexities of R and platforms (I'm pretty exclusively on Linux, Ubuntu or Debian, now so have mostly done the ostrich thing about these issues, though I do hit problems exchanging things with my Spanish-speaking colleagues).

Jan or anyone: any simple reassurance or pointers to resources I should best use for homework about these issues?

TIA (again!)
Chris

On 06/10/2023 12:55, Jan van der Laan wrote:

You are right, sorry. Another possible solution then: use geom_text instead of geom_point and use a triangle shape as text:

ggplot(data = tmpTibPoints, aes(x = x, y = y)) +
  geom_polygon(data = tmpTibAreas, aes(x = x, y = y, fill = a)) +
  geom_text(data = tmpTibPoints, aes(x = x, y = y, label = "▼", color = c), size = 6) +
  guides(color = "none")

[much snipped]
Re: [R] Is it possible to get a downward pointing solid triangle plotting symbol in R?
You are right, sorry. Another possible solution then: use geom_text instead of geom_point and use a triangle shape as text:

ggplot(data = tmpTibPoints, aes(x = x, y = y)) +
  geom_polygon(data = tmpTibAreas, aes(x = x, y = y, fill = a)) +
  geom_text(data = tmpTibPoints, aes(x = x, y = y, label = "▼", color = c), size = 6) +
  guides(color = "none")

On 06-10-2023 12:11, Chris Evans via R-help wrote:

Sadly, no. Still shows the same legend with both sets of fill mappings. I have found a workaround, sadly much longer than yours (!), that does get me what I want, but it is a real bodge. Still interested to see if there is a way to create a downward pointing solid symbol, but here is my bodge using new_scale_fill() and new_scale_color() from the ggnewscale package (many thanks to Elio Campitelli for that).

library(tidyverse)
library(ggnewscale) # allows me to change the scales used
tibble(x = 2:9, y = 2:9,
       ### I have used A:C to ensure the changes sort in the correct order to avoid the messes of using shape to scale an ordinal variable
       ### have to say that seems a case where it is perfectly sensible to map shapes to an ordinal variable, scale_shape_manual() makes
       ### this difficult hence this bodge
       c = c(rep("A", 5), "B", rep("C", 2)),
       change = c(rep("Deteriorated", 5), "No change", rep("Improved", 2))) %>%
  ### this is just keeping the original coding but not used below
  mutate(change = ordered(change, levels = c("Deteriorated", "No change", "Improved"))) -> tmpTibPoints
### create the area mapping
tibble(x = c(1, 5, 5, 1), y = c(1, 1, 5, 5), a = rep("a", 4)) -> tmpTibArea1
tibble(x = c(5, 10, 10, 5), y = c(1, 1, 5, 5), a = rep("b", 4)) -> tmpTibArea2
tibble(x = c(1, 5, 5, 1), y = c(5, 5, 10, 10), a = rep("c", 4)) -> tmpTibArea3
tibble(x = c(5, 10, 10, 5), y = c(5, 5, 10, 10), a = rep("d", 4)) -> tmpTibArea4
bind_rows(tmpTibArea1, tmpTibArea2, tmpTibArea3, tmpTibArea4) -> tmpTibAreas
### now plot
ggplot(data = tmpTibPoints, aes(x = x, y = y)) +
  geom_polygon(data = tmpTibAreas, aes(x = x, y = y, fill = a), alpha = .5) +
  scale_fill_manual(name = "Areas", values = c("orange", "purple", "yellow", "brown"), labels = letters[1:4]) +
  ### next two lines use ggnewscale functions to reset the scale mappings
  new_scale_fill() +
  new_scale_colour() +
  ### can now use the open triangles and fill aesthetic to map them
  geom_point(data = tmpTibPoints, aes(x = x, y = y, shape = c, fill = c, colour = c), size = 6) +
  ### use the ordered variable c to get mapping in desired order
  ### which, sadly, isn't the alphabetical order!
  scale_shape_manual(name = "Change", values = c("A" = 24, "B" = 23, "C" = 25), labels = c("Deteriorated", "No change", "Improved")) +
  scale_colour_manual(name = "Change", values = c("A" = "red", "B" = "grey", "C" = "green"), labels = c("Deteriorated", "No change", "Improved")) +
  scale_fill_manual(name = "Change", values = c("A" = "red", "B" = "grey", "C" = "green"), labels = c("Deteriorated", "No change", "Improved"))

That gives the attached plot, which is really what I want. Long bodge though!

On 06/10/2023 11:50, Jan van der Laan wrote:

Does adding

, show.legend = c("color"=TRUE, "fill"=FALSE)

to the geom_point do what you want?

Best,
Jan

On 06-10-2023 11:09, Chris Evans via R-help wrote:

library(tidyverse)
tibble(x = 2:9, y = 2:9, c = c(rep("A", 5), rep("B", 3))) -> tmpTibPoints
tibble(x = c(1, 5, 5, 1), y = c(1, 1, 5, 5), a = rep("a", 4)) -> tmpTibArea1
tibble(x = c(5, 10, 10, 5), y = c(1, 1, 5, 5), a = rep("b", 4)) -> tmpTibArea2
tibble(x = c(1, 5, 5, 1), y = c(5, 5, 10, 10), a = rep("c", 4)) -> tmpTibArea3
tibble(x = c(5, 10, 10, 5), y = c(
Re: [R] Is it possible to get a downward pointing solid triangle plotting symbol in R?
Does adding

, show.legend = c("color"=TRUE, "fill"=FALSE)

to the geom_point do what you want?

Best,
Jan

On 06-10-2023 11:09, Chris Evans via R-help wrote:

library(tidyverse)
tibble(x = 2:9, y = 2:9, c = c(rep("A", 5), rep("B", 3))) -> tmpTibPoints
tibble(x = c(1, 5, 5, 1), y = c(1, 1, 5, 5), a = rep("a", 4)) -> tmpTibArea1
tibble(x = c(5, 10, 10, 5), y = c(1, 1, 5, 5), a = rep("b", 4)) -> tmpTibArea2
tibble(x = c(1, 5, 5, 1), y = c(5, 5, 10, 10), a = rep("c", 4)) -> tmpTibArea3
tibble(x = c(5, 10, 10, 5), y = c(5, 5, 10, 10), a = rep("d", 4)) -> tmpTibArea4
bind_rows(tmpTibArea1, tmpTibArea2, tmpTibArea3, tmpTibArea4) -> tmpTibAreas
ggplot(data = tmpTibPoints, aes(x = x, y = y)) +
  geom_polygon(data = tmpTibAreas, aes(x = x, y = y, fill = a)) +
  geom_point(data = tmpTibPoints, aes(x = x, y = y, fill = c), pch = 24, size = 6)
Re: [R] overlay shaded area in base r plot
A shorter/simpler alternative for adding an alpha channel:

adjustcolor("lightblue", alpha.f = 0.5)

So I would use something like:

# Open new plot; make sure limits are ok; but don't plot
plot(0, 0, xlim=c(1,20), ylim = range(c(mean1+sd1, mean2+sd2, mean1-sd1, mean2-sd2)), type="n", las=1, xlab="Data", ylab=expression(bold("Val")), cex.axis=1.2, font=2, cex.lab=1.2)
polygon(c(1:20,20:1), c(mean1[1:20]+sd1[1:20],mean1[20:1]-sd1[20:1]), col=adjustcolor("blue", 0.5), border = NA)
polygon(c(1:20,20:1), c(mean2[1:20]+sd2[1:20],mean2[20:1]-sd2[20:1]), col=adjustcolor("yellow", 0.5), border = NA)
lines(1:20, mean1,lty=1,lwd=2,col="blue")
lines(1:20, mean2,lty=1,lwd=2,col="yellow")

On 19-09-2023 09:16, Ivan Krylov wrote:

On Tue, 19 Sep 2023 13:21:08 +0900, ani jaya wrote:

polygon(c(1:20,20:1),c(mean1[1:20]+sd1[1:20],mean1[20:1]),col="lightblue")
polygon(c(1:20,20:1),c(mean1[1:20]-sd1[1:20],mean1[20:1]),col="lightblue")
polygon(c(1:20,20:1),c(mean2[1:20]+sd2[1:20],mean2[20:1]),col="lightyellow")
polygon(c(1:20,20:1),c(mean2[1:20]-sd2[1:20],mean2[20:1]),col="lightyellow")

If you want the areas to overlap, try using a transparent colour. For example, "lightblue" is rgb(t(col2rgb("lightblue")), max = 255) → "#ADD8E6", so try setting the alpha (opacity) channel to something less than FF, e.g., "#ADD8E688". You can also use rgb(t(col2rgb("lightblue")), alpha = 128, max = 255) to generate hexadecimal colour strings for a given colour name and opacity value.
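Since mean1, sd1, mean2 and sd2 are not defined in the thread, here is a self-contained sketch of the same idea with made-up data. The simulated values and the helper `band()` are assumptions added for illustration; the colour handling follows the adjustcolor suggestion above.

```r
# Made-up data standing in for mean1/sd1 and mean2/sd2 from the thread.
set.seed(42)
x <- 1:20
mean1 <- 10 + cumsum(rnorm(20)); sd1 <- runif(20, 0.5, 1.5)
mean2 <- 12 + cumsum(rnorm(20)); sd2 <- runif(20, 0.5, 1.5)

# Semi-transparent versions of the line colours.
fill1 <- adjustcolor("blue", alpha.f = 0.5)
fill2 <- adjustcolor("yellow", alpha.f = 0.5)

# Draw one mean +/- sd band: upper edge left to right, lower edge back.
band <- function(x, m, s, col) {
  polygon(c(x, rev(x)), c(m + s, rev(m - s)), col = col, border = NA)
}

# Empty plot with limits wide enough for both bands, then bands, then lines.
plot(x, mean1, type = "n", las = 1, xlab = "Data", ylab = "Val",
     ylim = range(mean1 + sd1, mean1 - sd1, mean2 + sd2, mean2 - sd2))
band(x, mean1, sd1, fill1)
band(x, mean2, sd2, fill2)
lines(x, mean1, lwd = 2, col = "blue")
lines(x, mean2, lwd = 2, col = "yellow")
```

Drawing both bands before the lines keeps the mean curves visible on top of the overlapping shaded regions.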
Re: [R] Obtaining R-squared from All Possible Combinations of Linear Models Fitted
The dredge function has an `extra` argument to get other statistics:

optional additional statistics to be included in the result, provided as functions, function names or a list of such (preferably named or quoted). As with the rank argument, each function must accept as an argument a fitted model object and return (a value coercible to) a numeric vector. This could be, for instance, additional information criteria or goodness-of-fit statistics. The character strings "R^2" and "adjR^2" are treated in a special way and add a likelihood-ratio based R² and modified-R² to the result, respectively (this is more efficient than using r.squaredLR directly).

HTH
Jan

On 17-07-2023 19:24, Paul Bernal wrote:

Dear friends,

I need to automatically fit all possible linear regression models (with all possible combinations of regressors), and found the MuMIn package, which has the dredge function. This is the dataset I am working with:

dput(final_frame)
structure(list(y = c(41.9, 44.5, 43.9, 30.9, 27.9, 38.9, 30.9, 28.9, 25.9, 31, 29.5, 35.9, 37.5, 37.9),
  x1 = c(6.6969, 8.7951, 9.0384, 5.9592, 4.5429, 8.3607, 5.898, 5.6039, 4.9176, 6.2712, 5.0208, 5.8282, 5.9894, 7.5422),
  x4 = c(1.488, 1.82, 1.5, 1.121, 1.175, 1.777, 1.24, 1.501, 0.998, 0.975, 1.5, 1.225, 1.256, 1.69),
  x8 = c(22, 50, 23, 32, 40, 48, 51, 32, 42, 30, 62, 32, 40, 22),
  x2 = c(1.5, 1.5, 1, 1, 1, 1.5, 1, 1, 1, 1, 1, 1, 1, 1.5),
  x7 = c(3, 4, 3, 3, 3, 4, 3, 3, 4, 2, 4, 3, 3, 3)),
  class = "data.frame", row.names = c(NA, -14L))

I started with the all-regressor model, which I called globalmodel, as follows:

#Fitting Regression model with all possible combinations of regressors
options(na.action = "na.fail") # change the default "na.omit" to prevent models
globalmodel <- lm(y~., data=final_frame)

Then, the following code provides the different coefficients (for regressors and the intercept) for each of the possible model combinations:

combinations <- dredge(globalmodel)
print(combinations)

I would like to retrieve the R-squared generated by each combination, but have not been able to get it thus far. Any guidance on how to retrieve the R-squared from all linear model combinations would be greatly appreciated.

Kind regards,
Paul
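Going by the help text quoted in the reply, `dredge(globalmodel, extra = c("R^2", "adjR^2"))` should add the two columns directly. As a cross-check that needs no extra package, the sketch below fits every non-empty subset of regressors with lm and collects R-squared and adjusted R-squared from summary(). It is a base-R illustration, not the MuMIn implementation, and the names `subsets` and `results` are made up here.

```r
# Data from the original post.
final_frame <- data.frame(
  y  = c(41.9, 44.5, 43.9, 30.9, 27.9, 38.9, 30.9, 28.9, 25.9, 31, 29.5,
         35.9, 37.5, 37.9),
  x1 = c(6.6969, 8.7951, 9.0384, 5.9592, 4.5429, 8.3607, 5.898, 5.6039,
         4.9176, 6.2712, 5.0208, 5.8282, 5.9894, 7.5422),
  x4 = c(1.488, 1.82, 1.5, 1.121, 1.175, 1.777, 1.24, 1.501, 0.998, 0.975,
         1.5, 1.225, 1.256, 1.69),
  x8 = c(22, 50, 23, 32, 40, 48, 51, 32, 42, 30, 62, 32, 40, 22),
  x2 = c(1.5, 1.5, 1, 1, 1, 1.5, 1, 1, 1, 1, 1, 1, 1, 1.5),
  x7 = c(3, 4, 3, 3, 3, 4, 3, 3, 4, 2, 4, 3, 3, 3)
)
regressors <- setdiff(names(final_frame), "y")

# Every non-empty subset of the regressors (2^5 - 1 = 31 of them).
subsets <- unlist(
  lapply(seq_along(regressors),
         function(k) combn(regressors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit each model and pull both R-squared values from its summary.
r2 <- t(vapply(subsets, function(vars) {
  s <- summary(lm(reformulate(vars, response = "y"), data = final_frame))
  c(r.squared = s$r.squared, adj.r.squared = s$adj.r.squared)
}, numeric(2)))

results <- data.frame(
  model = vapply(subsets, paste, character(1), collapse = " + "),
  r2
)
results <- results[order(-results$r.squared), ]
head(results)
```

Plain R-squared never decreases when regressors are added, so the full model always tops this ordering; adjusted R-squared is usually the more useful column when comparing subsets of different sizes.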
Re: [R] Plotting directly to memory?
Perhaps the ragg package? That has an `agg_capture` device "that lets you access the device buffer directly from your R session." https://github.com/r-lib/ragg

HTH,
Jan

On 28-05-2023 13:46, Duncan Murdoch wrote:

Is there a way to open a graphics device that plots entirely to an array or raster in memory? I'd prefer it to use base graphics, but grid would be fine if it makes a difference.

For an explicit example, I'd like to do the equivalent of this:

filename <- tempfile(fileext = ".png")
png(filename)
plot(1:10, 1:10)
dev.off()
library(png)
img <- readPNG(filename)
unlink(filename)

which puts the desired plot into the array `img`, but I'd like to do it without needing the `png` package or the temporary file.

A possibly slightly simpler request would be to do this only for plotting text, i.e. I'd like to rasterize some text into an array.

Duncan Murdoch
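A minimal sketch of how the `agg_capture` device might be used for the example in the question, assuming the ragg package is installed. The device size is an arbitrary choice made here, and `img` mirrors the name used in the question.

```r
library(ragg)

# Open an in-memory device. agg_capture() returns a function that,
# when called, gives back the current contents of the device buffer.
cap <- agg_capture(width = 400, height = 300, units = "px")

plot(1:10, 1:10)

# With the default native = FALSE the buffer comes back as a matrix of
# colour values; cap(native = TRUE) would return a nativeRaster instead.
img <- cap()
dev.off()

dim(img)
```

Unlike the png()/readPNG() route, this gives a matrix of colours rather than a numeric array of channels, so a conversion step (for example via col2rgb) would still be needed if separate channels are required.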
Re: [R] nth kludge
Hi Avi, list,

Below is an alternative suggestion:

func <- function(a, b, c) { list(a, b, c) }
1:3 |> list(x = _) |> with(func(1, x, mean(x)))

Not sure if this is more readable than some of the other solutions, e.g. your solution, but you could make a variant of with that is more specific to this use case:

named <- function(expr, ...) { eval(substitute(expr), list(...), enclos = parent.frame()) }

then you can do:

1:3 |> named(func(1, x, mean(x)), x= _)

or perhaps you can even simplify further using the same strategy:

dot <- function(., expr) { eval(substitute(expr), list(. = .), enclos = parent.frame()) }
1:3 |> dot(func(1, ., mean(.)))

This seems simpler than the lambda notation and more general than your solution. Not sure if this has any drawbacks.

HTH,
Jan

On 08-03-2023 21:23, avi.e.gr...@gmail.com wrote:

I see many are not thrilled with the concise but unintuitive way it is suggested you use the new R pipe. I am wondering if anyone has created one of a family of functions that might be more intuitive, if less general.

Some existing pipes simply allowed you to specify where in an argument list to put the results from the earlier pipeline, as in:

. %>% func(first, . , last)

In the above common case, it substituted into the second position.

What would perhaps be a useful variant is a function that does not evaluate its arguments and expects a first argument passed from the pipe, a second argument that is a number like 2 or 3, a third argument that is the (name of) a function, and then the remaining arguments. The above might look like:

. %>% the_nth(2, func, first , last)

The above asks to take the new implicitly passed first argument, which I will illustrate with a real argument as it would also work without a pipeline:

the_nth(piped, 2, func, first, last)

So it would make a list out of the remaining arguments that looks like list(first, last), interpolate piped at position 2 to make list(first, piped, last), and then use something like do.call():

do.call(func, list(first, piped, last))

I am not sure if this is much more readable, but it seems like a straightforward function to write, and perhaps a decent version could make it into the standard library some year that is general and more useful than the darn anonymous lambda notation.
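A rough sketch of the `the_nth` idea described above, next to the `dot` helper from the reply. This version is a simplification: it evaluates its arguments normally rather than leaving them unevaluated as the proposal suggests, and it relies only on append() and do.call(). It needs R >= 4.1 for the native pipe.

```r
func <- function(a, b, c) list(a, b, c)

# Insert `piped` as the n-th argument of `f`; the remaining arguments
# keep their relative order. Arguments are evaluated eagerly here.
the_nth <- function(piped, n, f, ...) {
  args <- append(list(...), list(piped), after = n - 1)
  do.call(f, args)
}

# Evaluate `expr` with `.` bound to the piped value (from the reply).
dot <- function(., expr) {
  eval(substitute(expr), list(. = .), enclos = parent.frame())
}

res_nth <- 1:3 |> the_nth(2, func, "first", "last")
res_dot <- 1:3 |> dot(func(1, ., mean(.)))

str(res_nth)
str(res_dot)
```

The `dot` variant is the more flexible of the two, since the piped value can appear several times or inside a nested call, whereas `the_nth` can only place it once in the top-level argument list.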
Re: [R] foreign package: unable to read S-Plus objects
You could try to see what stattransfer can make of it. They have a free version that imports only part of the data. You could use that to see if stattransfer would help and perhaps discover what format it is in.

HTH
Jan

On 16-01-2023 23:22, Joseph Voelkel wrote:

Dear foreign maintainers and others,

I am trying to import a number of S-Plus objects into R. The only way I see how to do this is by using the foreign package. However, when I try to do this I receive an error message. A snippet of code and the error message follows:

read.S(file.path(Spath, "nrand"))
Error in read.S(file.path(Spath, "nrand")) : not an S object

I no longer know the version of S-Plus in which these objects were created. I do know that I have printed documentation, dated July 2001, from S-Plus 6; and that all S-Plus objects were created in the 9/2004 -- 5/2005 range.

I am afraid that I simply have S-Plus objects that are not the S version 3 files that the foreign package can read, yes? But I am still hoping that it may be possible to read these in.

I am not attaching some sample S-Plus objects to this email, because I believe they will be stripped away as binary files. However, a sample of these files may be found at https://drive.google.com/drive/folders/1wFVa972ciP44Ob2YVWfqk8SGIodzAXPv?usp=sharing (simdat is the largest file, at 469 KB).

Thank you for any assistance you may provide.

R 4.2.2
Microsoft Windows [Version 10.0.22000.1455]
foreign_0.8-83

Joe Voelkel
Professor Emeritus
RIT
Re: [R] Reading very large text files into R
You're sure the extra column is indeed an extra column? According to the documentation (https://artefacts.ceda.ac.uk/badc_datadocs/ukmo-midas/RH_Table.html) there should be 15 columns. Could it, for example, be that one of the columns contains records with commas? Jan On 29-09-2022 15:54, Nick Wray wrote: Hello I may be offending the R purists with this question but it is linked to R, as will become clear. I have very large data sets from the UK Met Office in notepad form. Unfortunately, I can’t read them directly into R because, for some reason, although most lines in the text doc consist of 15 elements, every so often there is a sixteenth one and R doesn’t like this and gives me an error message because it has assumed that every line has 15 elements and doesn’t like finding one with more. I have tried playing around with the text document, inserting an extra element into the top line etc, but to no avail. Also unfortunately you need access permission from the Met Office to get the files in question so this link probably won’t work: https://catalogue.ceda.ac.uk/uuid/bbd6916225e7475514e17fdbf11141c1 So what I have done is simply to copy and paste the text docs into excel csv and then read them in, which is time-consuming but works. However the later datasets are over the excel limit of 1048576 lines. I can paste in the first 1048576 lines but then trying to isolate the remainder of the text doc to paste it into a second csv doc is proving v difficult – the only way I have found is to scroll down by hand and that’s taking ages. I cannot find another way of editing the notepad text doc to get rid of the part which I have already copied and pasted. Can anyone help with a)ideally being able to simply read the text tables into R or b)suggest a way of editing out the bits of the text file I have already pasted in without laborious scrolling? 
Thanks Nick Wray
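One way to follow up on the question in the reply (is the sixteenth element really an extra column?) is to count the fields per line before reading. The sketch below uses a tiny made-up file with three columns instead of fifteen; the file contents and the column names are assumptions for illustration only.

```r
# Small stand-in for a Met Office file: a header, a normal line and a
# line with one extra trailing field.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id,station,value", "1,A,10.5", "2,B,11.0,X"), tmp)

# Number of comma-separated fields on each line.
n_fields <- count.fields(tmp, sep = ",")
table(n_fields)

# Line numbers whose field count differs from the header line.
odd_lines <- which(n_fields != n_fields[1])
readLines(tmp)[odd_lines]

# Read everything by naming enough columns for the widest line;
# shorter lines are padded in the extra column.
dat <- read.csv(tmp, header = FALSE, skip = 1, fill = TRUE,
                col.names = c("id", "station", "value", "extra"),
                na.strings = "")
unlink(tmp)
dat
```

Looking at the odd lines first shows whether the extra field is a genuine sixteenth value or the result of a comma inside one of the fifteen columns, which calls for a different fix (for example quoting). For the very large files, `data.table::fread(..., fill = TRUE)` may be a faster alternative, though that is untested here.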
Re: [R] How to represent tree-structured values
For visualising hierarchical data a treemap can also work well. For example, using the treemap package: n <- 1000 library(data.table) library(treemap) dta <- data.table( level1 = sample(LETTERS[1:5], n, replace = TRUE), level2 = sample(letters[1:5], n, replace = TRUE), level3 = sample(1:9, n, replace = TRUE), event = sample(0:1, n, replace = TRUE) ) tab <- dta[, .(n = .N, rate = sum(event)/.N), by = .(level1, level2, level3)] treemap(tab, index = names(tab)[1:3], vSize = "n", vColor = "rate", type = "value", fontsize.labels = 20*c(1, 0.7, 0)) -- Jan On 30-05-2022 11:40, Jim Lemon wrote: Hi Richard, Thinking about this, you might also find intersectDiagram, also in plotrix, to be useful. Jim On Mon, May 30, 2022 at 4:37 PM Jim Lemon wrote: Hi Richard, Some years ago I had a try at illustrating Multiple Causes of Death (MCoD) data. I settled on what is sometimes called a "sizetree". You can see some examples in the sizetree function help page in "plotrix". Unfortunately I can't use the original data as it was confidential. Jim On Mon, May 30, 2022 at 2:55 PM Richard O'Keefe wrote: There is a kind of data I run into fairly often which I have never known how to represent in R, and nothing I've tried really satisfies me. Consider for example ... - injuries ... - injuries to limbs ... - injuries to extremities ... - injuries to hands - injuries to dominant hand - injuries to non-dominant hand ... ... ... This isn't ordinal data, because there is no "left to right" order on the values. But there IS a "part/whole" order, which an analysis should respect, so it's not pure nominal data either. As one particular example, if I want to tabulate data like this, an occurrence of one value should be counted as an occurrence of *every* superordinate value. Examples of such data include "why is this patient being treated", "what drug is this patient being treated with", "what geographic region is this school from", "what biological group does this insect belong to". 
So what is the recommended way to represent and the recommended way to analyse such data in R?
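The counting rule in the question (an occurrence of a value also counts as an occurrence of every superordinate value) can be sketched in base R by storing each observation as its full path from the root. This is only one possible representation, not an established convention; the example values, including "feet" and "head", and the helper `ancestors()` are made up for illustration.

```r
# Each observation is stored as the full path from the root to its value.
obs <- c("injuries/limbs/extremities/hands/dominant",
         "injuries/limbs/extremities/hands/non-dominant",
         "injuries/limbs/extremities/feet",
         "injuries/head")

# All prefixes of a path: the value itself plus every superordinate value.
ancestors <- function(path) {
  parts <- strsplit(path, "/", fixed = TRUE)[[1]]
  vapply(seq_along(parts),
         function(i) paste(parts[seq_len(i)], collapse = "/"),
         character(1))
}

# Tabulate every observation once for itself and once per ancestor.
counts <- table(unlist(lapply(obs, ancestors)))
counts
```

Here "injuries" is counted four times and "injuries/limbs/extremities/hands" twice, which is the part/whole behaviour asked for. The same path columns could also be split into level1, level2, ... columns for use with the treemap example above.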
[R] splitting data matrix into submatrices
I have data in the form of a 1826 × 24 matrix, where the 1826 rows are days and the 24 columns are the hourly observations for each day. My objective is to split the matrix into working-day (Monday to Friday) and non-working-day (Saturday and Sunday) submatrices. Can anyone help me with how to do that splitting in R?
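A possible sketch, assuming the rows are consecutive calendar days. The post does not give the start date, so 2010-01-01 is an assumption made here (1826 days from that date happens to span exactly five years), and the matrix `m` is simulated.

```r
# Stand-in for the real 1826 x 24 matrix (rows = days, columns = hours).
m <- matrix(rnorm(1826 * 24), nrow = 1826, ncol = 24)

# ASSUMPTION: the first row is 2010-01-01; replace with the real start date.
dates <- seq(as.Date("2010-01-01"), by = "day", length.out = nrow(m))

# "%u" gives the ISO weekday number (1 = Monday ... 7 = Sunday),
# which does not depend on the locale, unlike weekdays().
weekend <- format(dates, "%u") %in% c("6", "7")

# Logical row indexing splits the matrix; drop = FALSE keeps matrices.
working     <- m[!weekend, , drop = FALSE]
non_working <- m[weekend, , drop = FALSE]

dim(working)
dim(non_working)
```

Public holidays are not handled; if they should count as non-working days, a vector of holiday dates could be added to the `weekend` condition with `dates %in% holidays`.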
Re: [R] vectorization of loops in R
Have a look at the base functions tapply and aggregate. For example see:

- https://cran.r-project.org/doc/manuals/r-release/R-intro.html#The-function-tapply_0028_0029-and-ragged-arrays ,
- https://online.stat.psu.edu/stat484/lesson/9/9.2,
- or ?tapply and ?aggregate.

Also your current code seems to contain an error: `s = df[df$y == i,]` should be `s = df$z[df$y == i]` I think.

HTH,
Jan

On 17-11-2021 14:20, Luigi Marongiu wrote:

Hello,

I have a dataframe with 3 variables. I want to loop through it to get the mean value of the variable `z`, as follows:

```
df = data.frame(x = c(rep(1,5), rep(2,5), rep(3,5)),
                y = rep(letters[1:5],3),
                z = rnorm(15),
                stringsAsFactors = FALSE)
m = vector()
for (i in unique(df$y)) {
  s = df[df$y == i,]
  m = append(m, mean(s$z))
}
names(m) = unique(df$y)
(m)
         a          b          c          d          e
-0.6355382 -0.4218053 -0.7256680 -0.8320783 -0.2587004
```

The problem is that I have one million `y` values, so the work takes almost a day. I understand that vectorization will speed up the procedure. But how shall I write the procedure in vectorial terms?

Thank you
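A short sketch of the two suggested functions on the data frame from the question, compared against an explicit loop. The seed is added here only to make the example reproducible.

```r
set.seed(1)
df <- data.frame(x = c(rep(1, 5), rep(2, 5), rep(3, 5)),
                 y = rep(letters[1:5], 3),
                 z = rnorm(15),
                 stringsAsFactors = FALSE)

# Explicit loop over the groups, for comparison.
m_loop <- numeric(0)
for (i in unique(df$y)) m_loop[i] <- mean(df$z[df$y == i])

# tapply: split z by the groups in y and apply mean to each piece.
m_tapply <- tapply(df$z, df$y, mean)

# aggregate: the same result as a data frame with one row per group.
m_agg <- aggregate(z ~ y, data = df, FUN = mean)

m_tapply
m_agg
```

The loop rescans the whole `y` column once per group, so its cost grows with the number of groups times the number of rows, whereas tapply and aggregate split the data once. With a million groups that difference is likely where most of the day goes.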
Re: [R] Is there a hash data structure for R
On 03-11-2021 00:42, Avi Gross via R-help wrote:

Finally, someone mentioned how creating a data.frame with duplicate names for columns is not a problem as it can automagically CHANGE them to be unique. That is a HUGE problem for using that as a dictionary as the new name will not be known to the system so all kinds of things will fail.

I think you are referring to my remark which was:

> However, the data.frame construction method will detect this and
> generate unique names (which also might not be what you want):

I didn't say this means that duplicate names aren't a problem; I just mentioned that the behaviour is different. Personally, I would actually prefer the behaviour of list (keep the duplicated name) with a warning.

Most of the responses seem to assume that the OP actually wants a hash table. Yes, he did ask for that, and for a hash table an environment (with some work) would be a good option. But in many cases where other languages would use a hash-table-like object (such as a dict), in R you would use other types of objects. Furthermore, for many operations where you might use hash tables to implement the operation, R already has built-in options, for example %in%, match, duplicated. These are also vectorised; so two vectors, one with keys and one with values, might actually be faster than an environment in some use cases.

Best,
Jan

And there are also packages for many features like sets as well as functions to manipulate these things.

-Original Message-
From: R-help On Behalf Of Bill Dunlap
Sent: Tuesday, November 2, 2021 1:26 PM
To: Andrew Simmons
Cc: R Help
Subject: Re: [R] Is there a hash data structure for R

Note that an environment carries a hash table with it, while a named list does not. I think that looking up an entry in a list causes a hash table to be created and thrown away. Here are some timings involving setting and getting various numbers of entries in environments and lists. The times are roughly linear in n for environments and quadratic for lists.

> vapply(1e3 * 2 ^ (0:6), f, L=new.env(parent=emptyenv()), FUN.VALUE=NA_real_)
[1] 0.00 0.00 0.00 0.02 0.03 0.06 0.15
> vapply(1e3 * 2 ^ (0:6), f, L=list(), FUN.VALUE=NA_real_)
[1]  0.01  0.03  0.15  0.53  2.66 13.66 56.05
> f
function(n, L, V = sprintf("V%07d", sample(n, replace=TRUE))) {
    system.time(for(v in V)L[[v]]<-c(L[[v]],v))["elapsed"]
}

Note that environments do not allow an element named "" (the empty string). Elements named NA_character_ are treated differently in environments and lists, neither of which is great. You may want your hash table functions to deal with oddball names explicitly.

-Bill

On Tue, Nov 2, 2021 at 8:52 AM Andrew Simmons wrote:

If you're thinking about using environments, I would suggest you initialize them like

x <- new.env(parent = emptyenv())

Since environments have parent environments, it means that requesting a value from that environment can actually return the value stored in a parent environment (this isn't an issue for [[ or $, this is exclusively an issue with assign, get, and exists)

Or, if you've already got your values stored in a list that you want to turn into an environment:

x <- list2env(listOfValues, parent = emptyenv())

Hope this helps!

On Tue, Nov 2, 2021, 06:49 Yonghua Peng wrote:

But for data.frame the colnames can be duplicated. Am I right? Regards.

On Tue, Nov 2, 2021 at 6:29 PM Jan van der Laan wrote:

True, but in a lot of cases where a python user might use a dict an R user will probably use a list; or when we are talking about arrays of dicts in python, the R solution will probably be a data.frame (with each dict field in a separate column).

Jan

On 02-11-2021 11:18, Eric Berger wrote:

One choice is
new.env(hash=TRUE)
in the base package

On Tue, Nov 2, 2021 at 11:48 AM Yonghua Peng wrote:

I know this is a newbie question. But how do I implement the hash structure which is available in other languages (in python it's dict)?

I know there is the list, but list's names can be duplicated here.

> x <- list(x=1:5,y=month.name,x=3:7)
> x
$x
[1] 1 2 3 4 5

$y
 [1] "January"   "February"  "March"     "April"     "May"       "June"
 [7] "July"      "August"    "September" "October"   "November"  "December"

$x
[1] 3 4 5 6 7

Thanks a lot.
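The two approaches discussed in this thread, an environment used as a hash table and a pair of key/value vectors with match, can be sketched as follows. The keys and values are made up for illustration; only base R functions are used.

```r
# An environment as a dict-like structure; emptyenv() as parent avoids
# accidentally finding values in enclosing environments.
h <- new.env(hash = TRUE, parent = emptyenv())

h[["apple"]] <- 1
assign("banana", 2, envir = h)
h[["apple"]] <- 10            # same key again: the value is overwritten

has_apple  <- exists("apple",  envir = h, inherits = FALSE)
has_cherry <- exists("cherry", envir = h, inherits = FALSE)
banana     <- get("banana", envir = h, inherits = FALSE)
all_keys   <- sort(ls(h))     # keys are unique by construction

rm("banana", envir = h)       # delete a key

# Vectorised alternative: one vector of keys, one of values.
keys   <- c("a", "b", "c")
values <- c(10, 20, 30)
looked_up <- values[match(c("c", "a", "z"), keys)]   # unknown key gives NA
```

The vector version looks up many keys in one call, which is the case where it may compete with an environment; for repeated single insertions the environment is usually the more natural fit.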
Re: [R] Is there a hash data structure for R
Yes. A data.frame is basically a list where all elements are vectors of the same length. So this issue also exists in a data.frame. However, the data.frame construction method will detect this and generate unique names (which also might not be what you want):

> data.frame(a=1:3, a=1:3)
  a a.1
1 1   1
2 2   2
3 3   3

With a little effort you can still create a data.frame with multiple columns with the same name. But as Duncan Murdoch mentions, you can usually control for that.

Best,
Jan

On 02-11-2021 11:32, Yonghua Peng wrote:

But for data.frame the colnames can be duplicated. Am I right? Regards.

On Tue, Nov 2, 2021 at 6:29 PM Jan van der Laan wrote:

True, but in a lot of cases where a python user might use a dict an R user will probably use a list; or when we are talking about arrays of dicts in python, the R solution will probably be a data.frame (with each dict field in a separate column).

Jan

On 02-11-2021 11:18, Eric Berger wrote:

> One choice is
> new.env(hash=TRUE)
> in the base package
>
> On Tue, Nov 2, 2021 at 11:48 AM Yonghua Peng wrote:
>
>> I know this is a newbie question. But how do I implement the hash structure
>> which is available in other languages (in python it's dict)?
>>
>> I know there is the list, but list's names can be duplicated here.
>>
>>> x <- list(x=1:5,y=month.name,x=3:7)
>>> x
>> $x
>> [1] 1 2 3 4 5
>>
>> $y
>> [1] "January" "February" "March" "April" "May" "June"
>> [7] "July" "August" "September" "October" "November" "December"
>>
>> $x
>> [1] 3 4 5 6 7
>>
>> Thanks a lot.
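A short sketch of the "little effort" mentioned above: with `check.names = FALSE` the constructor keeps duplicated column names, and `$` then silently returns the first match. The column values are arbitrary illustration data.

```r
# Default: duplicated names are made unique.
d1 <- data.frame(a = 1:3, a = 4:6)
names(d1)

# check.names = FALSE keeps the duplicates.
d2 <- data.frame(a = 1:3, a = 4:6, check.names = FALSE)
names(d2)

# Lookup by name returns the FIRST column called "a"; the second is
# only reachable by position.
first_a <- d2$a

# Detecting and repairing duplicates explicitly.
dup_pos <- anyDuplicated(names(d2))   # index of the first duplicate, 0 if none
names(d2) <- make.unique(names(d2))
```

Checking `anyDuplicated(names(x))` before treating a list or data.frame as a dictionary is one way to "control for" the problem.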
Re: [R] Is there a hash data structure for R
True, but in a lot of cases where a python user might use a dict an R user will probably use a list; or when we are talking about arrays of dicts in python, the R solution will probably be a data.frame (with each dict field in a separate column).

Jan

On 02-11-2021 11:18, Eric Berger wrote:

One choice is
new.env(hash=TRUE)
in the base package

On Tue, Nov 2, 2021 at 11:48 AM Yonghua Peng wrote:

I know this is a newbie question. But how do I implement the hash structure which is available in other languages (in python it's dict)?

I know there is the list, but list's names can be duplicated here.

> x <- list(x=1:5,y=month.name,x=3:7)
> x
$x
[1] 1 2 3 4 5

$y
 [1] "January"   "February"  "March"     "April"     "May"       "June"
 [7] "July"      "August"    "September" "October"   "November"  "December"

$x
[1] 3 4 5 6 7

Thanks a lot.
Re: [R] Getting different results with set.seed()
What you could also try is to check whether the self-coded functions use the random number generator when they are defined:

starting_seed <- .Random.seed
Step 1. Self-coded functions (these functions generate random numbers as well)
# check if functions have modified the seed:
all.equal(starting_seed, .Random.seed)
Step 2: set.seed (123)

What has also happened to me is that some of the functions I called had their own random number generator, independent of that of R, for example one implemented in C/C++.

Do your functions do stuff in parallel? For example using the parallel or snow package? In that case you also have to set the seed in the parallel workers.

Best,
Jan

On 19-08-2021 11:25, PIKAL Petr wrote:

Hi

Did you try a different order?

Step 2: set.seed (123)
Step 1. Self-coded functions (these functions generate random numbers as well)
Step 3: Call those functions.
Step 4: model results.

Cheers
Petr.

And BTW, do not use HTML formatting, it could cause problems in a text-only list.

From: Shah Alam
Sent: Thursday, August 19, 2021 10:10 AM
To: PIKAL Petr
Cc: r-help mailing list
Subject: Re: [R] Getting different results with set.seed()

Dear Petr,

It is more than 2000 lines of code with a lot of functions and data inputs. I am not sure whether it would be useful to upload it. However, you are absolutely right. I used

Step 1. Self-coded functions (these functions generate random numbers as well)
Step 2: set.seed (123)
Step 3: Call those functions.
Step 4: model results.

I close the R session and run the code from step 1. I get different results for the same set of values for parameters.

Best regards,
Shah

On Thu, 19 Aug 2021 at 09:56, PIKAL Petr wrote:

Hi

Please provide at least your code, preferably with some data to reproduce this behaviour. I wonder if anybody could help you without such information.

My wild guess is that you used

set.seed(1234)
some code
the code used again

in which case you have to expect different results.

Cheers
Petr

-Original Message-
From: R-help On Behalf Of Shah Alam
Sent: Thursday, August 19, 2021 9:46 AM
To: r-help mailing list
Subject: [R] Getting different results with set.seed()

Dear All,

I was using set.seed to reproduce the same results for the discrete event simulation model. I have 12 unknown parameters for optimization (just a little background). I got a good fit of parameter combinations. However, when I use those parameter combinations again in the model, I am getting different results. Is there any problem with set.seed? I assume set.seed should produce the same results. I used set.seed(1234).

Best regards,
Shah
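The behaviour discussed in this thread can be illustrated with a small sketch. The function `simulate` is a hypothetical stand-in for the poster's self-coded functions, which were not shown.

```r
# Hypothetical stand-in for a self-coded function that draws random numbers.
simulate <- function(n) rnorm(n)

# Same seed immediately before each run: identical results.
set.seed(123)
a <- simulate(5)
set.seed(123)
b <- simulate(5)
same_after_reseed <- identical(a, b)

# Running the code again WITHOUT resetting the seed continues the
# random stream, so the results differ (Petr's "wild guess").
c2 <- simulate(5)
same_without_reseed <- identical(b, c2)

# Jan's check: did some step consume random numbers?
# (.Random.seed only exists once the RNG has been used or seeded.)
set.seed(1)
before <- .Random.seed
x <- runif(1)                      # any code that uses the RNG
rng_changed <- !identical(before, .Random.seed)
```

If `rng_changed` is `FALSE` after a step, that step did not use R's generator; differences between runs would then have to come from elsewhere, such as an external generator or parallel workers.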
[R] Mean absolute error from data matrix
I have a data matrix of order 24*2192, where 2192 is the number of days and 24 is the number of hours in a single day; in simple words, I have 2192 days, each with 24 observations. The data matrix is divided into two matrices: the first is of order 24*1827 and the second is of order 24*365. Suppose the first column of the second matrix is a Sunday; then we choose every column of the first matrix that is a Sunday. The first column of the second matrix is converted into a vector, and all the Sunday columns of the first matrix are converted into vectors. Then we calculate the mean absolute error for each pair formed by the first vector of the second matrix and a vector of the first matrix. The same process is repeated for the rest of the week days. It is clear that such a process is quite time consuming and hard to perform manually. Can anyone provide the easiest way to solve such a problem? Regards
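One way the procedure described above might be written, as a sketch under stated assumptions: the data are simulated, and the calendar date of the first column is assumed because the post does not give it.

```r
set.seed(1)
x <- matrix(rnorm(24 * 2192), nrow = 24)      # stand-in data, 24 hours x 2192 days

# ASSUMPTION: the first column is 2010-01-01; replace with the real start date.
dates <- seq(as.Date("2010-01-01"), by = "day", length.out = ncol(x))
wday  <- as.integer(format(dates, "%u"))      # 1 = Monday, ..., 7 = Sunday

train <- x[, 1:1827]
test  <- x[, 1828:2192]
wd_train <- wday[1:1827]
wd_test  <- wday[1828:2192]

# For test day j: MAE between that day and every training day that falls
# on the same weekday. Subtracting a length-24 vector from a 24 x k matrix
# recycles the vector down each column, which is what is needed here.
mae_same_weekday <- function(j) {
  same <- which(wd_train == wd_test[j])
  colMeans(abs(train[, same, drop = FALSE] - test[, j]))
}

mae <- lapply(seq_len(ncol(test)), mae_same_weekday)
length(mae)        # one vector of MAEs per test day
```

Each element of `mae` has one entry per matching training day, so `which.min(mae[[j]])` would give the closest same-weekday training day for test day `j`.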
Re: [R] Read fst files
read_fst is from the package fst. The file format fst uses is a binary format designed to be fast to read. It is a column-oriented format and compressed. So, to be able to work, fst needs access to the file itself and won't accept a file connection the way functions like read.table and variants do. Also, because it is a binary compressed format using a compression method that is fast to read, compressing it again with zip seems to defeat the purpose of fst.

HTH,
Jan

On 09-06-2021 15:28, Duncan Murdoch wrote:

On 09/06/2021 9:12 a.m., Jeff Reichman wrote:

Duncan

Yea that will work. It appears to be related to setting my working dir; for whatever reason neither seems to work

(1) knitr::opts_knit$set(root.dir ="~/My_Reference_Library/Regression") # from R Notebook
or
(2) setwd("C:/Users/reichmaj/Documents/My_Reference_Library/Regression") # from R chunk

So it appears I can either (as you suggested) use two steps or combine them, but I need to enter the full path. Why don't other file types seem to need the full path?

You need to read the documentation for read_fst() to find what it needs. If it doesn't explain this, then you should report the issue to its author.

myObject <- read_fst(unz("C:/Users/reichmaj/Documents/My_Reference_Library/Regression/Datasest.zip", filename = "myFile.fst"))

Thank you. I guess just one of those R things

No, it's a read_fst() thing.

Duncan Murdoch
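The "two steps" mentioned in the thread (extract the file first, then read it from disk) might look like the sketch below. The helper name `read_fst_from_zip` is hypothetical and not part of the fst package; the file names in the usage comment are placeholders.

```r
# Hypothetical helper: extract one member of a zip archive to a temporary
# directory and read it with fst::read_fst(), which needs a real file path.
read_fst_from_zip <- function(zipfile, member, exdir = tempfile("fst_")) {
  if (!requireNamespace("fst", quietly = TRUE)) {
    stop("package 'fst' is required")
  }
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  # Remove the extracted copy once the data have been read into memory.
  on.exit(unlink(exdir, recursive = TRUE), add = TRUE)
  # unzip() returns the path(s) of the extracted file(s).
  path <- utils::unzip(zipfile, files = member, exdir = exdir)
  fst::read_fst(path)
}

# Usage (placeholder names):
# myObject <- read_fst_from_zip("Dataset.zip", "myFile.fst")
```

Since an fst file is already compressed, storing it unzipped and calling `read_fst()` on it directly is likely the simpler option where that is possible.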
Re: [R] What is an alternative to expand.grid if create a long vector?
This is an optimisation problem that you are trying to solve using a grid search. There are numerous methods for optimisation, see https://cran.r-project.org/web/views/Optimization.html for an overview for R. It really depends on the exact problem what method is appropriate.

As Petr said, helping you decide which method to use does not fit on this list. Perhaps the overview linked to above (and the terms 'grid search' and 'optimization') can help you find an appropriate method.

HTH,
Jan

On 20-04-2021 09:02, PIKAL Petr wrote:

Hi

Keep your mails on the list.

Actually you did not say much about your data and the way you want to model them. There are plenty of modelling functions in R, starting with e.g. lm, but I am not aware of a procedure in which you just design your explanatory variables to set a plausible model.

But I am not an expert in statistics and this list is not meant for solving statistical problems.

Cheers
Petr

From: Shah Alam
Sent: Monday, April 19, 2021 5:20 PM
To: PIKAL Petr
Subject: Re: [R] What is an alternative to expand.grid if create a long vector?

Dear Petr,

Thanks for your response. I am designing a model with 10 unknown parameters. The generated combinations of unknown parameters will be used in the model to estimate the set of vectors that fits well to the actual data. Is there any other way to do it? I also used the randomLHS function from the lhs package. But it did not serve the purpose.

Best regards,
Shah Alam

On Mon, 19 Apr 2021 at 16:07, PIKAL Petr wrote:

Hi

Actually expand.grid produces a data frame and not a vector. And the dimension of the data frame is "big"

> dim(A)
[1] 100000000         4
> str(A)
'data.frame': 100000000 obs. of 4 variables:
 $ Var1: num 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.01 ...
 $ Var2: num 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 1e-04 ...
 $ Var3: num 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 ...
 $ Var4: num 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 0.12 ...
 - attr(*, "out.attrs")=List of 2
  ..$ dim : int [1:4] 100 100 100 100
  ..$ dimnames:List of 4
  .. ..$ Var1: chr [1:100] "Var1=0.001" "Var1=0.002" "Var1=0.003" "Var1=0.004" ...
  .. ..$ Var2: chr [1:100] "Var2=0.000100" "Var2=0.0001090909" "Var2=0.0001181818" "Var2=0.0001272727" ...
  .. ..$ Var3: chr [1:100] "Var3=0.380" "Var3=0.3804040" "Var3=0.3808081" "Var3=0.3812121" ...
  .. ..$ Var4: chr [1:100] "Var4=0.120" "Var4=0.1206061" "Var4=0.1212121" "Var4=0.1218182" ...

in case of 4 sequences 1e8 rows, 4 columns
in case of 10 sequences 1e20 rows and 10 columns
in your last example 1.4e8 rows and 10 columns

which probably cross the memory capacity of your PC. Maybe you could increase the memory of your PC. If I am correct, to store the first you need about 3.2 GB, to store the last 11.2 GB.

May I ask what you want to do with such a big object?

Cheers
Petr

-Original Message-
From: R-help On Behalf Of Shah Alam
Sent: Monday, April 19, 2021 2:36 PM
To: r-help mailing list
Subject: [R] What is an alternative to expand.grid if create a long vector?

Dear All,

I would like to know whether there is a problem in the *expand.grid* function or whether this is a limitation of the function.

I am trying to create a combination of elements using the expand.grid function.

A <- expand.grid(
  c(seq(0.001, 0.1, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.38, 0.42, length.out = 100)),
  c(seq(0.12, 0.18, length.out = 100)))

Four combinations work fine. However, if I increase the combinations up to ten, the following error appears.

A <- expand.grid(
  c(seq(0.001, 1, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.38, 0.42, length.out = 100)),
  c(seq(0.12, 0.18, length.out = 100)),
  c(seq(0.01, 0.04, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.0001, 0.001, length.out = 100)),
  c(seq(0.001, 0.01, length.out = 100)),
  c(seq(0.01, 0.3, length.out = 100)) )

*Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) : invalid 'times' value*

After reducing the length to 10, it produced a different type of error

A <- expand.grid(
  c(seq(0.001, 0.005, length.out = 10)),
  c(seq(0.0001, 0.0005, length.out = 10)),
  c(seq(0.38, 0.42, length.out = 5)),
  c(seq(0.12, 0.18, length.out = 7)),
  c(seq(0.01, 0.04, length.out = 5)),
  c(seq(0.0001, 0.001, length.out = 10)),
  c(seq(0.0001, 0.001, length.out = 10)),
  c(seq(0.001, 0.01, length.out = 10)),
  c(seq(0.1, 0.8, length.out = 8)) )

*Error: canno
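As an illustration of replacing a grid search by an optimiser, here is a minimal sketch using base R's optim with box constraints. The objective function and the `target` values are invented for the example, since the actual model was not posted; only the bounds of the first four parameters are taken from the thread.

```r
# Bounds of the first four parameters from the expand.grid call.
lower <- c(0.001, 0.0001, 0.38, 0.12)
upper <- c(0.1,   0.001,  0.42, 0.18)

# HYPOTHETICAL "true" parameters and objective, standing in for the real
# model-versus-data discrepancy.
target <- c(0.02, 0.0003, 0.41, 0.13)

# Optimise on the unit cube so all parameters have a comparable scale,
# and map back to the original ranges inside the objective.
to_params <- function(u) lower + u * (upper - lower)
objective <- function(u) sum(((to_params(u) - target) / (upper - lower))^2)

fit <- optim(par = rep(0.5, 4), fn = objective, method = "L-BFGS-B",
             lower = rep(0, 4), upper = rep(1, 4))

best <- to_params(fit$par)
best
fit$value
```

Whether a gradient-based method like this is suitable depends on the real objective; for a noisy simulation model, other methods from the Optimization task view may be more appropriate.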
Re: [R] What is an alternative to expand.grid if create a long vector?
But even if you could have a generator that is super efficient and performs a calculation that is super fast, the number of elements is ridiculously large.

If we take 1 nanosecond per element, the computation would still take:

> (100^10)*1E-9/3600
[1] 27777778

hours, or

> (100^10)*1E-9/3600/24/365
[1] 3170.979

years.

--
Jan

On 20-04-2021 03:46, Avi Gross via R-help wrote:

Just some thoughts I am considering about the issue of how to make giant objects in memory without making them giant or all in memory.

As stupid as this sounds, when things get really big, it can mean not only processing your data in smaller amounts but using other techniques than asking expand.grid to create all possible combinations in advance.

Some languages like python allow generators that yield one item at a time and are called until exhausted, which sounds more like your usage. A single function remains resident in memory and each time it is called it uses the resident values in a calculation and returns the next. That approach may not work well with the way expand.grid works.

So a less efficient way would be to write your own deeply nested loop that generates one set of ten or so variables each time through the deepest nested loop that you can use one at a time. Alternatively, you can use such a loop to write a line at a time in something like a .CSV format and later read N lines at a time from the file or even have multiple programs work in parallel by taking their own allocations after ignoring the lines not meant for them, or some other method.

Deeply nested loops in R tend to be slow, as I have found out, which is indeed why I switched to using pmap() on a data.frame made using expand.grid first. But if your needs are exorbitant and you have limited memory,

Can you squeeze some memory out of your design? Your data seems highly repetitive and if you really want to store something like this in a column:

c(seq(0.001, 1, length.out = 100))

The size of that, for comparison, is:

object.size(seq(0.001, 1, length.out = 100))
848 bytes

So it is 8 bytes per number plus some overhead.

Then consider storing something like that another way. First, the c() wrapper around the above is redundant, albeit harmless. Why not store this:

1L:100L

object.size(1L:100L)
448 bytes

So, four bytes per number plus some overhead.

That stores integers between 1 and 100 and in your case that means that later you can divide by a thousand or so to get the number you want each time but not store a full double-precision number.

And if you use factors, it may take less space. I note some of your other values pick different starting and ending points but in all cases you ask for 100 equally-spaced values to be calculated by seq() which is fine but you could simply record a factor with umpteen specific values as either doubles or integers and if expand.grid honors that, it would use less space in any final output. My experiments (not shown here) suggest you can easily cut sizes in half and perhaps more with judicious usage.

Perhaps finding or writing a more efficient loop in a C or C++ function would allow a way to loop through all possibilities more efficiently and provide a function for it to call on each iteration. Depending on your need, that can do a calculation using local variables and perhaps add a line to an output file, or add another set of values to a vector or other data structure that gets returned at the end of processing.

One possibility to consider is using an on-line resource, perhaps paying a fee, that will run your R program for you in an environment with more allowed resources like memory: https://rstudio.cloud/

Some of the professional options allow 8 GB of memory and perhaps 4 CPU. You can, of course, configure your own machine to have more memory or perhaps allocate lots more swap space and allow your process to abuse it.

There are many possible solutions but also consider if the sizes and amounts you are working on are realistic. I worked on a project a while ago where I generated a huge amount of instances with 500 iterations per instance and was asked to bump that up to 10,000 per instance (20 times as much) just to show the results were similar and that 500 had been enough. It ran for DAYS and luckily the rest of the project went back to more manageable numbers.

So, back to your scenario, I wonder if the regularity of your data would allow interesting games to be played. Imagine smaller combinations of say 10 levels each and for each row in the resulting data.frame, expand that out again so the number 2,3,4 (using just three for illustration) becomes (2:29, 3:39, 4:49) and is given to expand.grid to make a smaller local one-use expansion table to use. Your original giant problem is converted to making a modest table that for each row expands to a second modest table that is used and immediately discarded and replaced by a s
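The generator idea above can be approximated in R by computing the i-th grid row on demand instead of storing the whole grid. The function name `grid_row` and the small `levels` list are made up for this sketch; it reproduces the row order of expand.grid (first variable varies fastest) and relies on double arithmetic, so it is only exact for row indices below 2^53.

```r
# Return row i of expand.grid(levels) without building the grid:
# the row index is decoded as a mixed-radix number.
grid_row <- function(i, levels) {
  sizes <- lengths(levels)
  idx <- numeric(length(levels))
  k <- i - 1
  for (j in seq_along(levels)) {
    idx[j] <- k %% sizes[j] + 1    # position within variable j
    k <- k %/% sizes[j]            # carry the rest to the next variable
  }
  vapply(seq_along(levels), function(j) levels[[j]][idx[j]], numeric(1))
}

# Small illustration grid (2 * 3 * 4 = 24 combinations).
levels <- list(c(0.1, 0.2), c(1, 2, 3), c(10, 20, 30, 40))
full <- expand.grid(levels)

# Check every row against the materialised grid.
matches <- vapply(seq_len(nrow(full)), function(i) {
  isTRUE(all.equal(grid_row(i, levels), unlist(full[i, ], use.names = FALSE)))
}, logical(1))
all(matches)
```

This removes the memory problem but not the time problem: as Jan's arithmetic shows, visiting 100^10 combinations is out of reach regardless of how the rows are produced.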
[R] Check accuracy of the model
Hi, I hope you are well. I have a problem with functional time series. I am working with hourly electricity spot price data; because of the large dimensionality I convert the discrete data into functional data. The model I use is a functional autoregressive model of order greater than one. As far as I know there is no R package available for such a model, so I apply an alternative method: I use functional principal components (FPCs) for dimension reduction and forecast the associated principal component scores with a multivariate time series model. Then I convert these forecast scores back into functional curves through the Karhunen-Loeve decomposition, so that I obtain the forecast of each day as a single curve. Now, to check the accuracy of the model, I want to calculate the percentage mean square error or the mean absolute error. My problem starts here: I want to convert each curve back into 24 discrete points. Is there any package in R which is helpful in dealing with such a problem? I look forward to your reply.
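Converting a forecast curve back to 24 hourly values amounts to evaluating the curve on the hourly grid. A base-R sketch follows; the mean function, the two eigenfunctions, the forecast scores and the "observed" values are all invented for illustration, since the actual objects from the analysis were not posted.

```r
hours <- 1:24

# HYPOTHETICAL ingredients of a truncated Karhunen-Loeve expansion:
# curve(t) = mu(t) + sum_k score_k * phi_k(t)
mu   <- function(t) 50 + 10 * sin(2 * pi * t / 24)
phi1 <- function(t) sqrt(2) * sin(2 * pi * t / 24)
phi2 <- function(t) sqrt(2) * cos(2 * pi * t / 24)
scores_hat <- c(3, -1)                     # forecast scores for one day

forecast_curve <- function(t) {
  mu(t) + scores_hat[1] * phi1(t) + scores_hat[2] * phi2(t)
}

# Evaluate the curve at the 24 hourly points.
forecast_24 <- forecast_curve(hours)

# Stand-in for the observed prices of that day.
set.seed(1)
observed_24 <- forecast_24 + rnorm(24, sd = 2)

mae <- mean(abs(observed_24 - forecast_24))
mse <- mean((observed_24 - forecast_24)^2)
```

If the curves are stored as objects from a functional data package, that package's own evaluation function would play the role of `forecast_curve(hours)` here.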
Re: [R] /usr/local/lib/R/site-library is not writable
I would actually go a step in the other direction: per-project libraries, for example by adding a .Rprofile file to your project directory. This ensures that everybody working on a project uses the same version of the packages (even on different machines, e.g. on shared folders). This can give issues when a new version of R arrives, but that is usually easy to solve. Either hard code the path to the old R version or decide to update all packages in a project to the new R version (and test that everything is still working ok).

We have the most often used packages installed centrally on the server/network, so I actually usually end up with a mixture of central, personal and project libraries. Theory vs practice.

HTH,
Jan

On 08-04-2021 02:58, Dirk Eddelbuettel wrote:

Hi Gene,

"It's complicated". (Not really, but listen for a sec...)

We need to ship a default policy that makes sense for all / most situations. So

- users cannot write into /usr/local/lib/R/site-library -- unless they are set up to, by adding them to the 'group' that owns that directory

- root can (but ideally one should not run as root as one generally does not know what code you might get slipped in a tar.gz); but root can enable users

- so we recommend letting (some or all) users write there by explicitly adding them to an appropriate group.

Personally, I do not think personal libraries are a good idea on shared machines because you can end up with a different set of packages (versions) than your colleague on the same machine. And or you running shiny from $HOME have different packages than shiny running as server. And on and on. Other people differ, and that is fine. If one wants personal libraries one can.

I must have explained the reasoning and fixes a dozen times each on r-sig-debian (where you could have asked this too) and StackOverflow. At least the latter can be searched so look at this set:

https://stackoverflow.com/search?q=user%3Ame+is%3Aanser+%2Fusr%2Flocal%2Flib%2FR%2Fsite-library

Happy to take it offline too, and who knows, we even get to meet for a coffee one of these days.

Hope this helps,
Dirk
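A per-project library of the kind Jan describes could be set up roughly as follows. The helper name `use_project_library` and the directory name `library` are assumptions for this sketch; in a real project the body of the function would typically go into the project's .Rprofile with the project directory in place of the temporary one used here.

```r
# Hypothetical helper: put <project_dir>/library first on the library path,
# creating it if needed. install.packages() then installs there by default.
use_project_library <- function(project_dir, lib_name = "library") {
  lib <- file.path(project_dir, lib_name)
  dir.create(lib, showWarnings = FALSE, recursive = TRUE)
  .libPaths(c(lib, .libPaths()))
  invisible(lib)
}

# Demonstration in a temporary directory so nothing permanent is changed.
old_paths   <- .libPaths()
project_dir <- tempfile("project_")
lib         <- use_project_library(project_dir)
first_path  <- .libPaths()[1]

.libPaths(old_paths)   # restore the previous library paths
```

Because the central site library stays on the path behind the project library, this gives exactly the mixture of central and project libraries mentioned above.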
[R] forecast accuracy
I am new to functional time series, so my question may be a basic one. I am producing functional forecasts one year ahead. Now I want to check the forecast accuracy by calculating the mean absolute percentage error, but I am unable to do this in R. Please help me or suggest any link which could help me solve my problem. Regards
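For reference, the mean absolute percentage error is MAPE = 100/n * sum(|actual - forecast| / |actual|), which can be computed in base R once the forecasts are available as plain numbers. The function name `mape` and the example values are made up for illustration.

```r
# Mean absolute percentage error, in percent.
# Undefined when an actual value is zero, so that case is rejected.
mape <- function(actual, forecast) {
  stopifnot(length(actual) == length(forecast), all(actual != 0))
  100 * mean(abs((actual - forecast) / actual))
}

actual   <- c(100, 200, 400)
forecast <- c(110, 180, 400)

# Individual percentage errors are 10, 10 and 0, so the mean is 20/3.
mape(actual, forecast)
```

For functional forecasts, `actual` and `forecast` would be the observed and forecast curves evaluated at the same discrete time points.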
[R] forecast accuracy
I am working with functional time series. I have obtained the one-year-ahead forecast in functional form. Now I want to compute the forecast accuracy, for example the mean absolute percentage error, in R. Please help: how do I do this in R?
Re: [R] Col names in a data frame
It looks to me as though the names are cranked through make.names in the data frame case, while that doesn't happen for matrices. Peeking into the `colnames<-` code supports this idea, but that in turn uses `names<-`, which is a primitive and so defies further easy peeking. The data.frame function provides the check.names parameter to switch this on or off, but for other classes this checking doesn't seem to be provided.

Perhaps the idea behind this discrepancy is to enable the use of the $ operator to access columns of the data frame, while that's not possible for matrices anyway. (Personally, I don't find the results of make.names that useful, though, and I tend to sanitise them myself when working with data frames with unwieldy column names.)

Best regards,
Jan

On Thu, Jan 21, 2021 at 03:58:44PM -0500, Bernard McGarvey wrote:
> Here is an example piece of code to illustrate an issue:
>
> rm(list=ls()) # Clear Workspace
> #
> Data1 <- matrix(data=rnorm(9,0,1),nrow=3,ncol=3)
> Colnames1 <- c("(A)","(B)","(C)")
> colnames(Data1) <- Colnames1
> print(Data1)
> DataFrame1 <- data.frame(Data1)
> print(DataFrame1)
> colnames(DataFrame1) <- Colnames1
> print(DataFrame1)
>
> The results I get are:
>
>             (A)        (B)        (C)
> [1,]  0.4739417  1.3138868  0.4262165
> [2,] -2.1288083  1.0333770  1.1543404
> [3,] -0.3401786 -0.7023236 -0.2336880
>         X.A.       X.B.       X.C.
> 1  0.4739417  1.3138868  0.4262165
> 2 -2.1288083  1.0333770  1.1543404
> 3 -0.3401786 -0.7023236 -0.2336880
>          (A)        (B)        (C)
> 1  0.4739417  1.3138868  0.4262165
> 2 -2.1288083  1.0333770  1.1543404
> 3 -0.3401786 -0.7023236 -0.2336880
>
> so that when I make the data frame from the matrix with headings, the
> parentheses are replaced by periods, but I can add them after creating the
> data frame and the column headings are correct.
>
> Any ideas on why this occurs?
>
> Thanks
>
> Bernard McGarvey
> Director, Fort Myers Beach Lions Foundation, Inc.
> Retired (Lilly Engineering Fellow).
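The check.names switch mentioned in the reply can be illustrated with a short sketch. The matrix `m` is a hypothetical stand-in for Data1 from the question; only base R is used.

```r
# Hypothetical 3 x 3 matrix with the same awkward column names as in the question.
m <- matrix(1:9, nrow = 3, dimnames = list(NULL, c("(A)", "(B)", "(C)")))

df_default <- data.frame(m)                       # check.names = TRUE is the default
df_kept    <- data.frame(m, check.names = FALSE)  # keep the names untouched

names(df_default)   # names after make.names: "X.A." "X.B." "X.C."
names(df_kept)      # original names: "(A)" "(B)" "(C)"

# Non-syntactic names need backticks or [[ ]] when accessed.
first_col <- df_kept[["(A)"]]
```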
Re: [R] Help with connection issue for R (just joined, leading R for our agency)
Alejandra,

If it was initially working OK, I would first check with the IT department whether there has been a change to the configuration of the firewall, virus scanners, file system etc., as these can affect the performance of RStudio. RStudio uses a client-server setup on your machine, so a firewall or malware scanner inspecting all communication between RStudio and the R session can have a large effect.

If you can't find the problem, you are probably better off asking on the RStudio forums. A similar question was asked a while back: https://community.rstudio.com/t/rstudio-suddenly-slow-processing/4959; perhaps some of the solutions proposed there also work for you.

As an alternative to Emacs/RStudio you could also have a look at Visual Studio Code. It has an R plugin. If your organisation is Microsoft-oriented, there is a chance that it is already available. You need a relatively recent version, though.

HTH,
Jan

On 14-12-2020 12:54, Michael Dewey wrote:
Just to add to Petr's comment, there are other basic editors with syntax highlighting, like Notepad++, which are also OK if you want a fairly minimalist approach.
Michael

On 14/12/2020 08:16, PIKAL Petr wrote:
Hallo Alejandra
Although RStudio and ESS could help with some automation (each in its own way), using R alone is not a big problem, especially if you are not familiar with Emacs basics and are experiencing RStudio issues. I use R with a simple external editor. It could be Notepad, but I would recommend TINN-R https://sourceforge.net/projects/tinn-r/, which has syntax highlighting and works smoothly if the R console is set to multiple windows. It is also quite easy to manage. Good luck with R.
Cheers
Petr

-----Original Message-----
From: R-help On Behalf Of Alejandra Barrio Gorski
Sent: Tuesday, December 8, 2020 7:48 PM
To: R-help@r-project.org
Subject: [R] Help with connection issue for R (just joined, leading R for our agency)

Dear fellow R users,

Greetings, I am new to this list. I joined because I am pioneering the use of R for the agency I work for. I essentially work alone and would like to reach out for help on an issue I have been having. Here it is:

- From one day to the next, my RStudio stopped executing commands when I press ctrl + enter. Nothing happens, and then after a few minutes, out of nowhere, it runs everything at once. This makes it very hard to do my work.
- I tried uninstalling and re-installing both R and RStudio, but the error comes up again. I tested commands in R alone, and it works fine there. It could be the way that RStudio connects to R.
- I am on a Windows 10 computer. I work for a government agency, so there may be a few firewall/virus protection issues.

I would love any pointers.

Thank you,
Alejandra

--
*Alejandra Barrio*
Linkedin <https://www.linkedin.com/in/alejandra-barrio/> | Website <https://www.ocf.berkeley.edu/~alejandrabarrio/>
MPP | M.A., International and Area Studies
University of California, Berkeley
[R] Help in R code
Good morning,

Please help me with this R code. I am working with multivariate time series data. My objective is a one-year forecast of hourly time series data, using the first five years as a training set and the remaining one year for validation. For this I transform the data into functional data through a Fourier basis, apply functional principal components as dimension reduction explaining a specified amount of variation, and use the corresponding functional principal component scores. I use a VAR model on those FPC scores for the one-day-ahead forecast. My problem is that I chose four FPC scores, which gives only four values for a single day; I want the forecast for all 24 hours, not only 4, and then I want to transform it back to the original functional data.

To aid understanding I am sharing my code: (1) transformation of the multivariate time series data into functional data; (2) the functional final prediction error for selecting the parameters of the VAR model; (3) the functional principal components and the corresponding scores; (4) the VAR analysis and forecasting.

library(fda)   # create.fourier.basis, Data2fd, mean.fd, center.fd, pca.fd
library(vars)  # VAR

# (1)
nb = 23 # number of basis functions for the data
fbf = create.fourier.basis(rangeval=c(0,1), nbasis=nb) # basis for data
args = seq(0,1,length=24)
fdata1 = Data2fd(args, y=t(mat), fbf) # functions generated from discretized y

# (2)
ffpe = fFPE(fdata1, Pmax=10)
d.hat = ffpe[1] # order of the model
p.hat = ffpe[2] # lag of the model

# (3)
n = ncol(fdata1$coef)
D = nrow(fdata1$coef)
# center the data
mu = mean.fd(fdata1)
data = center.fd(fdata1)
# fPCA
fpca = pca.fd(data, nharm=D)
scores = fpca$scores[,1:d.hat]

# (4)
# to avoid warnings from vars predict function below
colnames(scores) <- as.character(seq(1:d.hat))
VAR.pre = predict(VAR(scores, p.hat), n.ahead=1, type="const")$fcst

After this I need help: first, how to transform this back into the original functional data and obtain the forecast for each of the 24 hours (i.e. a one-day forecast), and second, how to generalize the result to one year.
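The step being asked about, going from a handful of forecast scores back to 24 hourly values, can be sketched in base R on a discretised analogue of the problem. This is only an illustration under stated assumptions: the curves are simulated, an SVD of the centred hourly matrix stands in for pca.fd, and a separate AR(1) per score stands in for the VAR forecast. All object names are hypothetical.

```r
set.seed(42)
n_days <- 60; hours <- 24; d <- 4
tgrid <- seq(0, 1, length.out = hours)

# Hypothetical daily curves: rows are days, columns are hourly values.
X <- t(replicate(n_days,
                 10 + 3 * rnorm(1) * sin(2 * pi * tgrid) +
                      2 * rnorm(1) * cos(2 * pi * tgrid) +
                      rnorm(hours, sd = 0.1)))

mu  <- colMeans(X)                  # mean curve on the hourly grid
Xc  <- sweep(X, 2, mu)              # centred curves
phi <- svd(Xc)$v[, 1:d, drop = FALSE]   # 24 x d discretised principal components
scores <- Xc %*% phi                # n_days x d score matrix

# Stand-in for the VAR step: one forecast value per retained score.
score_hat <- sapply(seq_len(d), function(k) {
  fit <- ar(scores[, k], aic = FALSE, order.max = 1)
  as.numeric(predict(fit, newdata = scores[, k], n.ahead = 1)$pred)
})

# Truncated Karhunen-Loeve reconstruction: d forecast scores -> 24 hourly values.
x_hat <- as.numeric(mu + phi %*% score_hat)
length(x_hat)
```

With the fda objects from the post, the corresponding pieces would presumably be the mean function (mu) and fpca$harmonics, both evaluated on the hourly grid with eval.fd, combined with the forecast scores in the same way. Repeating the one-day-ahead step over the validation year, re-fitting or updating as each new day arrives, is one way to extend this to a full year.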
[R] Installing a Package
Hello, I am working on nonparametric functional data analysis and am stuck on a simple problem: I want to install a package named nfda, which is not available on CRAN. How can I install this package in RStudio, from an archive or some other source? Please guide me; I am just a beginner in R. Thanks
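One common route for a package that is not on CRAN is to obtain its source archive (a .tar.gz file) from the author and install it from the local file. A minimal sketch, assuming such a file has already been downloaded; the helper name `install_local_tarball` and the file name in the commented call are hypothetical, and where the nfda source can be obtained is not stated in the post.

```r
# Install a package from a local source archive (.tar.gz).
# repos = NULL tells install.packages() to treat 'path' as a file, not a package name.
install_local_tarball <- function(path) {
  if (!file.exists(path)) {
    stop("Source archive not found: ", path)
  }
  install.packages(path, repos = NULL, type = "source")
}

# Hypothetical usage, once the archive is in the working directory:
# install_local_tarball("nfda_x.y.z.tar.gz")
```

Installing from source on Windows may additionally require Rtools if the package contains compiled code.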
[R] Help in R code
Hello, I am working on functional time series using multivariate (hourly) time series data. I am using a FAR model of order greater than one, for which no package is available in R, so I convert my data into functional form, obtain the functional principal components, and from those extract the corresponding FPC scores. I then use a VAR model on those FPC scores to forecast each of the 24 hours. The VAR gives me forecast values for all 23 hours when I set phat=23, but whenever I set phat=24, i.e. when I want to predict all 24 hours, it gives the results as NA. The code is given below.

fdata <- function(mat){
  nb = 27 # number of basis functions for the data
  fbf = create.fourier.basis(rangeval=c(0,1), nbasis=nb) # basis for data
  args = seq(0,1,length=24)
  fdata1 = Data2fd(args, y=t(mat), fbf) # functions generated from discretized y
  return(fdata1)
}

prediction.ffpe = function(fdata1){
  n = ncol(fdata1$coef)
  D = nrow(fdata1$coef)
  #center the data
  #mu = mean.fd(fdata1)
  data = center.fd(fdata1)
  #ffpe = fFPE(fdata1, Pmax=10)
  #p.hat = ffpe[2] #order of the model
  d.hat=23
  p.hat=6
  #fPCA
  fpca = pca.fd(data, nharm=D, centerfns=TRUE)
  scores = fpca$scores[,1:d.hat]
  # to avoid warnings from vars predict function below
  colnames(scores) <- as.character(seq(1:d.hat))
  VAR.pre = predict(VAR(scores, p.hat), n.ahead=1, type="const")$fcst
}

Kindly guide me on how I can solve my problem or what error I am making. THANKS
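One possible explanation for the 23-versus-24 behaviour, not checked against the poster's data: a Fourier basis on [0, 1] is periodic, so every basis function takes the same value at 0 and at 1. The grid seq(0, 1, length=24) therefore contains only 23 distinct points as far as the basis is concerned, and the fitted curves cannot carry 24 independent components. The sketch below builds the Fourier design matrix by hand in base R (an assumption standing in for the fda basis) and checks its rank.

```r
hours <- 24; nb <- 27
tgrid <- seq(0, 1, length.out = hours)   # includes both end points 0 and 1

# Fourier design matrix with period 1: a constant, then sin/cos pairs.
K <- (nb - 1) / 2
B <- cbind(1, do.call(cbind, lapply(seq_len(K), function(k)
  cbind(sin(2 * pi * k * tgrid), cos(2 * pi * k * tgrid)))))

dim(B)                                        # 24 rows (hours), 27 columns (basis functions)
same_ends  <- isTRUE(all.equal(B[1, ], B[hours, ]))  # rows for t = 0 and t = 1 coincide
basis_rank <- qr(B)$rank                      # fewer than 24 independent directions
```

If this is indeed the cause, a 24th score would have essentially zero variance and would make the VAR fit degenerate. Using a grid that does not repeat the end point, for example (0:23)/24, or keeping d.hat well below the number of distinct grid points, might avoid the problem.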
Re: [R] Would Like Some Advise
to the McAfee subscription my wife has for other systems in the house.

Note, while R is my primary computational world, by far, I do run Anaconda Python 3 from time to time. It can be useful for preparing data for consumption by R, given raw files, many with glitches and mistakes. But with the data.table package and other packages in R, I'm finding that's less and less true. The biggest headache of Python is that you need to keep its libraries updated. I have also sometimes used Python just to access MATPLOTLIB. I prefer R, though, because, like MATLAB, its numerics are better than Python's NUMPY and SCIPY.

As I said, I don't know Mac at all well. But I do know that, when Apple released a new version, the colleagues around me would often degenerate into a couple of days of grumbling and meeting with each other about how they got past or around some stumbling point when updating their systems. Otherwise people seem to like them a lot.

I think all operating systems are deals with the Devil. It's what you put up with and deal with. As you can see, I opted to go the Windows route again, for probably the next 10 years. YMMV.

 - Jan

On Sat, Aug 29, 2020, at 06:00, r-help-requ...@r-project.org wrote:
> From: "Philip"
> To: "r-help"
> Subject: [R] Would Like Some Advise
> Message-ID: <1157A76A248944878C040D1FE0AE725C@OWNERPC>
> Content-Type: text/plain; charset="utf-8"
>
> I need a new computer. I have a friend who is convinced that I have an
> aura about me that just kills electronic devices.
>
> Does anyone out there have an opinion about Windows vs. Linux?
>
> I’m retired so this is just for my own enjoyment but I’m crunching some
> large National Weather Service files and will move on to baseball data
> and a few other things. I’d like some advise about how much RAM and
> stuff like that. I understand there is something called zones of
> computer memory. Can someone direct me to a good source so I can learn
> more? I really don’t understand stuff like this. Does anyone think I
> need to upgrade my wifi?
>
> Thanks,
> Philip
Re: [R] How to read a file containing two types of rows - (for the Netflix challenge data format)
With the *data.table* package, *R* can use *fread* as follows:

> grab<- function(file)
> {
>   fin<- fread(file=file,
>               sep=NULL,
>               dec=".",
>               quote="", nrows=Inf, header=FALSE,
>               stringsAsFactors=FALSE, verbose=FALSE,
>               col.names=c("record"),
>               check.names=FALSE, fill=FALSE, blank.lines.skip=FALSE,
>               showProgress=TRUE,
>               data.table=FALSE, skip=0,
>               nThread=2, logical01=FALSE, keepLeadingZeros=FALSE)
>   cat(sprintf("Read '%s'.\n", file))
>   #
>   substance<- apply(X=fin, MARGIN=1, FUN=function(r) chartr(",", "\t", r[1]))
>   cat(sprintf("Translated '%s'.\n", file))
>   D<- fread(text=substance,
>             sep="\t",
>             dec=".",
>             quote="", nrows=Inf, header=FALSE,
>             stringsAsFactors=FALSE, verbose=FALSE,
>             col.names=c("ip", "valid.hits", "err.hits", "megabytes"),
>             check.names=FALSE, fill=FALSE, blank.lines.skip=FALSE,
>             showProgress=TRUE,
>             data.table=FALSE, skip=0,
>             nThread=2, logical01=FALSE, keepLeadingZeros=FALSE)
>   cat(sprintf("Parsed '%s'.\n", file))
>   ip<- D$ip
>   withinBlock<- sapply(X=ip, FUN=function(s) as.integer((strsplit(x=s, split=".", fixed=TRUE)[[1]])[4]))
>   D$within.block<- withinBlock
>   return(D)
> }

In short, one pass pulls all the records into an internal structure, which can be edited or manipulated at will, and then a second call to *fread* parses it properly. *fread* is fast, even for big datasets.

--
Jan Galkowski
https://www.linkedin.com/in/deepdevelopment
member,
... American Statistical Association
... International Society for Bayesian Analysis
... Ecological Society of America
... International Association of Survey Statisticians
... American Association for the Advancement of Science
... TeX Users Group
(pronouns: *he, him, his*)
*Keep your energy local*. --John Farrell, *ILSR <http://ilsr.org/>*
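The same two-pass idea (read raw lines first, then parse) can be sketched in base R for a file with two kinds of rows. The layout assumed here, a "movie_id:" header row followed by "customer_id,rating,date" rows, is my understanding of the Netflix Prize format and is not spelled out in the message above; the sample lines are made up.

```r
# Hypothetical excerpt: a "movie_id:" header row followed by
# "customer_id,rating,date" rows belonging to that movie.
lines <- c("1:", "1488844,3,2005-09-06", "822109,5,2005-05-13",
           "2:", "2059652,4,2005-09-05")

is_header <- grepl("^[0-9]+:$", lines)
movie_id  <- as.integer(sub(":$", "", lines[is_header]))

# cumsum() over the header flags labels every row with its block number.
block <- cumsum(is_header)

# Second pass: parse only the data rows as CSV.
ratings <- read.csv(text = lines[!is_header], header = FALSE,
                    col.names = c("customer_id", "rating", "date"),
                    stringsAsFactors = FALSE)
ratings$movie_id <- movie_id[block[!is_header]]
```

For a real file, `lines` would come from readLines(); for very large files the fread-based approach in the message above is likely to be faster.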
[R] Extracting a particular column from list
Hi. How can I extract a particular column from a list? I would be thankful for any help.
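The question gives no details about the list, so the following is only a sketch of the usual base R idioms, using hypothetical objects `lst` and `recs`.

```r
# Hypothetical list holding a data frame and a plain value.
lst <- list(df = data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
            label = "demo")

value_col <- lst$df$value             # by name with $
same_col  <- lst[["df"]][["value"]]   # by name with [[ ]]
second    <- lst[[1]][, 2]            # by position: first element, second column

# A list of records: pull one field out of every element.
recs <- list(list(id = 1, x = 10), list(id = 2, x = 20))
x_vals <- sapply(recs, `[[`, "x")
```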
[R] spurious locking of packages
I have been having a problem installing binary packages on Windows since 3.6.x hit the streets. I am using the

> INSTALL_opts = c('--no-lock')

option, but it occurs nevertheless. My habit is to install an update of R (latest, 3.6.2), then run update.packages(.):

> trying URL 'https://cran.cnr.berkeley.edu/bin/windows/contrib/3.6/zoib_1.5.4.zip'
> Content type 'application/zip' length 350788 bytes (342 KB)
> downloaded 342 KB
>
> package ‘elasticnet’ successfully unpacked and MD5 sums checked
> package ‘ellipse’ successfully unpacked and MD5 sums checked
> package ‘elliptic’ successfully unpacked and MD5 sums checked
> package ‘EMCluster’ successfully unpacked and MD5 sums checked
> package ‘EMD’ successfully unpacked and MD5 sums checked
> Warning: cannot remove prior installation of package ‘EMD’
> Warning in file.copy(savedcopy, lib, recursive = TRUE) :
>   problem copying C:\Program Files\R\R-2.13.1\library\00LOCK\EMD\libs\x64\EMD.dll to C:\Program Files\R\R-2.13.1\library\EMD\libs\x64\EMD.dll: Permission denied
> Warning: restored ‘EMD’
> package ‘emdbook’ successfully unpacked and MD5 sums checked
> package ‘emdist’ successfully unpacked and MD5 sums checked
> package ‘emmeans’ successfully unpacked and MD5 sums checked
> package ‘emoa’ successfully unpacked and MD5 sums checked
> Error in unpackPkgZip(foundpkgs[okp, 2L], foundpkgs[okp, 1L], lib, libs_only, :
>   ERROR: failed to lock directory ‘C:\Program Files\R\R-2.13.1\library’ for modifying
> Try removing ‘C:\Program Files\R\R-2.13.1\library/00LOCK’

Note the above is preceded by a long list of packages which are, in each case, re-loaded from whatever repo at a mirror is being used.

I have found p_unlock() from package pacman useful. After assigning global variable P to the results of available.packages(), I repeatedly do:

> > p_unlock()
> The following 00LOCK has been deleted:
> C:/Program Files/R/R-2.13.1/library/00LOCK
> > match(c("emoa"), P)
> [1] 13
> > P<- P[13:length(P)]
> > update.packages(method=NULL, ask=FALSE, checkBuilt=TRUE, type="win.binary", instPkgs=P,
> +   dependencies=c("Imports", "Depends", "Suggests"), INSTALL_opts=c("--no-lock"))

where *emoa* is a stand-in for whatever package faulted during the load. (I also have no idea why *EMD* is locked in the above.)

My *sessionInfo()* is:

> > sessionInfo()
> R version 3.6.2 (2019-12-12)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.6.2

Eventually, I get to the end of P and call it done. Anyone have a suggestion for an easier workaround?

 - Jan Galkowski
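For readers without pacman, the unlocking step can be approximated in base R by deleting leftover 00LOCK directories from the library. This is a rough sketch, not the pacman implementation; the helper `remove_stale_locks` is hypothetical, and it is demonstrated on a throwaway directory rather than a real library. It should only be pointed at a real library when no other R process is installing packages.

```r
# Delete leftover lock directories ("00LOCK", "00LOCK-<pkg>") from a library.
remove_stale_locks <- function(lib = .libPaths()[1]) {
  locks <- list.files(lib, pattern = "^00LOCK", full.names = TRUE)
  locks <- locks[dir.exists(locks)]
  unlink(locks, recursive = TRUE)
  invisible(locks)                 # paths that were targeted for removal
}

# Demonstration on a temporary, made-up library layout.
demo_lib <- file.path(tempdir(), "demo-library")
dir.create(file.path(demo_lib, "00LOCK", "EMD"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_lib, "emoa"), showWarnings = FALSE)

removed <- remove_stale_locks(demo_lib)
basename(removed)
```

Whether this avoids the repeated restarts described above is untested; it only automates the manual removal that the error message itself suggests.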
[R] Conversion of multivariate time series to functional time series
Hi, I am trying to generalize the functional autoregressive model of order one, FAR(1), to FAR(p) through functional principal components, retaining a chosen amount of variation, and then using the functional principal component scores to forecast with a vector autoregressive model, VAR(p), via the VAR function. Now I want to transform these predictions back into functional form. This can be done through the Karhunen-Loève expansion, but how can I do this in R? Can anybody help me in this regard?
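For reference, the back-transformation being asked about is usually written as a truncated Karhunen-Loève expansion. The notation below is an assumption introduced for illustration and does not come from the post: \hat{\mu} is the estimated mean function, \hat{v}_k the k-th estimated principal component function, d the number of retained components, and \hat{s}_{n+1,k} the k-th score forecast by the VAR(p) model.

```latex
\hat{X}_{n+1}(t) \;=\; \hat{\mu}(t) \;+\; \sum_{k=1}^{d} \hat{s}_{n+1,k}\, \hat{v}_k(t),
\qquad t \in [0, 1].
```

In other words, each forecast score scales its principal component function, the scaled functions are summed, and the mean function is added back; evaluating the result on the hourly grid gives one forecast value per hour.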
[R] Fw: How to read a file saved in Rstudio
Subject: How to read a result saved in RStudio

Hi, I ran a simulation on another, faster computer and saved the result in an .rda file. Now I want to open this .rda file on my laptop. The file loads, but then I get an error like this:

load("C:/Users/Khan/Downloads/Poly.Slow.100.kappa08 (1).rda")
> Poly.Slow.100.kappa08 (1).rda
Error: unexpected symbol in "Poly.Slow.100.kappa08 (1).rda"

Can anyone help me resolve this issue? Thanks a lot in advance.
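The error above appears to come from typing the file name at the prompt: load() restores objects under the names they had when they were saved, and the file name is not an R object. A small sketch, assuming a hypothetical object called `sim_result`, shows how to find out what load() actually restored.

```r
# Recreate the situation with a temporary file whose name mimics the one in the post.
path <- file.path(tempdir(), "Poly.Slow.100.kappa08 (1).rda")
sim_result <- c(a = 1, b = 2)     # hypothetical simulation output
save(sim_result, file = path)
rm(sim_result)

loaded <- load(path)   # load() returns the names of the restored objects
print(loaded)          # the object name, not the file name

res <- get(loaded[1])  # fetch the first restored object by its name
```

In an interactive session, ls() after load() shows the same information.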
[R] Durbin- Levinson algorithm
Hi, my problem is how to use the Durbin-Levinson algorithm for prediction in the case of multiple time series. How can I do this in R?
[R] Functional final prediction error
Hi, my question is related to functional data analysis: what is the functional final prediction error, and how can it be used to carry a vector autoregressive model over to its functional form? I need help in this regard and would be thankful.
Re: [R] R 3.6.1 and apcluster package
I have confirmed that a complete workaround to these problems is available if, as Bill Dunlap suggested, "version=2" is used in all *save* incantations. Thanks Bill!

 - Jan

On Thu, Jul 18, 2019, at 10:39, William Dunlap wrote:
> Note that you can reproduce this in R-3.5.1 if you specify serialization
> version 3 (which became the default in 3.6.0).
>
> > save(apresX, file="351-2.RData", version=2)
> > save(apresX, file="351-2.RData", version=3)
> Error: C stack usage 7969184 is too close to the limit
> > version$version.string
> [1] "R version 3.5.1 (2018-07-02)"
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Thu, Jul 18, 2019 at 12:46 AM Jan Galkowski wrote:
>> > # Test for saving. Jan Galkowski, 17th July 2019.
>> > # produceProtectionFault.R
>> >
>> > library(apcluster)
>> > cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06))
>> > cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
>> > x <- rbind(cl1, cl2)
>> >
>> > ## compute similarity matrix and run affinity propagation
>> > ## (p defaults to median of similarity)
>> > simil<- negDistMat(x, r=2)
>> > apres <- apcluster(s=simil, details=TRUE)
>> > apresX<- aggExCluster(s=simil, x=apres)
>> >
>> > show(apres)
>> > show(apresX)
>> >
>> > saveRDS(object=apresX, file="foo.rds", compress=TRUE)
>> >
>> > #save(apresX, file="bar.data", compress=TRUE)
>> >
>> > #save.image("crazy.RData")
>>
>> The example is from the apcluster documentation. Leaving any one of the
>> "save"s uncommented produces said fault.
>>
>> - Jan
>>
>> On Wed, Jul 17, 2019, at 08:18, Jeff Newmiller wrote:
>> > It would never make sense for such messages to reflect normal and
>> > expected operation, so hypothesizing about intentionally changing stack
>> > behavior doesn't make sense.
>> >
>> > The default format for saveRDS changed in 3.6.0. There may be bugs
>> > associated with that, but rolling back to 3.6.0 would just trade bugs.
>> >
>> > https://cran.r-project.org/doc/manuals/r-devel/NEWS.html
>> >
>> > On July 16, 2019 8:56:28 PM CDT, Jan Galkowski wrote:
>> > >Did something seriously change in R 3.6.1 at least for Windows in terms
>> > >of stack impacts?
>> > >
>> > >I'm encountering many problems with the 00LOCK, needing to disable
>> > >locking during installations.
>> > >
>> > >And I'm encountering
>> > >
>> > >> Error: C stack usage 63737888 is too close to the limit
>> > >
>> > >for cases I did not before, even when all I'm doing is serializing an
>> > >object to be saved with *saveRDS* or even *save.image(.)*.
>> > >
>> > >Yes, I know, I did not append a minimally complete example. Just wanted
>> > >to see if it was just me, or if anyone else was seeing this.
>> > >
>> > >It's on Windows 7 HE and I've run *R* here for years.
>> > >
>> > >My inclination is to drop back to 3.6.0 if it is just me or if no one
>> > >knows about this problem.
>> > >
>> > >Thanks,
>> > >
>> > > - Jan Galkowski.
>> [snip]
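The workaround confirmed above amounts to passing version = 2 to the saving functions. A minimal sketch on a small stand-in object (the apcluster result from the thread is not reproduced here, and `obj` is hypothetical):

```r
obj <- list(id = 1:3, label = "demo")        # hypothetical stand-in object
rds_path <- tempfile(fileext = ".rds")
rda_path <- tempfile(fileext = ".RData")

# Request serialization format 2 instead of the 3.6.0 default (format 3).
saveRDS(obj, file = rds_path, version = 2)
save(obj, file = rda_path, version = 2)

restored <- readRDS(rds_path)
identical(restored, obj)
```

save.image() also accepts a version argument, so the same setting should carry over to whole-workspace saves.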
[R] R 3.6.1 and apcluster package
> # Test for saving. Jan Galkowski, 17th July 2019.
> # produceProtectionFault.R
>
> library(apcluster)
> cl1 <- cbind(rnorm(100, 0.2, 0.05), rnorm(100, 0.8, 0.06))
> cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
> x <- rbind(cl1, cl2)
>
> ## compute similarity matrix and run affinity propagation
> ## (p defaults to median of similarity)
> simil<- negDistMat(x, r=2)
> apres <- apcluster(s=simil, details=TRUE)
> apresX<- aggExCluster(s=simil, x=apres)
>
> show(apres)
> show(apresX)
>
> saveRDS(object=apresX, file="foo.rds", compress=TRUE)
>
> #save(apresX, file="bar.data", compress=TRUE)
>
> #save.image("crazy.RData")

The example is from the apcluster documentation. Leaving any one of the "save"s uncommented produces said fault.

 - Jan

On Wed, Jul 17, 2019, at 08:18, Jeff Newmiller wrote:
> It would never make sense for such messages to reflect normal and expected
> operation, so hypothesizing about intentionally changing stack behavior
> doesn't make sense.
>
> The default format for saveRDS changed in 3.6.0. There may be bugs associated
> with that, but rolling back to 3.6.0 would just trade bugs.
>
> https://cran.r-project.org/doc/manuals/r-devel/NEWS.html
>
> On July 16, 2019 8:56:28 PM CDT, Jan Galkowski wrote:
> >Did something seriously change in R 3.6.1 at least for Windows in terms
> >of stack impacts?
> >
> >I'm encountering many problems with the 00LOCK, needing to disable
> >locking during installations.
> >
> >And I'm encountering
> >
> >> Error: C stack usage 63737888 is too close to the limit
> >
> >for cases I did not before, even when all I'm doing is serializing an
> >object to be saved with *saveRDS* or even *save.image(.)*.
> >
> >Yes, I know, I did not append a minimally complete example. Just wanted
> >to see if it was just me, or if anyone else was seeing this.
> >
> >It's on Windows 7 HE and I've run *R* here for years.
> >
> >My inclination is to drop back to 3.6.0 if it is just me or if no one
> >knows about this problem.
> >
> >Thanks,
> >
> > - Jan Galkowski.
>
> --
> Sent from my phone. Please excuse my brevity.

--
Jan Galkowski
[R] R 3.6.1
Did something seriously change in R 3.6.1, at least for Windows, in terms of stack impacts?

I'm encountering many problems with the 00LOCK, needing to disable locking during installations.

And I'm encountering

> Error: C stack usage 63737888 is too close to the limit

for cases I did not before, even when all I'm doing is serializing an object to be saved with *saveRDS* or even *save.image(.)*.

Yes, I know, I did not append a minimally complete example. Just wanted to see if it was just me, or if anyone else was seeing this.

It's on Windows 7 HE and I've run *R* here for years.

My inclination is to drop back to 3.6.0 if it is just me or if no one knows about this problem.

Thanks,

 - Jan Galkowski.
[R] functional autoregressive model of order more than one
Hi, I hope all of you are well. I am working on functional time series analysis. I fitted the functional autoregressive model using the far package in R, and now I want to generalize the model to higher orders. How can I do this in R? Please help in this regard; I will be thankful.
[R] functional autoregressive model of order two
Hi, I am doing functional data analysis on time series data, to which I have fitted a functional autoregressive model of order one using the far package in R. Now I want to extend the model to order two, but as far as I can see the far package only handles order one. Can anybody help me extend my model to order two, or suggest a package that would solve my problem? I would be grateful for any help.
[R] fitting functional autoregresive model
Hi, I am trying to extend the functional autoregressive model of order one, FAR(1), to a functional autoregressive model of order two, FAR(2). The code I use for FAR(1) is:

library(fda)
library(far)

# CREATE DUMMY VARIABLES
factor2dummy=function(x){
  n=length(x)
  tab=as.factor(names(table(x)))
  p=length(tab)
  xdummy=matrix(0,n,p)
  for(i in 1:p) {
    xdummy[x==tab[i],i]=1
  }
  colnames(xdummy)=tab
  return(xdummy)
}

# READ DATA
demnd=read.csv("c:/Users/Khan/Desktop/dem99141.csv",header=TRUE)
xdata <- as.matrix(demnd[-1,-1], ncol = 25, nrow =1826, byrow= TRUE)
class(xdata)
date=strptime(as.character(xdata[,1]),"%Y-%m-%d")
weekday=weekdays(date)
week=format(date,"%U")
xdata=xdata[,-1]
xdata

# DAILY AVERAGE
xmean=apply(xdata,1,mean)
xmean

# SEASONAL ADJUSTMENT
#seasadj=function(x) decompose(ts(x,frequency=7))$rand
#xdata=apply(xdata,2,seasadj)
#xdata=xdata[!is.na(xdata[,1]),]
nrall=nrow(xdata)
#wd=factor2dummy(weekday)
#wnr=factor2dummy(week)
#e=lm(xmean~wd+wnr-1)$residuals
#tsdiag(arima(e,c(7,0,0)))
#seasadj=function(x) lm(x~wd+wnr-1)$residuals
#xdata=apply(xdata,2,seasadj)

# HOLD-OUT-PERIOD
nout=100
nin=nrall-nout
xin=xdata[1:nin,]
nr=nrow(xin)
nc=ncol(xin)
n=nr*nc
y=matrix(t(xin),n,1)
xfd=as.fdata(y,col=1,p=23,name="Cons") #p=23 is the multiple of length=39698 of data

# ESTIMATE FAR(1) MODEL
k1=far.cv(xfd,y="Cons",kn=20,ncv=1000)$minL2[1]
far1=far(xfd,kn=k1)
far1

# ESTIMATE AR(1)-MODELS
p=14
f=function(x) ar(x,aic=FALSE,order.max=p)
ar.models=apply(xin,2,f)

# FORECAST
errorsfar=matrix(0,nout,nc)
errorsar=matrix(0,nout,nc)
errorsnaive=matrix(0,nout,nc)
predar=matrix(0,1,nc)
prednaive=matrix(0,1,nc)
for(i in 1:nout){
  for(j in 1:length(ar.models)) {
    predar[1,j]=predict(ar.models[[j]],newdata=xdata[(nr+i-p):(nr+i-1),j])$pred
  }
  xnew=as.fdata(t(xdata[nr+i-1,]),col=1,p=23,name="Cons")
  pred=predict(far1,newdata=xnew)
  prednaive=xdata[nr+i-1,]
  obs=xdata[nr+i,]
  errorsnaive[i,]=t(obs-prednaive)
  errorsar[i,]=t(obs-predar)
  errorsfar[i,]=t(obs-pred$Cons)
}
msefar=apply(errorsfar^2,2,mean)
msefar
msear=apply(errorsar^2,2,mean)
msenaive=apply(errorsnaive^2,2,mean)
mean(msear)
mean(msenaive)
mean(msefar)

I use the consumption data ...
[R] Bifactor model and infit statistics?
Good afternoon, I am currently in the process of calibrating an item bank using a GPCM model. So, am I right to assume that the bifactor model allows me to work with my general factor by treating it as a one-factor model, without taking the group factors into account? That is, can I estimate my item parameters from my factor loadings on the general factor only? If so, I have some questions about evaluating the fit of my model. The calculation of infit statistics is specific to unidimensional models. Can I compute infit statistics using the general factor, or do I have to do this separately for each of the group factors? Or is there a more appropriate method to evaluate the fit of my model when calibrating an item bank using a GPCM model? Thank you in advance.
[R] readxl::excel_sheets in tryCatch() doesn't catch error
Hi R programmers,

I am reading multiple .xls and .xlsx files from a directory using readxl from the tidyverse. When reading fails, the code should continue on to the next file. However, when I call the custom function readExcelSheets (in a loop, and with the tryCatch function) I get an error for some files and the code then stops executing. How can I force my code to continue on to the next files?

Here is the function:

readExcelSheets <- function(curPath) {
  out <- tryCatch(
    {
      message("This is the 'try' part")
      dat <- excel_sheets(curPath)
    },
    error=function(cond) {
      message(paste("Error in opening Excel file with readxl read sheets:", curPath))
      message("Here's the original error message:")
      message(cond)
    },
    warning=function(cond) {
      message(paste("readxl caused a warning when reading sheets:", curPath))
      message("Here's the original warning message:")
      message(cond)
    },
    finally={
      message(paste("Processed file for sheets:", curPath))
      message("End of processing file for sheets.")
    }
  )
  return(out)
}

The loop looks like this:

listLength <- length(excelList)
for (excel_file in excelList) {
  curPath <- excel_file
  sheetNames <- NULL
  sheetNames <- withTimeout({readExcelSheets(curPath)}, timeout = 5, onTimeout="silent")
  if(is.null(sheetNames)){next}
  for (sheetName in sheetNames){
    # do something
  }
}

The problem is that I get an error:

Error: Evaluation error: zip file '' cannot be opened.

And then execution of the loop stops without progressing to the next Excel file. Note that for the first 20 or so files the code works as expected. I think that there may be an error in the full path name (such as a text encoding error), but my point is that it should exit silently and progress to the next Excel file even if the path is not found.

Best regards
Phillip
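One possible explanation for the behaviour described above, offered as an assumption rather than something confirmed in the post: when message() is given a condition object, it signals that condition again, so an error handler established further out can catch it and abort. It is assumed here that withTimeout (from R.utils) wraps its expression in such a handler. The base R sketch below uses a hypothetical failing reader in place of excel_sheets() and a plain tryCatch() in place of withTimeout() to show the effect, next to a variant that prints only conditionMessage(cond) and returns NULL.

```r
# Hypothetical stand-in for readxl::excel_sheets() that always fails.
failing_reader <- function(path) stop("zip file '", path, "' cannot be opened")

# Variant 1: the handler passes the condition object itself to message().
# message() signals the condition it is given, so outer error handlers see it.
read_resignal <- function(path) {
  tryCatch(failing_reader(path), error = function(cond) {
    message(cond)
    NULL
  })
}

# Variant 2: the handler prints only the text of the condition.
read_safe <- function(path) {
  tryCatch(failing_reader(path), error = function(cond) {
    message("Could not read sheets: ", conditionMessage(cond))
    NULL
  })
}

# An outer handler, standing in for whatever wraps the call in the real loop.
outer <- function(reader) {
  tryCatch(reader("book.xlsx"), error = function(e) "outer handler ran")
}

res_resignal <- outer(read_resignal)  # the outer handler takes over
res_safe <- outer(read_safe)          # NULL, so a loop could simply call next
```

If this is indeed the cause, replacing message(cond) with message(conditionMessage(cond)) in both handlers of readExcelSheets would be the smallest change to try.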
Re: [R] Function in default parameter value closing over variables defined later in the enclosing function
Hi Duncan,

On Wed, Jan 23, 2019 at 10:02:00AM -0500, Duncan Murdoch wrote:
> On 23/01/2019 5:27 a.m., Jan T Kim wrote:
> > Hi Ivan & All,
> >
> > R's scoping system basically goes to all environments along the call
> > stack when trying to resolve an unbound variable, see the language
> > definition [1], section 4.3.4, and perhaps also 2.1.5.
>
> You are misinterpreting that section. It's not the call stack that is
> searched, it's the chain of environments that starts with the evaluation
> frame of the current function. Those are very different.

Yes -- I meant the environment chain but mistakenly wrote "call stack", sorry. Thanks for pointing this out.

Best regards, Jan

> For example,
>
> g <- function() {
>   print(secret)
> }
>
> f <- function() {
>   secret <- "secret"
>   g()
> }
>
> would fail, because even though secret is defined in the caller of g() and
> is therefore in the call stack, that's irrelevant: it's not in g's
> evaluation frame (which has no variables) or its parent (which is the global
> environment if those definitions were evaluated there).
>
> Duncan Murdoch
>
> > Generally, unbound variables should be used with care. It's a bit
> > difficult to decide whether and how the code should be rewritten,
> > I'd say that depends on the underlying intentions / purposes. As it
> > is, the code could be simplified to just
> >
> >   print("secret");
> >
> > but that's probably missing the point.
> >
> > Best regards, Jan
> >
> > [1] https://cran.r-project.org/doc/manuals/r-release/R-lang.html
> >
> > On Wed, Jan 23, 2019 at 12:53:01PM +0300, Ivan Krylov wrote:
> >> Hi!
> >>
> >> I needed to generalize a loss function being optimized inside another
> >> function, so I made it a function argument with a default value. It
> >> worked without problems, but later I noticed that the inner function,
> >> despite being defined in the function arguments, somehow closes over a
> >> variable belonging to the outer function, which is defined later.
> >>
> >> Example:
> >>
> >> outside <- function(inside = function() print(secret)) {
> >>   secret <- 'secret'
> >>   inside()
> >> }
> >> outside()
> >>
> >> I'm used to languages that have both lambdas and variable declaration
> >> (like perl5 -Mstrict or C++11), so I was a bit surprised.
> >>
> >> Does this work because R looks up the variable by name late enough at
> >> runtime for the `secret` variable to exist in the parent environment of
> >> the `inside` function? Can I rely on it? Is this considered bad style?
> >> Should I rewrite it (and how)?
> >>
> >> --
> >> Best regards,
> >> Ivan
Re: [R] Function in default parameter value closing over variables defined later in the enclosing function
Hi Ivan & All,

R's scoping system basically goes to all environments along the call stack when trying to resolve an unbound variable, see the language definition [1], section 4.3.4, and perhaps also 2.1.5.

Generally, unbound variables should be used with care. It's a bit difficult to decide whether and how the code should be rewritten, I'd say that depends on the underlying intentions / purposes. As it is, the code could be simplified to just

  print("secret");

but that's probably missing the point.

Best regards, Jan

[1] https://cran.r-project.org/doc/manuals/r-release/R-lang.html

On Wed, Jan 23, 2019 at 12:53:01PM +0300, Ivan Krylov wrote:
> Hi!
>
> I needed to generalize a loss function being optimized inside another
> function, so I made it a function argument with a default value. It
> worked without problems, but later I noticed that the inner function,
> despite being defined in the function arguments, somehow closes over a
> variable belonging to the outer function, which is defined later.
>
> Example:
>
> outside <- function(inside = function() print(secret)) {
>   secret <- 'secret'
>   inside()
> }
> outside()
>
> I'm used to languages that have both lambdas and variable declaration
> (like perl5 -Mstrict or C++11), so I was a bit surprised.
>
> Does this work because R looks up the variable by name late enough at
> runtime for the `secret` variable to exist in the parent environment of
> the `inside` function? Can I rely on it? Is this considered bad style?
> Should I rewrite it (and how)?
>
> --
> Best regards,
> Ivan
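The distinction drawn in this thread between the call stack and the chain of enclosing environments can be checked with a short sketch. The variable names secret_in_caller and secret_in_frame are made up for this illustration, and the sketch assumes that neither exists in the global environment. A default argument is evaluated inside the function's own evaluation frame, whereas a function supplied by the caller keeps the caller's environment.

```r
# g() is defined at top level, so its enclosing environment is the global one.
g <- function() secret_in_caller

# f() defines the variable and then calls g(): the variable is on the call
# stack, but not in g's chain of enclosing environments.
f <- function() {
  secret_in_caller <- "secret"
  g()
}
res_call_stack <- tryCatch(f(), error = function(e) "not found")

# The default value of `inside` is evaluated in the evaluation frame of
# outside(), so the closure it creates can see variables defined there.
outside <- function(inside = function() secret_in_frame) {
  secret_in_frame <- "secret"
  inside()
}
res_default <- outside()

# A function passed in by the caller is created in the caller's environment
# and therefore cannot see the local variable of outside().
res_supplied <- tryCatch(outside(function() secret_in_frame),
                         error = function(e) "not found")
```

Under the stated assumption, only the call that relies on the default argument finds the variable.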
Re: [R] Random projection
Ms Fleming,

This blog post (of mine) may be of interest and hopefully of help:

https://667-per-cm.net/2018/11/20/the-johnson-lindenstrauss-lemma-and-the-paradoxical-power-of-random-linear-operators-part-1/

Cheers!
-- Jan Galkowski (o°) Westwood, MA
Re: [R] saveRDS() and readRDS() Why?
Are you sure you didn't do

saveRDS("rawData", file = "rawData.rds")

instead of

saveRDS(rawData, file = "rawData.rds")

? This would explain the result you have under Linux. In principle saveRDS and readRDS can be used to copy objects between R sessions without losing information.

What does readRDS return on Windows with the same file? What type of object is rawData? Do str(rawData). Some objects created by packages cannot be serialized, e.g. objects that point to memory allocated by a package. The pointer is then serialized, not the memory pointed to. Also, if the object is generated by a package, you might need to load the package to get the printing etc. of the object right.

HTH, Jan

On 07-11-18 09:45, Patrick Connolly wrote:
On Wed, 07-Nov-2018 at 08:27AM +, Robert David Burbidge wrote:
|> Hi Patrick,
|>
|> From the help: "save writes a single line header (typically
|> "RDXs\n") before the serialization of a single object".
|>
|> If the file sizes are the same (see Eric's message), then the
|> problem may be due to different line terminators. Try serialize and
|> unserialize for low-level control of saving/reading objects.

I'll have to find out what 'serialize' means. On Windows, it's a huge table, looks like it's all hexadecimal. On Linux, it's just the text string 'rawData' -- a lot more than line terminators. Have I misunderstood what the idea is? I thought I'd get an identical object, irrespective of how different the OS stores and zips it.

|>
|> Rgds,
|>
|> Robert
|>
|> On 07/11/18 08:13, Eric Berger wrote:
|> >What do you see at the OS level?
|> >i.e. on windows
|> >DIR rawData.rds
|> >on linux
|> >ls -l rawData.rds
|> >compare the file sizes on both.
|> >
|> >On Wed, Nov 7, 2018 at 9:56 AM Patrick Connolly wrote:
|> >
|> >> From a Windows R session, I do
|> >>
|> >>>object.size(rawData)
|> >>31736 bytes # from scraping a non-reproducible web address.
|> >>>saveRDS(rawData, file = "rawData.rds")
|> >>Then copy to a Linux session
|> >>
|> >>>rawData <- readRDS(file = "rawData.rds")
|> >>>rawData
|> >>[1] "rawData"
|> >>>object.size(rawData)
|> >>112 bytes
|> >>>rawData
|> >>[1] "rawData" # only the name and something to make up 112 bytes
|> >>Have I misunderstood the syntax?
|> >>
|> >>It's an old version on Windows. I haven't used Windows R since then.
|> >>
|> >>major 3
|> >>minor 2.4
|> >>year 2016
|> >>month 03
|> >>day 16
|> >>
|> >>I've tried R-3.5.0 and R-3.5.1 Linux versions.
|> >>
|> >>In case it's material ...
|> >>
|> >>I couldn't get the scraping to work on either of the R installations
|> >>but Windows users told me it worked for them. So I thought I'd get
|> >>the R object and use it. I could understand accessing the web address
|> >>could have different permissions for different OSes, but should that
|> >>affect the R objects?
|> >>
|> >>TIA
|> >>
|> >>--
|> >>Patrick Connolly
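The suspected mix-up between saving the object and saving its name can be reproduced with a small sketch. The data frame below is made up for illustration, since the original rawData came from a web scrape that is not available, and temporary files are used in place of rawData.rds.

```r
# Made-up stand-in for the scraped object.
rawData <- data.frame(id = 1:3, value = c("a", "b", "c"))

f_object <- tempfile(fileext = ".rds")
f_string <- tempfile(fileext = ".rds")

saveRDS(rawData, file = f_object)    # saves the object itself
saveRDS("rawData", file = f_string)  # saves only the character string "rawData"

restored  <- readRDS(f_object)
just_name <- readRDS(f_string)

str(restored)           # the full data frame comes back
print(just_name)        # only the name comes back
object.size(just_name)  # a small object, whatever the size of rawData was
```

Comparing the sizes of the two files on disk, as suggested earlier in the thread, is another quick way to tell which of the two calls produced a given .rds file.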
Re: [R] Plot a path
Below is a similar example, using sf and leaflet, plotting the trajectory on a background map.

library(leaflet)
library(sf)
library(dplyr)

# Generate example data
gen_data <- function(id, n) {
  data.frame(
    id = id,
    date = 1:n,
    lat = runif(n, min = -90, max = 90),
    lon = runif(n, min = -180, max = 180)
  )
}
dta <- lapply(1:2, gen_data, n = 10) %>% bind_rows()

# Transform all records of one object/person to a st_linestring, then
# combine into one sf column
lines <- dta %>%
  arrange(id, date) %>%
  split(dta$id) %>%
  lapply(function(d) st_linestring(cbind(d$lon, d$lat))) %>%
  unname() %>% # Without the unname it doesn't work for some reason
  st_sfc()

# Plot using leaflet
leaflet() %>%
  addTiles() %>%
  addPolylines(data = lines)

HTH - Jan

On 01-11-18 11:27, Rui Barradas wrote:
Hello,

The following uses ggplot2. First, make up a dataset, since you have not posted one.

lat0 <- 38.736946
lon0 <- -9.142685
n <- 10
set.seed(1)
Date <- seq(Sys.Date() - n + 1, Sys.Date(), by = "days")
Lat <- lat0 + cumsum(c(0, runif(n - 1)))
Lon <- lon0 + cumsum(c(0, runif(n - 1)))
Placename <- rep(c("A", "B"), n/2)
path <- data.frame(Date, Placename, Lat, Lon)
path <- path[order(path$Date), ]

Now, two graphs, one with just one line of all the lon/lat and the other with a line for each Placename.

library(ggplot2)

ggplot(path, aes(x = Lon, y = Lat)) +
  geom_point() +
  geom_line()

ggplot(path, aes(x = Lon, y = Lat, colour = Placename)) +
  geom_point(aes(fill = Placename)) +
  geom_line()

Hope this helps,
Rui Barradas

At 21:27 on 31/10/2018, Ferri Leberl wrote:
Dear All,

I have a dataframe with four cols: Date, Placename, geogr. latitude, geogr. longitude. How can I plot the path as a line, ordered by the date, with the longitude as the x-axis and the latitude as the y-axis?

Thank you in advance!
Yours, Ferri
Re: [R] Calculating just a single row of dissimilarity/distance matrix
Please respond to the list; there are more people answering there.

As explained in the documentation, gower_dist performs a pairwise comparison of the two arguments, recycling the shortest one if needed, so indeed gower_dist(iris[1:5, ], iris) doesn't do what you want. Possible solutions are:

library(gower)
tmp <- split(iris[1:5, ], seq_len(5))
sapply(tmp, gower_dist, iris)

and:

library(dplyr)
library(tidyr)
pairs <- expand.grid(x = 1:5, y = 1:nrow(iris))
pairs$dist <- gower_dist(iris[pairs$x, ], iris[pairs$y, ])
pairs %>% spread(y, dist)

I don't know which one is faster, and there are probably various other solutions too.

-- Jan

On 27-10-18 18:04, Aerenbkts bkts wrote:
Dear Jan,

Thanks for your help. Actually it works for the first element. But I tried to calculate distance values for the first N rows. For example:

gower_dist(iris[1:5,], iris) # gower distance for the first 5 rows

but it did not work. Do you have any suggestion about it?

On Fri, 26 Oct 2018 at 21:31, Jan van der Laan wrote:
Using another implementation of the gower distance:

library(gower)
gower_dist(iris[1,], iris)

HTH, Jan

On 26-10-18 15:07, Aerenbkts bkts wrote:
> I have a data-frame with 30k rows and 10 features. I would like to
> calculate distance matrix like below;
>
> gower_dist <- daisy(data-frame, metric = "gower"),
>
> This function returns whole dissimilarity matrix. I want to get just
> the first row.
> (Just distances of the first element in data-frame). How can I do it?
> Do you have an idea?
>
> Regards
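For readers who want to see what a single row of the Gower dissimilarity matrix involves without extra packages, here is a hand-rolled base R sketch. It is not the code of the gower or cluster packages: it assumes equal weights, no missing values, numeric columns scaled by their range, and a simple match/mismatch score for all other columns. The function name gower_row is made up for this illustration.

```r
# Gower distance from row i of a data frame to every row of that data frame.
# Illustrative sketch only: equal weights, no handling of missing values.
gower_row <- function(df, i = 1L) {
  per_column <- lapply(df, function(col) {
    if (is.numeric(col)) {
      rng <- diff(range(col))
      # Range-normalised absolute difference; 0 if the column is constant.
      if (rng == 0) rep(0, length(col)) else abs(col - col[i]) / rng
    } else {
      # Categorical columns: 0 when equal, 1 when different.
      as.numeric(as.character(col) != as.character(col[i]))
    }
  })
  # Average the per-column contributions.
  Reduce(`+`, per_column) / length(per_column)
}

d_first <- gower_row(iris, 1L)  # one row of the matrix, length nrow(iris)

# Distances for the first five rows only: one column per reference row.
d_first_five <- sapply(1:5, function(i) gower_row(iris, i))
```

Because each call touches every column once, the cost for one reference row grows linearly with the number of rows, which is the point of avoiding the full 30k by 30k matrix.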
Re: [R] Calculating just a single row of dissimilarity/distance matrix
Using another implementation of the gower distance:

library(gower)
gower_dist(iris[1,], iris)

HTH, Jan

On 26-10-18 15:07, Aerenbkts bkts wrote:
I have a data-frame with 30k rows and 10 features. I would like to calculate distance matrix like below;

gower_dist <- daisy(data-frame, metric = "gower"),

This function returns whole dissimilarity matrix. I want to get just the first row. (Just distances of the first element in data-frame). How can I do it? Do you have an idea?

Regards
Re: [R] Bug : Autocorrelation in sample drawn from stats::rnorm (hmh)
On 05/10/2018, 09:45, "R-help on behalf of hmh" wrote:

    Hi,

    Thanks William for this fast answer, and sorry for sending the 1st mail to r-help instead of r-devel.

    I noticed that bug while I was simulating many small random walks using c(0,cumsum(rnorm(10))). The negative auto-correlation was then inducing a much smaller space visited by the random walks than expected if there were no auto-correlation in the samples. The code I provided and you optimized was only provided to illustrate and investigate that bug.

    It is really worrying that most of the R distributions are affected by this bug. What I did should have been one of the first checks done for each distribution by the developers of these functions! And if, as you suggested, this is a "tolerated" error of the algorithm, I do think this is a bad choice, but anyway, this should have been mentioned in the documentation of the functions!

    cheers,
    hugo

This is not a bug. You have simply rediscovered the finite-sample bias in the sample autocorrelation coefficient, known at least since

Kendall, M. G. (1954). Note on bias in the estimation of autocorrelation. Biometrika, 41(3-4), 403-404.

The bias is approximately -1/T, with T the sample size, which explains why it seems to disappear in the larger sample sizes you consider.

Jan
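The size of this bias can be checked with a short simulation. The sketch below uses the same lag-1 estimator as the code posted in the thread, cor(x[-1], x[-length(x)]); the number of replications, the seed and the sample sizes are arbitrary choices made for this illustration, and -1/T is only the first-order approximation quoted above, so the simulated means are not expected to match it exactly.

```r
set.seed(1)

# Lag-1 sample correlation of n independent standard normal draws,
# computed the same way as in the original post.
lag1_cor <- function(n) {
  x <- rnorm(n)
  cor(x[-1], x[-n])
}

sizes <- c(5, 10, 20, 50)
mean_cor <- sapply(sizes, function(n) mean(replicate(20000, lag1_cor(n))))

# Simulated mean next to the approximate bias -1/T.
print(round(cbind(T = sizes, simulated = mean_cor, approx = -1 / sizes), 3))
```

The draws themselves are independent in every replication; the negative average comes from estimating the mean from the same short sample, which is why it fades as T grows.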
Re: [R] Bug : Autocorrelation in sample drawn from stats::rnorm
Did you take into account that the sample serial correlation coefficient has a bias of approximately -1/T (with T the sample size)? Its variance is approximately 1/T.

Jan Annaert

-----Original Message-----
From: R-help On Behalf Of hmh
Sent: Thursday 4 October 2018 12:09
To: R
Subject: [R] Bug : Autocorrelation in sample drawn from stats::rnorm

Hi,

I just noticed the following bug: when we draw a random sample using the function stats::rnorm, there should be no auto-correlation in the sample. But there is some auto-correlation when the sample that is drawn is small.

I describe the problem using two functions: DistributionAutocorrelation_Unexpected, which has the wrong behavior (when drawing some small samples using rnorm, there is generally a strong negative auto-correlation in the sample), and DistributionAutocorrelation_Expected, which illustrates the expected behavior.

Unexpected:

DistributionAutocorrelation_Unexpected = function(SampleSize){
  Cor = NULL
  for(repetition in 1:1e5){
    X = rnorm(SampleSize)
    Cor[repetition] = cor(X[-1],X[-length(X)])
  }
  return(Cor)
}

par(mfrow=c(3,3))
for(SampleSize_ in c(4,5,6,7,8,10,15,20,50)){
  hist(DistributionAutocorrelation_Unexpected(SampleSize_),col='grey',main=paste0('SampleSize=',SampleSize_)) ; abline(v=0,col=2)
}

output: [figure: 3 x 3 grid of histograms, one per sample size; not preserved in the archive]

Expected:

DistributionAutocorrelation_Expected = function(SampleSize){
  Cor = NULL
  for(repetition in 1:1e5){
    X = rnorm(SampleSize)
    Cor[repetition] = cor(sample(X[-1]),sample(X[-length(X)]))
  }
  return(Cor)
}

par(mfrow=c(3,3))
for(SampleSize_ in c(4,5,6,7,8,10,15,20,50)){
  hist(DistributionAutocorrelation_Expected(SampleSize_),col='grey',main=paste0('SampleSize=',SampleSize_)) ; abline(v=0,col=2)
}

Some more information you might need:

packageDescription("stats")
Package: stats
Version: 3.5.1
Priority: base
Title: The R Stats Package
Author: R Core Team and contributors worldwide
Maintainer: R Core Team
Description: R statistical functions.
License: Part of R 3.5.1
Imports: utils, grDevices, graphics
Suggests: MASS, Matrix, SuppDists, methods, stats4
NeedsCompilation: yes
Built: R 3.5.1; x86_64-pc-linux-gnu; 2018-07-03 02:12:37 UTC; unix

Thanks for correcting that. Feel free to ask for any further information you would need.

cheers,
hugo

--
Hugo Mathé-Hubert
ATER
Laboratoire Interdisciplinaire des Environnements Continentaux (LIEC)
UMR 7360 CNRS - Bât IBISE
Université de Lorraine - UFR SciFA
8, Rue du Général Delestraint
F-57070 METZ
Re: [R] Erase content of dataframe in a single stroke
Or

testdf <- testdf[FALSE, ]

or

testdf <- testdf[numeric(0), ]

which seems to be slightly faster.

Best, Jan

On 27-9-2018 at 10:32, PIKAL Petr wrote:
Hm, I would use

testdf<-data.frame(A=c(1,2),B=c(2,3),C=c(3,4))
str(testdf)
'data.frame': 2 obs. of 3 variables:
 $ A: num 1 2
 $ B: num 2 3
 $ C: num 3 4
testdf<-testdf[-(1:nrow(testdf)),]
str(testdf)
'data.frame': 0 obs. of 3 variables:
 $ A: num
 $ B: num
 $ C: num

Cheers
Petr

-----Original Message-----
From: R-help On Behalf Of Jim Lemon
Sent: Thursday, September 27, 2018 10:12 AM
To: Luigi Marongiu; r-help mailing list
Subject: Re: [R] Erase content of dataframe in a single stroke

Ah, yes, try 'as.data.frame' on it.

Jim

On Thu, Sep 27, 2018 at 6:00 PM Luigi Marongiu wrote:
Thank you Jim, this requires the definition of an ad hoc function; strange that R does not have a function for this purpose... Anyway, it works but it changes the structure of the data. By redefining the dataframe as I did, I obtain:

df
[1] A B C
<0 rows> (or 0-length row.names)
str(df)
'data.frame': 0 obs. of 3 variables:
 $ A: num
 $ B: num
 $ C: num

When applying your function, I get:

df
$A
NULL
$B
NULL
$C
NULL
str(df)
List of 3
 $ A: NULL
 $ B: NULL
 $ C: NULL

The dataframe has become a list. Would that affect downstream applications?

Thank you,
Luigi

On Thu, Sep 27, 2018 at 9:45 AM Jim Lemon wrote:
Hi Luigi,
Maybe this:

testdf<-data.frame(A=1,B=2,C=3)
testdf
  A B C
1 1 2 3
toNull<-function(x) return(NULL)
testdf<-sapply(testdf,toNull)

Jim

On Thu, Sep 27, 2018 at 5:29 PM Luigi Marongiu wrote:
Dear all,

I would like to erase the content of a dataframe -- but not the dataframe itself -- in a simple and fast way. At the moment I do that by re-defining the dataframe itself in this way:

df <- data.frame(A = numeric(),
+                B = numeric(),
+                C = character())
# assign
A <- 5
B <- 0.6
C <- 103
# load
R <- cbind(A, B, C)
df <- rbind(df, R)
df
  A   B   C
1 5 0.6 103
# erase
df <- data.frame(A = numeric(),
+                B = numeric(),
+                C = character())
df
[1] A B C
<0 rows> (or 0-length row.names)

Is there a way to erase the content of the dataframe in a simpler (acting on the whole dataframe at once instead of naming each column individually) and nicer (with a specific erasure command instead of re-defining the object itself) way?

Thank you.
--
Best regards,
Luigi
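The difference between the suggestions in this thread can be seen in a small sketch. The example data frame is made up for illustration, with one character column added so that the column types are not all numeric.

```r
# Made-up example data frame with mixed column types.
testdf <- data.frame(A = c(1, 2), B = c(2, 3), C = c("x", "y"))

# Logical indexing with FALSE drops every row but keeps the columns,
# their names and their types.
emptied <- testdf[FALSE, ]
str(emptied)

# For comparison: mapping every column to NULL, as in the sapply()
# suggestion, gives a plain list rather than a data frame.
nulled <- sapply(testdf, function(x) NULL)
str(nulled)
```

Rows can be added to emptied again with rbind(), since it still has the structure of the original data frame.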
Re: [R] reading data problem
hmm... I don't see the quote="" parameter in your read.csv call

Best regards, Jan
--
Sent from my mobile. Apologies for typos and terseness

On Mon, Sep 24, 2018, 20:40 greg holly wrote:

> Hi Jan;
>
> Thanks so much for this. Yes, I did. Here is my code to read the data:
> a<-read.csv("for_R_graphs.csv", header=T, sep=",")
>
> On Mon, Sep 24, 2018 at 2:07 PM Jan T Kim via R-help wrote:
>
>> Yet one more: have you tried adding quote="" to your read.table
>> parameters? Quote characters have a 50% chance of being balanced,
>> and they can encompass multiple lines...
>>
>> On Mon, Sep 24, 2018 at 11:40:47AM -0700, Bert Gunter wrote:
>> > One more question:
>> >
>> > 5. Have you tried shutting down, restarting R, and rereading?
>> >
>> > -- Bert
>> >
>> > On Mon, Sep 24, 2018 at 11:36 AM Bert Gunter wrote:
>> >
>> > > *Perhaps* useful questions (perhaps *not*, though):
>> > >
>> > > 1. What is your OS? What is your R version?
>> > > 2. How do you know that your data has 151 rows?
>> > > 3. Are there stray characters -- perhaps a stray eof -- in your data? Have you checked around row 96 to see what's there?
>> > > 4. Are the data you did get in R what you expect?
>> > >
>> > > -- Bert
>> > >
>> > > Bert Gunter
>> > >
>> > > "The trouble with having an open mind is that people keep coming along and sticking things into it."
>> > > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> > >
>> > > On Mon, Sep 24, 2018 at 11:27 AM greg holly wrote:
>> > >
>> > >> Hi Dear all;
>> > >>
>> > >> I have a dataset with 151*291 dimension. After making data read into R I am getting a data with 96*291 dimension. Even though I have no error message from R I could not understand the reason why I cannot get data correctly?
>> > >>
>> > >> Here are my codes to make read the data
>> > >> a<-read.table("for_R_graphs.csv", header=T, sep=",")
>> > >> a<-read.table("for_R_graphs.txt", header=T, sep="\t")
>> > >>
>> > >> Regards,
>> > >>
>> > >> Greg
Re: [R] reading data problem
Yet one more: have you tried adding quote="" to your read.table parameters? Quote characters have a 50% chance of being balanced, and they can encompass multiple lines...

On Mon, Sep 24, 2018 at 11:40:47AM -0700, Bert Gunter wrote:
> One more question:
>
> 5. Have you tried shutting down, restarting R, and rereading?
>
> -- Bert
>
> On Mon, Sep 24, 2018 at 11:36 AM Bert Gunter wrote:
>
> > *Perhaps* useful questions (perhaps *not*, though):
> >
> > 1. What is your OS? What is your R version?
> > 2. How do you know that your data has 151 rows?
> > 3. Are there stray characters -- perhaps a stray eof -- in your data? Have you checked around row 96 to see what's there?
> > 4. Are the data you did get in R what you expect?
> >
> > -- Bert
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> > On Mon, Sep 24, 2018 at 11:27 AM greg holly wrote:
> >
> >> Hi Dear all;
> >>
> >> I have a dataset with 151*291 dimension. After making data read into R I am getting a data with 96*291 dimension. Even though I have no error message from R I could not understand the reason why I cannot get data correctly?
> >>
> >> Here are my codes to make read the data
> >> a<-read.table("for_R_graphs.csv", header=T, sep=",")
> >> a<-read.table("for_R_graphs.txt", header=T, sep="\t")
> >>
> >> Regards,
> >>
> >> Greg
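The quote="" suggestion can be tried on a small made-up file. The sketch below assumes a hypothetical CSV in which two fields contain an inch mark; the file contents and object names are illustrative and not Greg's data. With default quoting, the text between the two quote characters can be swallowed into a single field, so rows may go missing without an error; with quoting switched off, every physical line becomes one row.

```r
# Hypothetical file: the label in rows 2 and 4 contains a double quote.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,label",
             "1,plain",
             "2,5\" pipe",
             "3,plain",
             "4,6\" pipe",
             "5,plain"), tmp)

# Default quoting of read.csv (quote = "\""): rows between the two quote
# characters may be merged into one field.
with_quotes <- read.csv(tmp)

# Quoting disabled: one row per data line.
no_quotes <- read.csv(tmp, quote = "")

nrow(with_quotes)
nrow(no_quotes)

# count.fields() reports the number of fields on each line, which helps
# to locate the line where parsing starts to go wrong.
fields <- count.fields(tmp, sep = ",", quote = "")
fields

unlink(tmp)
```

Comparing `nrow()` with the line count of the file, and looking at `count.fields()` around the first missing row, is usually enough to tell whether quote characters are the cause.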
Re: [R] R shared library (/usr/lib64/R/lib/libR.so) not found.
Hi Rolf & All,

I haven't built R in a while, but my general expectation of an autotools based build & install would be that the default prefix is /usr/local, rather than /usr. So I'd expect the shared libs in /usr/local/lib, /usr/local/lib64 etc.

I also have a recollection that I once installed Rstudio for some MOOC, and ended up putting a symlink in somewhere like /usr/lib*, because Rstudio was only available as a binary with the location of the shared lib hard-baked into it.

Depending on your setup this may be irrelevant, apologies in that case.

Best regards, Jan

On Thu, Aug 23, 2018 at 10:57:35PM +1200, Rolf Turner wrote:
>
> I *think* that this is an R question (and *not* an RStudio question!)
>
> I have, somewhat against my better judgement, decided to experiment with using RStudio.
>
> I downloaded and installed RStudio. Easy-peasy. Nice lucid instructions.
>
> Then I tried to start RStudio ("rstudio" from the command line) and got a pop-up window with the error message:
>
> >R shared library (/usr/lib64/R/lib/libR.so) not found. If this
> >is a custom build of R, was it built with the --enable-R-shlib option?
>
> Oops, no, I guess it wasn't. So I carefully did a
>
> sudo make uninstall
> make clean
> make distclean
>
> and then did
>
> ./R-3.5.1/configure
>
> making sure I added the --enable-R-shlib flag.
>
> Then I did make and sudo make install. It all seemed to go ... but then I did
>
> rstudio
>
> again and got the same popup error.
>
> There is indeed *no* libR.so in /usr/lib64/R/lib.
>
> There *is* a libR.so in /usr/lib/R/lib, but (weirdly) ls -l reveals that it dates from my previous install of R-3.5.1, for which I *did not* configure with --enable-R-shlib.
>
> Can anyone explain to me WTF is going on?
>
> What should I do? Just make a symbolic link from /usr/lib/R/lib/libR.so to /usr/lib64/R/lib/libR.so?
>
> It bothers me that /usr/lib/R/lib/libR.so was not "refreshed" from my most recent install of R.
>
> I plead for enlightenment.
>
> cheers,
>
> Rolf Turner
>
> P.S. I'm running Ubuntu 18.04. And the previous install of R was done under Ubuntu 18.04.
>
> R. T.
>
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
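To see which installation is actually in play, it may help to ask R itself. The sketch below is run from the R that was just built; it lists a few candidate locations for libR.so together with their modification times. The candidate paths are assumptions taken from this thread and from the usual /usr/local default prefix, not an exhaustive list.

```r
# Home directory of the R that is currently running.
r_home <- R.home()

# Candidate locations for the shared library (assumed, see lead-in).
lib_candidates <- c(
  file.path(r_home, "lib", "libR.so"),   # inside the running R's home
  "/usr/lib64/R/lib/libR.so",            # path from the RStudio error message
  "/usr/lib/R/lib/libR.so",              # stale copy mentioned in the thread
  "/usr/local/lib/R/lib/libR.so",        # default prefix of a source build
  "/usr/local/lib64/R/lib/libR.so"
)

found <- file.exists(lib_candidates)

# Modification times show which copies are left over from an older install;
# missing files get NA.
data.frame(path     = lib_candidates,
           exists   = found,
           modified = file.info(lib_candidates)$mtime)
```

If the only fresh copy sits under /usr/local, that would fit the default-prefix explanation given in the reply.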
Re: [R] security using R at work
You can also inadvertently transmit data to the internet using a package without being obviously 'stupid', e.g. by using a package that uses an external service for data processing. For example, some javascript visualisation libs can do that (not sure if those wrapped in R-packages do), or, for example, a geocoding service. Not having an (outgoing) internet connection at least helps against mistakes like this (and probably against many untargeted attacks).

If it is allowed to have the sensitive data on that computer, using R on that computer is probably not going to make it less safe.

Jan

On 09-08-18 09:19, Rainer M Krug wrote:
> I cannot agree more, Barry. Very nicely put.
>
> Rainer
>
> On 8 Aug 2018, at 18:10, Barry Rowlingson wrote:
>
>> On Wed, Aug 8, 2018 at 4:09 PM, Laurence Clark wrote:
>>
>>> Hello all,
>>>
>>> I want to download R and use it for work purposes. I hope to use it to analyse very sensitive data from our clients.
>>>
>>> My question is: If I install R on my work network computer, will the data ever leave our network? I need to know if the data goes anywhere other than our network, because this could compromise its security. Is there any chance the data could go to a server owned by 'R' or anything else that's not immediately obvious, but constitutes the data leaving our network?
>>
>> You are talking mostly to statisticians here, and if p>0 then there's "a chance". I'd say yes, there's a chance, but it's pretty small, and would only occur through stupidity, accident or malice.
>>
>> In the ordinary course of things your data will be on your hard disk, or on your corporate network drives, and only exist between your corporate network server and your PC's memory. R will load the data into that memory, do stuff with it in that memory, and write results back to hard disk. Nothing leaves the network this way.
>>
>> However... R has facilities for talking to the internet. You can save data to google docs spreadsheets, for example, but you'd have to be signed in to google, and have to type something like:
>>
>> writeGoogleDoc(my_data, "secretdata.xls")
>>
>> that covers "stupid". You should know that google docs are on google's servers, and google's servers aren't on your network, and your secret data shouldn't go on google's servers.
>>
>> Accidents happen. You might be working on non-secret data which you want to save to google docs, and accidentally save "data1" which is secret instead of "data2" which is okay to be public. Oops. You sent it to google. Accidents happen.
>>
>> "malice" would be if someone had put code into R or an add-on package that you use that sends your data over the network without you knowing. For example maybe every time you fit a linear model with:
>>
>> lm(age~beauty, data=people)
>>
>> R could be transmitting the data to hackers. But the chance of this is very small, and I don't think any malicious code has ever been discovered in R or the 12000 add-on packages downloadable from CRAN. Doesn't mean it hasn't been discovered yet or won't be in the future.
>>
>> It used to be said that the only machine safe from hackers was one unplugged from the network. But now hackers can get to your machine via malicious USB sticks, keyboard loggers, and various other nasties. The only machine safe from hackers is one with the power off. But take the power plug out because a wake-on-lan packet could switch your machine on remotely
>>
>> Barry
>>
>>> Thank you
>>> Laurence
>>>
>>> --
>>> Laurence Clark
>>> Business Data Analyst
>>> Account Management
>>> Health Management Ltd
Re: [R] F-test where the coefficients in the H_0 is nonzero
You can easily test linear restrictions using the function linearHypothesis() from the car package. There are several ways to set up the null hypothesis, but a straightforward one here is:

> library(car)
> x <- rnorm(10)
> y <- x+rnorm(10)
> linearHypothesis(lm(y~x), c("(Intercept)=0", "x=1"))
Linear hypothesis test

Hypothesis:
(Intercept) = 0
x = 1

Model 1: restricted model
Model 2: y ~ x

  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     10 10.6218
2      8  9.0001  2    1.6217 0.7207 0.5155

Jan

From: R-help on behalf of John
Date: Thursday, 2 August 2018 at 10:44
To: r-help
Subject: [R] F-test where the coefficients in the H_0 is nonzero

Hi,

I try to run the regression
y = beta_0 + beta_1 x
and test
H_0: (beta_0, beta_1) = (0,1)
against
H_1: H_0 is false

I believe I can run the regression
(y-x) = beta_0 + beta_1' x
and do the regular F-test (using the lm function) where the hypothesized coefficients are all zero. Is there any function in R that deals with the case where the coefficients are nonzero?

John
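The same joint test can be computed by hand, which may make it clearer what linearHypothesis() reports. This is a minimal base R sketch on simulated data; the seed and object names are illustrative. Under H_0 the fitted values are simply x, so the restricted residual sum of squares needs no model fit.

```r
set.seed(1)                       # illustrative seed, not from the thread
x <- rnorm(10)
y <- x + rnorm(10)

fit   <- lm(y ~ x)                # unrestricted model
rss_u <- sum(resid(fit)^2)        # unrestricted residual sum of squares
rss_r <- sum((y - x)^2)           # restricted: intercept = 0, slope = 1

q    <- 2                         # number of restrictions under H_0
df_u <- df.residual(fit)          # n - 2 residual degrees of freedom

# F statistic and p-value for the joint restriction.
F_stat  <- ((rss_r - rss_u) / q) / (rss_u / df_u)
p_value <- pf(F_stat, q, df_u, lower.tail = FALSE)

c(F = F_stat, p = p_value)
```

Note that the overall F-statistic printed by summary(lm(I(y - x) ~ x)) only tests the slope, so it does not cover the intercept restriction; the joint test above does.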
Re: [R] (no subject)
Hi,

thanks a lot! Now it works.

Yours
Jan

On 10.07.2018 at 09:00, PIKAL Petr wrote:

Hi

see in line

-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Werning, Jan-Philipp
Sent: Monday, July 9, 2018 9:42 PM
To: r-help@r-project.org
Subject: [R] (no subject)

> Dear all,
>
> In the end I try to run a system dynamics simulation in R using the package deSolve. Therefore I need an auxiliary list (auxs) the model can refer to when the functions need an auxiliary value. I used a manual list:
>
> auxs <- c( aSplitSN=0.4 , aSplitLN=0.6, aSplitSR1=0 , aSplitLR1=1, aSplitSR2=0 , aSplitLR2=1, aSplitSR3=0 , aSplitLR3=1, aSalesNR=0.92, aSalesRR=0.08, […])

This is a vector, not a list.

auxs <- c( aSplitSN=0.4 , aSplitLN=0.6, aSplitSR1=0 , aSplitLR1=1, aSplitSR2=0)
is.vector(auxs)
[1] TRUE
is.list(auxs)
[1] FALSE

> this way everything worked well. Now I want to use a matrix with different values for each of the auxiliaries in order to run different scenarios. Therefore I created a csv document which I read in:
>
> csv1 <- read.csv("180713_Taguchi Robust Design Test_180709_1745.csv", sep = ";")
> list_csv <- csv1[1,]

which is probably a data frame

test<-vec[1,]
is.vector(test)
[1] FALSE
is.list(test)
[1] TRUE
is.data.frame(test)
[1] TRUE

> namesauxs <- names(list_csv)
> auxs1 <- as.numeric(list_csv)
> names(auxs1) <- namesauxs
> auxs <- auxs1
>
> Looking at the global environment section in R studio, now both are the same, in the value section as "Named num"

I do not know rstudio but you could check two objects by ?identical

> Yet, the model will not run using these values ultimately coming from the csv.

I wonder why you use as.numeric in the first instance. You could use

auxs1 <- unlist(csv1[1,])

and you should get a named numeric vector. Maybe there are problems when reading numbers from the csv file. You could check it e.g. by

str(auxs1)

> What am I doing wrong here? It would be great if you could help.
>
> Thanks a lot in advance
>
> Yours
> Jan
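The unlist() suggestion can be checked on a small stand-in for the CSV file. In the sketch below the text, column names and values are made up for illustration; only the pattern `unlist(csv1[1, ])` comes from the thread.

```r
# Hypothetical scenario table: one row per scenario, semicolon separated.
csv_text <- "aSplitSN;aSplitLN;aSalesNR\n0.4;0.6;0.92\n0.5;0.5;0.90"
csv1 <- read.csv(text = csv_text, sep = ";")

# A single row of a data frame is still a data frame (a list).
is.data.frame(csv1[1, ])

# unlist() turns it into a named numeric vector, like the manual c(...).
auxs <- unlist(csv1[1, ])
str(auxs)

# Compare with the hand-written version of the same scenario.
manual <- c(aSplitSN = 0.4, aSplitLN = 0.6, aSalesNR = 0.92)
all.equal(auxs, manual)
```

Looping over `seq_len(nrow(csv1))` and calling `unlist(csv1[i, ])` each time would then give one parameter vector per scenario. If `str(auxs)` shows `chr` instead of `num`, the numbers were probably not parsed as numeric, for example because of decimal commas.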
[R] (no subject)
Dear all,

In the end I try to run a system dynamics simulation in R using the package deSolve. Therefore I need an auxiliary list (auxs) the model can refer to when the functions need an auxiliary value. I used a manual list:

auxs <- c( aSplitSN=0.4 , aSplitLN=0.6, aSplitSR1=0 , aSplitLR1=1, aSplitSR2=0 , aSplitLR2=1, aSplitSR3=0 , aSplitLR3=1, aSalesNR=0.92, aSalesRR=0.08, […])

this way everything worked well. Now I want to use a matrix with different values for each of the auxiliaries in order to run different scenarios. Therefore I created a csv document which I read in:

csv1 <- read.csv("180713_Taguchi Robust Design Test_180709_1745.csv", sep = ";")
list_csv <- csv1[1,]
namesauxs <- names(list_csv)
auxs1 <- as.numeric(list_csv)
names(auxs1) <- namesauxs
auxs <- auxs1

Looking at the global environment section in R studio, now both are the same, in the value section as "Named num"

Yet, the model will not run using these values ultimately coming from the csv.

What am I doing wrong here? It would be great if you could help.

Thanks a lot in advance

Yours
Jan
[R] [R-pkgs] Natural Language Processing for non-English languages with udpipe
Dear R users, I'm happy to announce the release of version 0.3 of the udpipe R package on CRAN (https://CRAN.R-project.org/package=udpipe). The udpipe R package is a Natural Language Processing toolkit that provides language-agnostic 'tokenization', 'parts of speech tagging', 'lemmatization', 'morphological feature tagging' and 'dependency parsing' of raw text. Next to text parsing, the R package also allows you to train annotation models based on data of 'treebanks' in 'CoNLL-U' format as provided at http://universaldependencies.org/format.html. The R package provides direct access to language models trained on more than 50 languages. The following languages are directly available: afrikaans, ancient_greek-proiel, ancient_greek, arabic, basque, belarusian, bulgarian, catalan, chinese, coptic, croatian, czech-cac, czech-cltt, czech, danish, dutch-lassysmall, dutch, english-lines, english-partut, english, estonian, finnish-ftb, finnish, french-partut, french-sequoia, french, galician-treegal, galician, german, gothic, greek, hebrew, hindi, hungarian, indonesian, irish, italian, japanese, kazakh, korean, latin-ittb, latin-proiel, latin, latvian, lithuanian, norwegian-bokmaal, norwegian-nynorsk, old_church_slavonic, persian, polish, portuguese-br, portuguese, romanian, russian-syntagrus, russian, sanskrit, serbian, slovak, slovenian-sst, slovenian, spanish-ancora, spanish, swedish-lines, swedish, tamil, turkish, ukrainian, urdu, uyghur, vietnamese We hope that the package will allow other R users to build natural language applications on top of the resulting parts of speech tags, tokens, morphological features and dependency parsing output. 
And we hope in particular that applications will arise which are not limited to English only (like the textrank R package or the cleanNLP package to name a few).

Note that the package has no external software dependencies (no java nor python) and depends only on 2 R packages (Rcpp and data.table), which makes the package easy to install on any platform.

The package is available on CRAN at https://CRAN.R-project.org/package=udpipe and is developed at https://github.com/bnosac/udpipe
A small docusaurus website is made available at https://bnosac.github.io/udpipe/en

We hope you enjoy using it and we would like to thank Milan Straka for all the efforts done on UDPipe as well as all persons involved in http://universaldependencies.org

all the best,
Jan

Jan Wijffels
Statistician
www.bnosac.be | +32 486 611708

___ R-packages mailing list r-packa...@r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages
[R] Numerical stability in chisq.test
The chisq.test contains the following code:

STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE))

However, based on the book Accuracy and Stability of Numerical Algorithms <http://ftp.demec.ufpr.br/CFD/bibliografia/Higham_2002_Accuracy%20and%20Stability%20of%20Numerical%20Algorithms.pdf>, Table 4.1 on page 89, it is better to sort the data in increasing order than in decreasing order when the data are non-negative.

A demonstrative example:

x = matrix(c(rep(1.1, 10000), 10^16), nrow = 10001, ncol = 1)
# We have a vector with 10000*1.1 and 1*10^16
c(sum(sort(x, decreasing = TRUE)), sum(sort(x, decreasing = FALSE)))

The result:

10000000000010996 10000000000011000

When we sort the data in increasing order, we get the correct result. If we sort the data in decreasing order, we get a result that is off by 4.

Shouldn't the sort be in increasing order rather than in decreasing order?

Best regards,
Jan Motl

PS: This post is based on a discussion on https://stackoverflow.com/questions/47847295/why-does-chisq-test-sort-data-in-descending-order-before-summation
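The effect of summation order can be made more visible by adding the numbers one at a time in plain double precision. The sketch below uses Reduce() for that, which is a stand-in chosen for illustration and not what chisq.test does; sum() itself accumulates in extended precision where the platform provides it, which is why its error in the post is small. The vector is the one described in the post: 10000 copies of 1.1 and one value of 10^16.

```r
# 10000 small values and one huge value.
x <- c(rep(1.1, 10000), 10^16)

# Add the elements one by one, rounding to double after every addition.
running_sum <- function(v) Reduce(`+`, v)

dec_sum <- running_sum(sort(x, decreasing = TRUE))    # huge value first
inc_sum <- running_sum(sort(x, decreasing = FALSE))   # small values first

# Exact total: 10000 * 1.1 = 11000 on top of 10^16.
exact <- 1e16 + 11000

format(c(decreasing = dec_sum, increasing = inc_sum, exact = exact),
       scientific = FALSE, digits = 17)
```

With the huge value first, every later addition of 1.1 has to be rounded to the spacing of doubles near 10^16, so the rounding errors pile up. With the small values first, they are summed almost exactly before the large term is added.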
[R] [R-pkgs] release of version 0.2 of the textrank package
Hello R users,

I'm pleased to announce the release of version 0.2 of the textrank package on CRAN: https://CRAN.R-project.org/package=textrank

The package is a natural language processing package which allows one to summarize text by finding
- relevant sentences
- relevant keywords

This is done by constructing a sentence network which finds how sentences are related to one another (word overlap). On that network Google Pagerank is used in order to find relevant sentences.

In a similar way 'textrank' can also be used to extract keywords. How? A word network is constructed by looking if words are following one another. On top of that network the 'Pagerank' algorithm is applied to extract relevant words. Relevant words which are following one another are next pasted together to get keywords.

The package has a vignette at https://cran.r-project.org/web/packages/textrank/vignettes/textrank.html and it also plays nicely with the udpipe package https://CRAN.R-project.org/package=udpipe which is good for parts-of-speech tagging, lemmatisation, dependency parsing and general NLP processing.

all the best,
Jan

Jan Wijffels
Statistician
www.bnosac.be
[R] DeSolve Package and Moving Average
Dear all,

I am using the deSolve package to simulate a system dynamics model. At the problematic point in the model, I basically want to decide how many products shall be produced to be sold. In order to determine the amount, a basic forecasting model using the average of the last 12 time periods shall be used. My code looks like the following.

[…]
# Time units in month
START<-0; FINISH<-120; STEP<-1
# Set seed for reproducibility
set.seed(123)
# Create time vector
simtime <- seq(START, FINISH, by=STEP)
# Create a stock vector with initial values
stocks <- c([…])
# Create an aux vector for the fixed aux values
auxs<- c([…])
model <- function(time, stocks, auxs){
  with(as.list(c(stocks, auxs)),{
    [… "lots of aux, flow, and stock functions" …]
    aMovingAverage <- ifelse(exists("ResultsSimulation")=="FALSE",1,movavg(ResultsSimulation$TotalSales, 12, type = "s"))
    return (list(c([…])))
  })
}
# Call Solver, and store results in a data frame
ResultsSimulation <- data.frame(ode(y=stocks, times=simtime, func = model, parms=auxs, method="euler"))
[…]

My problem is that the moving average (function: movavg) is only computed once and the same value is used in every timestep of the model. I.e. when running the model for the first time, 1 is used; running it the next time, the total sales value of the first timestep is used. Since only one timestep exists, this is logical. Yet I would expect the movavg function to produce a new value in each of the 120 timesteps, as is the case with all other flow, stock and aux calculations as well.

It would be great if you could help me with fixing this problem.

Many thanks in advance!

Yours,
Jan
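While ode() is running, the data frame ResultsSimulation has not been assigned yet, and ifelse() returns a result as long as its test, here a single value. Both points are consistent with the constant value described in the post. One possible way around this, sketched as a plain base R loop rather than with deSolve itself, is to keep the sales history in a vector that is updated at every step and to average its last 12 entries. The sales process below is hypothetical and only stands in for the elided model equations.

```r
set.seed(123)
n_steps <- 120     # number of time steps, as in the post
window  <- 12      # length of the moving-average window

sales      <- numeric(n_steps)   # history, filled step by step
moving_avg <- numeric(n_steps)

for (t in seq_len(n_steps)) {
  if (t == 1) {
    moving_avg[t] <- 1                            # start value from the post
  } else {
    past <- sales[max(1, t - window):(t - 1)]     # up to 12 previous steps
    moving_avg[t] <- mean(past)
  }
  # Hypothetical sales process: forecast plus some noise.
  sales[t] <- moving_avg[t] + rnorm(1, sd = 0.1)
}

head(data.frame(step = seq_len(n_steps), sales, moving_avg), 15)
```

Inside a deSolve model the same idea would need the history to live outside the model function, for example in an environment that the function updates at each call. That part is left out here because it depends on the elided model code.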
Re: [R] R and LINGO?
Hi

I'm struggling with this problem:

b) Another company wants to compose the optimal project portfolio based on the following 5-year project proposals. In the table, the cash flow for each project in each year is shown.

                          Project 1  Project 2  Project 3  Project 4  Project 5  Project 6
1st year of the project      -58        -32        -18        -31        -33        -39
2nd year of the project       17         17         11          4         21         30
3rd year of the project       26         30         13         19         20          9
4th year of the project       18          7          4          7         22         13
5th year of the project       40          6          7         17          6         13

In this case, the company can also choose which year each project should commence. These six candidate projects can begin either in 2018, in 2019 or in 2020, or not at all.

The current proposal is to undertake projects 1, 2, 3 and 5, with projects 3 and 5 starting in 2018, project 2 in 2019 and project 1 in 2020. Available funds by the end of year 2017 will be 70 mill. The resulting cash flow is given in the following table:

Year   Project 1  Project 2  Project 3  Project 5  Total cash flow from projects  Available funds
2017                                                                                     70
2018                            -18        -33                 -51                      19
2019                 -32         11         21                   0                      19
2020      -58         17         13         20                  -8                      11
2021       17         30          4         22                  73                      84
2022       26          7          7          6                  46                     130
2023       18          6                                        24                     154
2024       40                                                   40                     194

Formulate an optimization model in LINGO to determine which projects to undertake, and in which years. The goal is to maximize available funds by the end of year 2024, while making sure that available funds are always non-negative throughout the planning horizon. How much can the company improve compared to the current proposal? (For simplicity, assume zero discount rate.)

Kind regards
Jan Olsen Røyland
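Since the list is about R, the bookkeeping behind the second table can be sketched in R rather than LINGO. The helper `funds_path()` and the brute-force search below are illustrative names and a swapped-in technique (plain enumeration of all 4^6 = 4096 schedules), not the optimization model the exercise asks for. The cash-flow numbers are copied from the first table.

```r
# Cash flow per project (columns) and project year (rows), from the table.
cash <- matrix(c(-58, -32, -18, -31, -33, -39,
                  17,  17,  11,   4,  21,  30,
                  26,  30,  13,  19,  20,   9,
                  18,   7,   4,   7,  22,  13,
                  40,   6,   7,  17,   6,  13),
               nrow = 5, byrow = TRUE)
years <- 2018:2024

# Available funds at the end of each year for a vector of start years
# (one entry per project, NA = project not undertaken).
funds_path <- function(start, initial = 70) {
  flow <- numeric(length(years))
  for (p in which(!is.na(start))) {
    idx <- start[p] - 2018 + 1:5          # positions of the 5 project years
    flow[idx] <- flow[idx] + cash[, p]
  }
  setNames(initial + cumsum(flow), years)
}

# Current proposal: project 1 in 2020, 2 in 2019, 3 and 5 in 2018.
current <- c(2020, 2019, 2018, NA, 2018, NA)
funds_path(current)

# Enumerate every schedule and keep those that never go negative.
schedules <- expand.grid(rep(list(c(NA, 2018, 2019, 2020)), 6))
final <- apply(schedules, 1, function(s) {
  f <- funds_path(s)
  if (all(f >= 0)) f[length(f)] else NA
})
best <- schedules[which.max(final), ]
best
max(final, na.rm = TRUE)
```

For the current proposal the function reproduces the "Available funds" column of the table, which is a useful check before trusting the search result or comparing it with a LINGO model.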
[R] Linear regression with tranformed dependant variable
Dear all,

I am trying to fit a multiple linear regression model with a transformed dependent variable (the normality assumption was not verified...). I have applied a sqrt(variable) transformation... The results are great, but I don't know how to interpret the beta coefficients... Is it possible to do another transformation to get interpretable beta coefficients that express the variations in the original untransformed dependent variable?

Thank you very much for your help!
Noémie
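One common way to read such a model, offered here as a sketch on simulated data rather than as the only valid approach, is to leave the coefficients on the square-root scale and compare back-transformed predictions instead. All variable names and numbers below are made up for illustration. Note that squaring a predicted sqrt(y) is not the expected value of y: if sqrt(y) has mean mu and residual variance sigma^2, then E[y] = mu^2 + sigma^2.

```r
set.seed(42)
# Simulated data: the square root of y is linear in x1 and x2.
d <- data.frame(x1 = runif(100, 0, 10), x2 = rnorm(100))
d$y <- (2 + 0.5 * d$x1 + 0.3 * d$x2 + rnorm(100, sd = 0.3))^2

fit <- lm(sqrt(y) ~ x1 + x2, data = d)

# Two scenarios that differ by one unit of x1, with x2 held fixed.
new <- data.frame(x1 = c(5, 6), x2 = 0)

pred_sqrt <- predict(fit, newdata = new)   # predictions on the sqrt scale
pred_y    <- pred_sqrt^2                   # back-transformed to the y scale

# Approximate mean of y, adding the residual variance (see lead-in).
pred_mean <- pred_sqrt^2 + summary(fit)$sigma^2

# Change in y for a one-unit increase in x1 at this particular point.
diff(pred_y)
```

Because the back-transformation is nonlinear, the effect of a one-unit change on the original scale depends on where the other predictors sit, so it is usually reported for a few representative scenarios.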
Re: [R] Help understanding why glm and lrm.fit runs with my data, but lrm does not
With lrm.fit you are fitting a completely different model. One of the things lrm does is prepare the input for lrm.fit, which in this case means that dummy variables are generated for categorical variables such as 'KILLIP'.

The error message means that the model did not converge after the maximum number of iterations. One possible solution is to try to increase the maximum number of iterations, e.g.:

fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT, data = gusto2, maxit = 100)

HTH,
Jan

On 14-09-17 09:30, Bonnett, Laura wrote:

Dear all,

I am using the publicly available GustoW dataset. The exact version I am using is available here: https://drive.google.com/open?id=0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk

I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP, HRT and ANT. I have successfully fitted a logistic regression model using the "glm" function as shown below.

library(rms)
gusto <- spss.get("GustoW.sav")
fit <- glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT,family=binomial(link="logit"),data=gusto,x=TRUE,y=TRUE)

However, my review of the literature and other websites suggests I need to use "lrm" for the purposes of producing a nomogram. When I run the command using "lrm" (see below) I get an error message saying:

Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
  Unable to fit model using "lrm.fit"

My code is as follows:

gusto2 <- gusto[,c(1,3,5,8,9,10)]
gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes"))
gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4"))
gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes"))
gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes"))
var.labels=c(DAY30="30-day Mortality", AGE="Age in Years", KILLIP="Killip Class", HYP="Hypertension", HRT="Tachycardia", ANT="Anterior Infarct Location")
label(gusto2)=lapply(names(var.labels),function(x) label(gusto2[,x])=var.labels[x])

ddist = datadist(gusto2)
options(datadist='ddist')

fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2)
Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
  Unable to fit model using "lrm.fit"

Online solutions to this problem involve checking whether any variables are redundant. However, the results for my data suggest that none are.

redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2)

Redundancy Analysis

redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2)

n: 2188    p: 5    nk: 3

Number of NAs: 0

Transformation of target variables forced to be linear

R-squared cutoff: 0.9    Type: ordinary

R^2 with which each variable can be predicted from all other variables:

   AGE    HYP KILLIP    HRT    ANT
 0.028  0.032  0.053  0.046  0.040

No redundant variables

I've also tried just considering "lrm.fit" and that code seems to run without error too:

lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP,gusto2$HRT,gusto2$ANT),gusto2$DAY30)

Logistic Regression Model

lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT, gusto2$ANT), y = gusto2$DAY30)

                      Model Likelihood     Discrimination    Rank Discrim.
                         Ratio Test           Indexes           Indexes
Obs          2188    LR chi2     233.59    R2       0.273    C       0.846
 0           2053    d.f.             5    g        1.642    Dxy     0.691
 1            135    Pr(> chi2) <0.0001    gr       5.165    gamma   0.696
max |deriv| 4e-09                          gp       0.079    tau-a   0.080
                                           Brier    0.048

          Coef     S.E.   Wald Z Pr(>|Z|)
Intercept -13.8515 0.9694 -14.29 <0.0001
x[1]        0.0989 0.0103   9.58 <0.0001
x[2]        0.9030 0.1510   5.98 <0.0001
x[3]        1.3576 0.2570   5.28 <0.0001
x[4]        0.6884 0.2034   3.38 0.0007
x[5]        0.6327 0.2003   3.16 0.0016

I was therefore hoping someone would explain why the "lrm" code is producing an error message, while "lrm.fit" and "glm" do not. In particular I would welcome a solution to ensure I can produce a nomogram.

Kind regards,
Laura

Dr Laura Bonnett
NIHR Post-Doctoral Fellow
Department of Biostatistics, University of Liverpool
[R] How to extract values after using metabin from the package meta?
Hello,

I'm trying to do a meta-analysis with R, using the function metabin from the package meta:

data <- data.frame(matrix(rnorm(40,25), nrow=17, ncol=8))
centres<-c("SVP","NANTES","STRASBOURG","GRENOBLE","ANGERS","TOULON","MARSEILLE","COLMAR","BORDEAUX","RENNES","VALENCE","CAEN","NANCY")
rownames(data) = centres
colnames(data) = c("case_exposed","witness_exposed","case_nonexposed","witness_nonexposed","exposed","nonexposed","case","witness")
metabin( data$case_exposed, data$case, data$witness_exposed, data$witness, studlab=centres, data=data, sm="OR")

where data is a data frame with the numbers case_exposed, case, witness_exposed and witness for each centre.

metabin prints its results, but how can I extract the values of the OR and the 95% CI for the fixed effect model and the random effects model? I want to put these values in another array. I tried to use summary, but it doesn't change anything.

Thanks for your help.
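One way to approach the extraction question is to read the pooled estimates straight out of the object that metabin returns. The following is only a sketch under stated assumptions: the counts and centre names are made up, and the element names are assumed to be TE.random / lower.random / upper.random for the random effects model and either "*.common" (recent meta versions) or "*.fixed" (older versions) for the fixed effect model. The small helper pick() is hypothetical and simply takes whichever name is present.

```r
library(meta)

# Made-up counts for three hypothetical centres (events and totals per group).
d <- data.frame(
  centre  = c("A", "B", "C"),
  event_e = c(12, 8, 15), n_e = c(50, 40, 60),
  event_c = c(6, 5, 9),   n_c = c(48, 42, 61)
)

m <- metabin(event_e, n_e, event_c, n_c, studlab = centre, data = d, sm = "OR")

# Assumption: the pooled results are stored on the log scale under these names.
pick <- function(a, b) if (!is.null(m[[a]])) m[[a]] else m[[b]]

results <- rbind(
  fixed  = exp(c(OR    = pick("TE.common", "TE.fixed"),
                 lower = pick("lower.common", "lower.fixed"),
                 upper = pick("upper.common", "upper.fixed"))),
  random = exp(c(OR = m$TE.random, lower = m$lower.random, upper = m$upper.random))
)
results
```

The result is an ordinary numeric matrix, so it can be indexed or combined with other arrays; str(m) lists what else is stored in the object.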
[R] R 3.4.0 on Windows 7 Home Premium installed apparently fine, but packages failing to load ...
Hello!

I welcome the new *R* 3.4.0. I installed it on my Windows 7 Home Premium [Service Pack 1, updated to latest, running on an HP AMD(Phenom) II 955 X4 Processor, 3.20 GHz, 16 GB RAM, 64-bit, with lots of free storage on disk and a solid state disk for virtual cache]. It was installed atop the previous *R* version.

I tried the usual *update.packages(ask=FALSE)* and found many instances of packages, e.g., *ctmm*, *SweaveListingUtils*, *plotly*, *scatterplot3d*, *startupmsg*, which failed to install, apparently because of an attempt to include an install of i386 instead of only x64. I was using the Berkeley mirror via *https:*

> R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
> Copyright (C) 2017 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64 (64-bit)

I opted *not* to install from source those packages requiring compilation, although Rtools was installed and when installing a package before, FORTRAN compilations succeed. I also have an MSVC++ installed, but I've not gotten that to work on Windows, unlike when I install on Ubuntu machines.

Unfortunately, install at least for these packages fails:

Do you want to install from sources the packages which need compilation?
y/n: n
Package which is only available in source form, and may need compilation of C/C++/Fortran: ‘gpclib’
Do you want to attempt to install these from sources?
y/n: n trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/crosstalk_1.0.0.zip'Content type 'application/zip' length 598840 bytes (584 KB) downloaded 584 KB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/deldir_0.1-12.zip'Content type 'application/zip' length 173098 bytes (169 KB) downloaded 169 KB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/distr_2.6.zip'Content type 'application/zip' length 2226722 bytes (2.1 MB) downloaded 2.1 MB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/distrEx_2.6.zip'Content type 'application/zip' length 720392 bytes (703 KB) downloaded 703 KB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/foreign_0.8-67.zip'Content type 'application/zip' length 309745 bytes (302 KB) downloaded 302 KB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/gam_1.14-3.zip'Content type 'application/zip' length 319049 bytes (311 KB) downloaded 311 KB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/lattice_0.20-34.zip'Content type 'application/zip' length 731408 bytes (714 KB) downloaded 714 KB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/MASS_7.3-45.zip'Content type 'application/zip' length 1173817 bytes (1.1 MB) downloaded 1.1 MB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/rpart_4.1-10.zip'Content type 'application/zip' length 950721 bytes (928 KB) downloaded 928 KB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/sem_3.1-8.zip'Content type 'application/zip' length 1110127 bytes (1.1 MB) downloaded 1.1 MB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/SparseM_1.76.zip'Content type 'application/zip' length 952285 bytes (929 KB) downloaded 929 KB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/survival_2.41-2.zip'Content type 'application/zip' length 5426933 bytes (5.2 MB) downloaded 
5.2 MB trying URL ' https://mirrors.nics.utk.edu/cran/bin/windows/contrib/3.4/VineCopula_2.1.1.zip'Content type 'application/zip' length 1106702 bytes (1.1 MB) downloaded 1.1 MB package ‘crosstalk’ successfully unpacked and MD5 sums checked package ‘deldir’ successfully unpacked and MD5 sums checked package ‘distr’ successfully unpacked and MD5 sums checked package ‘distrEx’ successfully unpacked and MD5 sums checked package ‘foreign’ successfully unpacked and MD5 sums checked package ‘gam’ successfully unpacked and MD5 sums checked package ‘lattice’ successfully unpacked and MD5 sums checked package ‘MASS’ successfully unpacked and MD5 sums checked package ‘rpart’ successfully unpacked and MD5 sums checked package ‘sem’ successfully unpacked and MD5 sums checked package ‘SparseM’ successfully unpacked and MD5 sums checked package ‘survival’ successfully unpacked and MD5 sums checked package ‘VineCopula’ successfully unpacked and MD5 sums checked The downloaded binary packages are in C:\Users\Jan\AppData\Local\Temp\Rtmpyekpgu\downloaded_packages installing the source packages ‘ctmm’, ‘plotly’, ‘scatterplot3d’, ‘startupmsg’, ‘SweaveListingUtils’ trying URL ' https://mirrors.nics.utk.edu/cran/src/contrib/ctmm_0.3.6.tar.gz'Content type 'application/x-gzip' length 731682 bytes (714 KB) downloaded 714 KB trying URL ' https://mirrors.nics.utk.edu/cran/src/contrib/plotly_4.6.0.tar.gz'Content type 'application/x-gzip' length 980458 bytes (957 KB) downloaded 957 KB trying URL ' https://mirrors.nics.utk.edu/cran/src/contri
Re: [ESS] Curly brace indentation
Hi Martin and All,

thanks for your reply, please see my comments inline below:

On Thu, Dec 08, 2016 at 09:09:10AM +0100, Martin Maechler wrote:
> >>>>> Jan T Kim via ESS-help <ess-help@r-project.org>
> >>>>> on Tue, 6 Dec 2016 01:11:20 + writes:
>
> > Hello All,
> > since some time, I get the following indentation behaviour: If I type
> >
> > f <- function(x)
> > {
> > return(x * x);
> > }
> >
> > this gets indented as
> >
> > f <- function(x)
> > {
> >     return(x * x);
> >     }
> >
> > i.e. the closing curly brace is not vertically aligned with the opening one.
>
> What exactly is "if I type" ?
> - in an emacs buffer for a foo.R file (i.e. a buffer in R-mode),
>   right ?

yes -- I used a filename ending with ".R", the mode shows as "(ESS[S] [none] ElDoc)". With the "if I type" sample I basically mean to express that the indentation after the opening curly brace is automatically generated, rather than typed by me.

> Well, I don't see this
> (and I would never use unnecessary ' ; ' nor
>  unnecessary return(.) .. but that's not really relevant here)
>
> Specifically, after typing [Enter] at the end of the line
>     return(x * x);
> of course the cursor on the next line is below the first letter
> 'r' of 'return';

yes, that's what I get as well...

> but then if you type "}" and [Enter] or [Tab] then it aligns
> correctly.
>
> Don't you see that?

... but no, that "electric" alignment of the closing curly brace is not what I get. The brace doesn't move to the left, it appears below the "r" of "return" and stays there.

> If yes, how could ESS behave any better?

The behaviour you describe is what I would like.

As some additional detail about the system, this is a newly installed Ubuntu 16.04.1 LTS (Xenial Xerus), and the ESS package is

ii  ess  16.10-1xenia  all  Emacs mode for statistical ...

Best regards, Jan

> > If I then go on and indent the buffer (C-x h C-M-\), the indentation is
> > updated to
> >
> > f <- function(x)
> > {
> >     return(x * x);
> > }
> >
> > so the opening and closing curly braces are now vertically aligned.
> >
> > This behaviour started several months ago. Reviewing the change logs,
> > I speculate that upgrading (via Ubuntu package manager) to a package
> > providing 15.09, where "the indentation logic has been refactored",
> > may be the cause of the change, but as I've done little R coding for
> > a while I can't really pinpoint this.
> >
> > I recently got a new computer at work and used that opportunity to
> > check that the behaviour occurs with a new account, i.e. without any
> > ~/.emacs file.
> >
> > After some code delving and hacking I've managed to adjust the electric
> > curly braces by adding this to my .emacs:
> >
> > (defun jtk-ess-electric-brace (arg)
> >   "modified / extended ess-electric-brace"
> >   (interactive "P")
> >   (progn
> >     ; (message "modified ess-electric-brace running")
> >     (ess-electric-brace arg)
> >     (ess-indent-command)
> >     )
> >   )
> >
> > (defun jtk-ess-mode-hook ()
> >   (progn
> >     (local-set-key (kbd "{") 'jtk-ess-electric-brace)
> >     (local-set-key (kbd "}") 'jtk-ess-electric-brace)
> >     )
> >   )
> >
> > So essentially I have the brace indented immediately after inserting
> > it via the original ess-electric-brace command. However, this solution
> > is not 100% perfect as the indentation of closing braces occurs only
> > after some delay caused by briefly flashing the cursor at the corresponding
> > opening brace.
> >
> > Quite possibly I'm using a clumsy approach to try to get indentation during
> > typing consistent with that produced by indent-region, so suggestions where
> > I may have messed up are welcome.
> >
> > Best regards & thanks in advance for any pointers, Jan
> > --
> > +- Jan T. Kim ---+
> > | email: jtt...@gmail.com |
> > | WWW: http://www.jtkim.dreamhosters.com/ |
> > *-=< hierarchical systems are for files, not for humans >=-*

__ ESS-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/ess-help
[ESS] Curly brace indentation
Hello All,

since some time, I get the following indentation behaviour: If I type

f <- function(x)
{
return(x * x);
}

this gets indented as

f <- function(x)
{
    return(x * x);
    }

i.e. the closing curly brace is not vertically aligned with the opening one.

If I then go on and indent the buffer (C-x h C-M-\), the indentation is updated to

f <- function(x)
{
    return(x * x);
}

so the opening and closing curly braces are now vertically aligned.

This behaviour started several months ago. Reviewing the change logs, I speculate that upgrading (via Ubuntu package manager) to a package providing 15.09, where "the indentation logic has been refactored", may be the cause of the change, but as I've done little R coding for a while I can't really pinpoint this.

I recently got a new computer at work and used that opportunity to check that the behaviour occurs with a new account, i.e. without any ~/.emacs file.

After some code delving and hacking I've managed to adjust the electric curly braces by adding this to my .emacs:

(defun jtk-ess-electric-brace (arg)
  "modified / extended ess-electric-brace"
  (interactive "P")
  (progn
    ; (message "modified ess-electric-brace running")
    (ess-electric-brace arg)
    (ess-indent-command)
    )
  )

(defun jtk-ess-mode-hook ()
  (progn
    (local-set-key (kbd "{") 'jtk-ess-electric-brace)
    (local-set-key (kbd "}") 'jtk-ess-electric-brace)
    )
  )

So essentially I have the brace indented immediately after inserting it via the original ess-electric-brace command. However, this solution is not 100% perfect as the indentation of closing braces occurs only after some delay caused by briefly flashing the cursor at the corresponding opening brace.

Quite possibly I'm using a clumsy approach to try to get indentation during typing consistent with that produced by indent-region, so suggestions where I may have messed up are welcome.

Best regards & thanks in advance for any pointers, Jan
--
+- Jan T. Kim ---+
| email: jtt...@gmail.com |
| WWW: http://www.jtkim.dreamhosters.com/ |
*-=< hierarchical systems are for files, not for humans >=-*
Re: [R] function which returns number of occurrences of a pattern in string
Bob and Max, thank you. It helped me a lot.

2016-10-21 3:47 GMT+02:00 Bob Rudis <b...@rud.is>:
> `stringi::stri_count()`
>
> I know that the `stringr` pkg saves some typing (it wraps the
> `stringi` pkg), but you should really just use the `stringi` package.
> It has many more very useful functions with not too much more typing.
>
> On Thu, Oct 20, 2016 at 5:47 PM, Jan Kacaba <jan.kac...@gmail.com> wrote:
> > Hello dear R-help
> >
> > I tried to find function which returns number of occurrences of a pattern
> > in string. The closest match I've found is str_locate_all in stringr
> > package. I can use str_locate_all but write my function but I don't want
> > reinvent wheel.
> >
> > JK
[R] function which returns number of occurrences of a pattern in string
Hello dear R-help,

I tried to find a function which returns the number of occurrences of a pattern in a string. The closest match I've found is str_locate_all in the stringr package. I could use str_locate_all and write my own function, but I don't want to reinvent the wheel.

JK
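For completeness, counting matches can also be done in base R without any package. This is a minimal sketch, not a replacement for a dedicated function in stringi or stringr; the helper name count_matches is made up for the illustration.

```r
# Count non-overlapping matches of `pattern` in each element of `x` (base R only).
count_matches <- function(x, pattern, fixed = FALSE) {
  hits <- gregexpr(pattern, x, fixed = fixed)
  # gregexpr() reports -1 when there is no match, so count only positive starts
  vapply(hits, function(h) sum(h > 0L), integer(1))
}

count_matches(c("banana", "apple", "xyz"), "an")   # 2 0 0
count_matches("a.b.c", ".", fixed = TRUE)          # 2
```

With fixed = FALSE the pattern is treated as a regular expression, so a literal "." has to be counted with fixed = TRUE as in the second call.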
Re: [R] strange output of cat function used in recursive function
2016-10-01 18:02 GMT+02:00 David Winsemius <dwinsem...@comcast.net>:
>
>> On Oct 1, 2016, at 8:44 AM, Jan Kacaba <jan.kac...@gmail.com> wrote:
>>
>> Hello Dear R-help
>>
>> I tried to understand how recursive programming works in R. Below is
>> simple recursive function.
>>
>> binary1 <- function(n) {
>>   if(n > 1) {
>>     binary(as.integer(n/2))
>>   }
>>   cat(n %% 2)
>> }
>
> Did you mean to type "binary1(as.integer(n/2))"?

Yes I meant that.

>> When I call binary1(10) I get 1010. I believe that cat function stores
>> value to a buffer appending values as recursion proceeds and at the
>> end it prints the buffer. Am I right?
>
> No. Read the ?cat help page. It returns NULL. The material you see at the
> console is a side-effect.
>
>> I tried to modify the function to get some understanding:
>>
>> binary2 <- function(n) {
>>   if(n > 1) {
>>     binary2(as.integer(n/2))
>>   }
>>   cat(n %% 2, sep=",")
>> }
>>
>> With call binary2(10) I get also 1010. Why the output is not separated
>> by commas?
>
> I think because there is nothing to separate when it prints (since there was
> no "buffer").

If I use the function:

binary3 <- function(n) {
  if(n > 1) {
    binary3(as.integer(n/2))
  }
  cat(n %% 2, ",")
}

and call binary3(10), the console output is separated. So there must be some kind of buffer, and it also looks like there is some inconsistency in how the cat function behaves. Probably there is another explanation.
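The difference between the two variants can be reproduced without any buffer: sep is only placed between the arguments of a single cat() call, and cat(n %% 2, ",") passes two arguments. The sketch below uses a made-up helper, binary_digits, that returns the digits instead of printing them, so that one cat() call sees all of them.

```r
# Collect the binary digits as a vector instead of printing inside the recursion.
binary_digits <- function(n) {
  if (n > 1) c(binary_digits(n %/% 2), n %% 2) else n %% 2
}

digits <- binary_digits(10)
cat(digits, sep = ",")   # one call, four values: prints 1,0,1,0
cat("\n")
cat(1, sep = ",")        # one value: nothing to separate, prints 1
cat("\n")
cat(1, ",")              # two arguments joined by the default sep " ": prints 1 ,
cat("\n")
```

In binary2 every recursive call prints a single value, so sep never comes into play; in binary3 every call prints the digit and the string "," as two arguments.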
[R] strange output of cat function used in recursive function
Hello Dear R-help,

I tried to understand how recursive programming works in R. Below is a simple recursive function.

binary1 <- function(n) {
  if(n > 1) {
    binary(as.integer(n/2))
  }
  cat(n %% 2)
}

When I call binary1(10) I get 1010. I believe that the cat function stores values in a buffer, appending values as the recursion proceeds, and prints the buffer at the end. Am I right?

I tried to modify the function to get some understanding:

binary2 <- function(n) {
  if(n > 1) {
    binary2(as.integer(n/2))
  }
  cat(n %% 2, sep=",")
}

With the call binary2(10) I also get 1010. Why is the output not separated by commas? If I use cat(n %% 2, ",") on the last line of binary2, the output is separated. Outside a recursive function, cat prints separated output in both cases, e.g. cat(c(1:10), sep=",") and cat(c(1:10), ",").

Derek
[R] R Studio: Run script upon saving or exiting
Dear R help,

I would like to run a script whenever I save project files or exit RStudio. For example, I would like to back up the whole project to another directory. The backup directory should be named by appending an incrementing version number to the project name. Is this somehow possible? Even better would be a way to quickly go through the file versions.
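The backup part of this can be sketched in base R. The function below is hypothetical (backup_project and its arguments are not part of R or RStudio): it copies a project directory to a sibling "backups" directory under the next free version number. Base R documents a .Last function that is called when the session is ended through q(); whether a given RStudio version triggers it on every kind of exit, or offers a hook on save, is not something this sketch assumes.

```r
# Copy `project_dir` to "<backup_root>/<project name>_vNNN", using the next free NNN.
backup_project <- function(project_dir = getwd(),
                           backup_root = file.path(dirname(project_dir), "backups")) {
  dir.create(backup_root, showWarnings = FALSE, recursive = TRUE)
  prefix <- paste0(basename(project_dir), "_v")

  # Find the version numbers that already exist for this project.
  existing <- list.files(backup_root)
  existing <- existing[startsWith(existing, prefix)]
  versions <- suppressWarnings(as.integer(substring(existing, nchar(prefix) + 1L)))
  versions <- versions[!is.na(versions)]
  next_version <- if (length(versions)) max(versions) + 1L else 1L

  target <- file.path(backup_root, sprintf("%s%03d", prefix, next_version))
  dir.create(target)
  files <- list.files(project_dir, all.files = TRUE, no.. = TRUE, full.names = TRUE)
  file.copy(files, target, recursive = TRUE)
  invisible(target)
}

# Possible hook, e.g. in the project's .Rprofile (runs when the session ends via q()):
# .Last <- function() backup_project()
```

Calling backup_project() twice on a project named "myproj" would create myproj_v001 and then myproj_v002, which can be browsed like any other directories.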
[R] print all variables inside function
Hello dear R-help,

I would like to use some short and simple names multiple times inside one script without collisions. I need to wrap the variables inside some object. I know I can use a class, a function or an environment. For example as follows:

exmp1<-function(){
  # knowns
  pa=0.35
  pb=0.35
  pc=0.30
  pad=0.015
  pbd=0.010
  pcd=0.020
  # unknowns
  pd=pa*pad+pb*pbd+pc*pcd
  pdc=pc*pcd/pd
  pda=pa*pad/pd
  pba=pb*pbd/pd
  y<-c(pad=pad,pbd=pbd,pcd=pcd,pd=pd,pdc=pdc,pda=pda,pba=pba) # this line I would like to automate so I don't have to write it every time
  return(y)
}
output<-exmp1()

Is it somehow possible to print the 'unknowns' and 'knowns' from the exmp1 function without having to explicitly write the 'y' line, which puts all the variables inside a vector? For example with an imaginary function 'fprint' which takes exmp1 as its input: fprint(exmp1).
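One way to avoid writing the 'y' line by hand is to collect the function's local variables by name at the end of the body. This is a sketch of that idea using base R's ls() and mget(); it returns every local variable, knowns and unknowns alike, in alphabetical order.

```r
exmp1 <- function() {
  # knowns
  pa <- 0.35; pb <- 0.35; pc <- 0.30
  pad <- 0.015; pbd <- 0.010; pcd <- 0.020
  # unknowns
  pd  <- pa * pad + pb * pbd + pc * pcd
  pdc <- pc * pcd / pd
  pda <- pa * pad / pd
  pba <- pb * pbd / pd
  # ls() lists the names defined in this function call, mget() fetches their values
  unlist(mget(ls()))
}

output <- exmp1()
output["pd"]
```

If the variables are not all single numbers, as.list(environment()) in place of unlist(mget(ls())) returns them as a named list instead of a vector.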
Re: [R] break string at specified positions
Excellent Hervé, thank you.

2016-05-13 11:48 GMT+02:00 Hervé Pagès <hpa...@fredhutch.org>:
> Hi,
>
> Here is the Biostrings solution in case you need to chop a long
> string into hundreds or thousands of fragments (a situation where
> base::substring() is very inefficient):
>
>   library(Biostrings)
>
>   ## Call as.character() on the result if you want it back as
>   ## a character vector.
>   fast_chop_string <- function(x, ends)
>   {
>       if (!is(x, "XString"))
>           x <- as(x, "XString")
>       extractAt(x, at=PartitioningByEnd(ends))
>   }
>
> Will be much faster than substring (e.g. 100x or 1000x) when
> chopping a string like a Human chromosome into hundreds or
> thousands of fragments.
>
> Biostrings is a Bioconductor package:
>
>   https://bioconductor.org/packages/Biostrings
>
> Cheers,
> H.
>
> On 05/12/2016 01:18 AM, Jan Kacaba wrote:
>>
>> Nice solution Jim, thank you.
>>
>> 2016-05-12 2:45 GMT+02:00 Jim Lemon <drjimle...@gmail.com>:
>>>
>>> Hi again,
>>> Sorry, that should be:
>>>
>>> chop_string<-function(x,ends) {
>>>  starts<-c(1,ends[-length(ends)]+1)
>>>  return(substring(x,starts,ends))
>>> }
>>>
>>> Jim
>>>
>>> On Thu, May 12, 2016 at 10:05 AM, Jim Lemon <drjimle...@gmail.com> wrote:
>>>>
>>>> Hi Jan,
>>>> This might be helpful:
>>>>
>>>> chop_string<-function(x,ends) {
>>>>  starts<-c(1,ends[-length(ends)]-1)
>>>>  return(substring(x,starts,ends))
>>>> }
>>>>
>>>> Jim
>>>>
>>>> On Thu, May 12, 2016 at 7:23 AM, Jan Kacaba <jan.kac...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Here is my attempt at function which computes margins from positions.
>>>>>
>>>>> require("stringr")
>>>>> require("dplyr")
>>>>>
>>>>> ends<-seq(10,100,8) # end margins
>>>>> test_string<-"Lorem ipsum dolor sit amet, consectetuer adipiscing
>>>>> elit. Aliquam in lorem sit amet leo accumsan lacinia."
>>>>>
>>>>> sekoj=function(ends){
>>>>>   l_ends<-length(ends)
>>>>>   begs=vector(mode="integer",l_ends)
>>>>>   begs[1]=1
>>>>>   for (i in 2:(l_ends)){
>>>>>     begs[i]<-ends[i-1]+1
>>>>>   }
>>>>>   margs<-rbind(begs,ends)
>>>>>   margs<-cbind(margs,c(ends[l_ends]+1,-1))
>>>>>   #rownames(margs)<-c("beg","end")
>>>>>   return(margs)
>>>>> }
>>>>> margins<-sekoj(ends)
>>>>> str_sub(test_string,margins[1,],margins[2,]) %>% print
>>>>>
>>>>> Code to run in browser:
>>>>> http://www.r-fiddle.org/#/fiddle?id=rVmNVxDV
>>>>>
>>>>> 2016-05-11 23:12 GMT+02:00 Bert Gunter <bgunter.4...@gmail.com>:
>>>>>>
>>>>>> Dunno -- but you might have a look at Hadley Wickham's 'stringr'
>>>>>> package:
>>>>>> https://cran.r-project.org/web/packages/stringr/stringr.pdf
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Bert
>>>>>>
>>>>>> Bert Gunter
>>>>>>
>>>>>> "The trouble with having an open mind is that people keep coming along
>>>>>> and sticking things into it."
>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>>>
>>>>>> On Wed, May 11, 2016 at 1:12 PM, Jan Kacaba <jan.kac...@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> Dear R-help
>>>>>>>
>>>>>>> I would like to split long string at specified precomputed positions.
>>>>>>> 'substring' needs beginnings and ends. Is there a native function
>>>>>>> which
>>>>>>> accepts positions so I don't have to count second argument?
>>>>>>>
>>>>>>> For example I have vector of positions pos<-c(5,10,19). Substring
>>>>>>> needs input first=c(1,6,11) and last=c(5,10,19). There is no prob
Re: [R] break string at specified positions
Nice solution Jim, thank you.

2016-05-12 2:45 GMT+02:00 Jim Lemon <drjimle...@gmail.com>:
> Hi again,
> Sorry, that should be:
>
> chop_string<-function(x,ends) {
>  starts<-c(1,ends[-length(ends)]+1)
>  return(substring(x,starts,ends))
> }
>
> Jim
>
> On Thu, May 12, 2016 at 10:05 AM, Jim Lemon <drjimle...@gmail.com> wrote:
>> Hi Jan,
>> This might be helpful:
>>
>> chop_string<-function(x,ends) {
>>  starts<-c(1,ends[-length(ends)]-1)
>>  return(substring(x,starts,ends))
>> }
>>
>> Jim
>>
>> On Thu, May 12, 2016 at 7:23 AM, Jan Kacaba <jan.kac...@gmail.com> wrote:
>>> Here is my attempt at function which computes margins from positions.
>>>
>>> require("stringr")
>>> require("dplyr")
>>>
>>> ends<-seq(10,100,8) # end margins
>>> test_string<-"Lorem ipsum dolor sit amet, consectetuer adipiscing
>>> elit. Aliquam in lorem sit amet leo accumsan lacinia."
>>>
>>> sekoj=function(ends){
>>>   l_ends<-length(ends)
>>>   begs=vector(mode="integer",l_ends)
>>>   begs[1]=1
>>>   for (i in 2:(l_ends)){
>>>     begs[i]<-ends[i-1]+1
>>>   }
>>>   margs<-rbind(begs,ends)
>>>   margs<-cbind(margs,c(ends[l_ends]+1,-1))
>>>   #rownames(margs)<-c("beg","end")
>>>   return(margs)
>>> }
>>> margins<-sekoj(ends)
>>> str_sub(test_string,margins[1,],margins[2,]) %>% print
>>>
>>> Code to run in browser:
>>> http://www.r-fiddle.org/#/fiddle?id=rVmNVxDV
>>>
>>> 2016-05-11 23:12 GMT+02:00 Bert Gunter <bgunter.4...@gmail.com>:
>>>> Dunno -- but you might have a look at Hadley Wickham's 'stringr' package:
>>>> https://cran.r-project.org/web/packages/stringr/stringr.pdf
>>>>
>>>> Cheers,
>>>>
>>>> Bert
>>>>
>>>> Bert Gunter
>>>>
>>>> "The trouble with having an open mind is that people keep coming along
>>>> and sticking things into it."
>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>>>
>>>> On Wed, May 11, 2016 at 1:12 PM, Jan Kacaba <jan.kac...@gmail.com> wrote:
>>>>> Dear R-help
>>>>>
>>>>> I would like to split long string at specified precomputed positions.
>>>>> 'substring' needs beginnings and ends. Is there a native function which
>>>>> accepts positions so I don't have to count second argument?
>>>>>
>>>>> For example I have vector of positions pos<-c(5,10,19). Substring
>>>>> needs input first=c(1,6,11) and last=c(5,10,19). There is no problem
>>>>> to write my own function. Just asking.
>>>>>
>>>>> Derek
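As a quick check of the corrected helper quoted in this thread, it can be run on the end positions from the original question. The input string below is made up for the illustration; the function body is the corrected version from the thread, only reformatted.

```r
# Corrected helper from the thread: each piece starts one character after the previous end.
chop_string <- function(x, ends) {
  starts <- c(1, ends[-length(ends)] + 1)
  substring(x, starts, ends)
}

pos <- c(5, 10, 19)                       # end positions from the original question
chop_string("abcdefghijklmnopqrs", pos)   # "abcde" "fghij" "klmnopqrs"
```

Note that anything after the last end position is dropped, so the final element of ends should be nchar(x) if the whole string is wanted.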
Re: [R] break string at specified positions
Here is my attempt at a function which computes margins from positions.

require("stringr")
require("dplyr")

ends<-seq(10,100,8) # end margins
test_string<-"Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Aliquam in lorem sit amet leo accumsan lacinia."

sekoj=function(ends){
  l_ends<-length(ends)
  begs=vector(mode="integer",l_ends)
  begs[1]=1
  for (i in 2:(l_ends)){
    begs[i]<-ends[i-1]+1
  }
  margs<-rbind(begs,ends)
  margs<-cbind(margs,c(ends[l_ends]+1,-1))
  #rownames(margs)<-c("beg","end")
  return(margs)
}
margins<-sekoj(ends)
str_sub(test_string,margins[1,],margins[2,]) %>% print

Code to run in browser:
http://www.r-fiddle.org/#/fiddle?id=rVmNVxDV

2016-05-11 23:12 GMT+02:00 Bert Gunter <bgunter.4...@gmail.com>:
> Dunno -- but you might have a look at Hadley Wickham's 'stringr' package:
> https://cran.r-project.org/web/packages/stringr/stringr.pdf
>
> Cheers,
>
> Bert
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Wed, May 11, 2016 at 1:12 PM, Jan Kacaba <jan.kac...@gmail.com> wrote:
>> Dear R-help
>>
>> I would like to split long string at specified precomputed positions.
>> 'substring' needs beginnings and ends. Is there a native function which
>> accepts positions so I don't have to count second argument?
>>
>> For example I have vector of positions pos<-c(5,10,19). Substring
>> needs input first=c(1,6,11) and last=c(5,10,19). There is no problem
>> to write my own function. Just asking.
>>
>> Derek
[R] break string at specified positions
Dear R-help,

I would like to split a long string at specified, precomputed positions. 'substring' needs beginnings and ends. Is there a native function which accepts positions, so that I don't have to compute the second argument?

For example, I have a vector of positions pos<-c(5,10,19). substring needs the input first=c(1,6,11) and last=c(5,10,19). It is no problem to write my own function. Just asking.

Derek
[R] svyciprop object
Hi,

I'd like to access the different elements of a svyciprop object (the confidence interval in particular), but none of the functions I know works. Thank you for your help!

> grr <- svyciprop(~temp==bzz, dclus1)
> grr
                               2.5%   97.5%
temp == bzz 0.040719697 0.027622756 0.05965
> attributes(grr)
$names
[1] "temp == bzz"

$var
                        as.numeric(temp == bzz)
as.numeric(temp == bzz)       6.42377038236e-05

$ci
           2.5%           97.5%
0.0276227559667 0.0596454643748

$class
[1] "svyciprop"

> grr$ci
Error in grr$ci : $ operator is invalid for atomic vectors
> grr["ci"]
NA
> ci(grr)
Error: could not find function "ci"
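Since the printed attributes() output shows the interval stored as an attribute named "ci", it can be read with attr(). The sketch below does not call the survey package: it builds a stand-in object with the same attributes as the output above, on the assumption that a real svyciprop object is laid out the same way.

```r
# Stand-in with the attributes printed above (assumption: same layout as a real
# svyciprop object returned by the survey package).
grr <- structure(
  c("temp == bzz" = 0.040719697),
  var   = matrix(6.42377038236e-05, 1, 1),
  ci    = c("2.5%" = 0.0276227559667, "97.5%" = 0.0596454643748),
  class = "svyciprop"
)

ci <- attr(grr, "ci")   # attributes are read with attr(), not with `$` or `[`
ci[["2.5%"]]
ci[["97.5%"]]
as.numeric(grr)         # the point estimate without its attributes
```

Recent versions of the survey package also appear to document a confint() method for these objects, which may be the more direct route where it is available.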
[R] row names, column names
Hello dear R helpers,

Is it possible to have more than one row of column names in a data.frame, array or tbl_df? I would like to have column numbers in the first row, string names in the second row and physical units in the third row. How would I do it?

Derek
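A data frame carries exactly one set of column names, so extra header rows are usually kept as metadata next to the data. The following is a minimal sketch of one such layout; the attribute name "unit", the example columns and the header table are all made up for the illustration.

```r
df <- data.frame(time = c(0, 1, 2), speed = c(0.0, 1.5, 3.1))

# Store the physical unit of each column as a (hypothetical) "unit" attribute.
col_units <- c(time = "s", speed = "m/s")
for (nm in names(col_units)) attr(df[[nm]], "unit") <- col_units[[nm]]

# Build the three "header rows" as a separate lookup table.
header <- data.frame(
  number = seq_along(df),
  name   = names(df),
  unit   = vapply(df, function(col) attr(col, "unit"), character(1))
)
header
```

One caveat of this design is that attributes on plain columns can be lost when rows are subset, so the separate header table is the more robust of the two places to keep the units.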
[R] inserting row(column) in array or dataframe at specified row(column)
Hello dear R users,

Is there a function or package which can insert a row, column or array into another array at a specified place (row or column)? I have made several attempts at such a function, optimizing for speed, code readability and ease of use. The functions have the following format:

appcol=function(original_array, inserted_object, column_number, overwrite=FALSE)
# If overwrite=TRUE the columns after column_number are overwritten by inserted_object, else the columns after column_number are shifted.

Now I have started using the package dplyr and it seems that there is no inserting function there either. One can only append at the end or at the beginning of a tbl_df. Is that true?
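For data frames, the shifting case (overwrite=FALSE) can be sketched in a few lines of base R by appending the new columns and then reordering the column indices with append(). The name insert_cols and its arguments are hypothetical; for tibbles, tibble::add_column() has .before and .after arguments that serve the same purpose.

```r
# Insert `new_cols` into data frame `x` so that they start at column `at`;
# the existing columns from `at` onward are shifted to the right.
insert_cols <- function(x, new_cols, at) {
  new_cols <- as.data.frame(new_cols)
  stopifnot(at >= 1, at <= ncol(x) + 1, nrow(new_cols) == nrow(x))
  combined <- cbind(x, new_cols)
  old_idx <- seq_len(ncol(x))
  new_idx <- ncol(x) + seq_len(ncol(new_cols))
  # append() places the new indices after position `at - 1` of the old ones
  combined[, append(old_idx, new_idx, after = at - 1), drop = FALSE]
}

df <- data.frame(a = 1:3, b = 4:6)
insert_cols(df, data.frame(z = 7:9), at = 2)   # columns: a, z, b
```

The same index trick works for rows by applying append() to row indices after rbind().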
[R] (windows) opening document with particular exe file
Hello dear R users,

I don't have a specific task in mind; I am just learning R.

1) Is it possible to open a document, for example path1\myfile.pdf, with the program path2\pdfviewer.exe? How would I do it on Windows? Does it differ on Linux?

2) Is it possible to run a program and supply some streams to it? The streams are, for example, a txt file or a web address.

One specific task that comes to mind: I would like to draw in Inkscape programmatically with a script. Is that somehow possible?

Thank you very much in advance for any help.
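Both questions can be approached with base R's system2(), which runs an external program with arguments and can capture its output. A minimal sketch: the runnable part calls Rscript only because it is guaranteed to sit next to R, and the viewer and document paths in the comments are hypothetical placeholders taken from the question.

```r
# Run a program with arguments and capture what it prints on stdout.
rscript <- file.path(R.home("bin"), "Rscript")
out <- system2(rscript, args = c("-e", shQuote("cat(1 + 1)")), stdout = TRUE)
print(out)

# Hypothetical paths: open a PDF with a specific viewer and do not wait for it to exit.
# system2("C:/path2/pdfviewer.exe", args = shQuote("C:/path1/myfile.pdf"), wait = FALSE)

# Windows only: open a file with whatever program is associated with its type.
# shell.exec("C:/path1/myfile.pdf")
```

The same system2() call works on Linux with the program name and path changed. system2() also has `stdin` and `input` arguments for feeding a file or a character vector to the program. For Inkscape, one option is to generate the SVG file (plain XML text) from R and then open it.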
Re: [R] Accented characters, windows
Duncan, thank you for your reply.

My encoding is:

> Sys.getlocale('LC_CTYPE')
[1] "Czech_Czech Republic.1250"

In RStudio I use UTF-8. I also tried other recommended encodings, but some characters are still misrepresented.

I've found a solution to this. To display strings correctly in RStudio I have to convert them:

    iconv(x,"CP1250","UTF-8")

If I want to write a string to a file:

    zz=file("myfile.txt", "w", encoding="UTF-8")
    cat(x,file = zz, sep = "\n")
    close(zz)

It seems there is no need to use iconv() if I just need to write a string to a file. I hope there is no problem processing strings with other functions like paste, strsplit, or grep, though.

Derek

2016-03-30 0:56 GMT+02:00 Duncan Murdoch <murdoch.dun...@gmail.com>:

> On 29/03/2016 5:39 PM, Jan Kacaba wrote:
>
>> I have problem with accented characters. My OS is Win 8.1 and I'm using
>> RStudio.
>>
>> I make string :
>> av="ěščřž"
>>
>> When I call "av" I get result bellow.
>>
>> > av
>> [1] "ìšèøž"
>>
>> The resulting characters are different. I have similar problem when I write
>> string to a file. In RGUI if I call "av" it prints characters correctly,
>> but using "write" function to print string in a file results in the same
>> problem.
>>
>> Can you please help me how to deal with it?
>
> You don't say what code page you're using.
>
> R in Windows has a long standing problem that it works mainly in the local
> code page, rather than working in UTF-8 as most other systems do. (This is
> due to the fact that when the internationalization was put in, UTF-8 was
> exotic, rather than ubiquitous as it is now.) So R can store UTF-8 strings
> on any system, but for display it converts them to the local code page, and
> that conversion can lose information if the characters aren't supported
> locally.
>
> With your string, I don't see the same thing as you, I see
>
> "ešcrž"
>
> which is also incorrect, but looks a little closer, because it does a
> better approximation in my code page.
>
> So if you think my result is better than yours, you could change your
> system to code page 437 as I'm using, but that will probably cause you
> worse problems.
>
> Probably the only short term solution that would be satisfactory is to
> stop using Windows. At some point in the future the internal character
> handling in R needs an overhaul, but that's a really big, really thankless
> job. Perhaps Microsoft/Revolution will donate some programmer time to do
> it, but more likely, it will wait for volunteers in R Core to do it. I
> don't think it will happen in 2016.
>
> Duncan Murdoch
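For the file-writing part, one idiom that avoids a round trip through the local code page is to convert to UTF-8 explicitly and write the bytes unchanged. A minimal sketch, using \u escapes so the script itself does not depend on the editor's encoding; the temporary file stands in for "myfile.txt".

```r
# The five Czech letters from the question, written as Unicode escapes.
av <- "\u011b\u0161\u010d\u0159\u017e"

tmp <- tempfile(fileext = ".txt")

# Convert to UTF-8 and write the bytes as they are (no re-encoding on output).
con <- file(tmp, open = "wb")
writeLines(enc2utf8(av), con, useBytes = TRUE)
close(con)

# Read back, telling R that the file content is UTF-8.
back <- readLines(tmp, encoding = "UTF-8")
print(nchar(back))
print(back == av)
```

Whether the string then displays correctly in a given console still depends on that console and its fonts; the sketch only addresses what ends up in the file.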
[R] Accented characters, windows
I have a problem with accented characters. My OS is Win 8.1 and I'm using RStudio.

I make a string:

    av="ěščřž"

When I call "av" I get the result below.

> av
[1] "ìšèøž"

The resulting characters are different. I have a similar problem when I write the string to a file. In RGui, calling "av" prints the characters correctly, but using the "write" function to print the string to a file results in the same problem.

Can you please help me deal with it?
[R] RStudio knitr
Hello,

is it possible to run knitr from a script instead of clicking the Compile PDF button? Say I have "texfile.rnw" and "myscript.R". I would like to knit texfile.rnw by running the script "myscript.R". In "myscript.R" I would write something like this:

    knit("texfile.rnw")
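Calling knit() from a script does work once the knitr package is installed. A minimal sketch, assuming knitr is available: it writes a tiny .Rnw file to a temporary directory (a stand-in for "texfile.rnw") and knits it to a .tex file, which needs no LaTeX installation.

```r
# A tiny stand-in for texfile.rnw, written to a temporary location.
rnw_file <- file.path(tempdir(), "texfile.Rnw")
tex_file <- file.path(tempdir(), "texfile.tex")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<>>=",
  "1 + 1",
  "@",
  "\\end{document}"
), rnw_file)

# Run the R chunks and produce the .tex file.
knitr::knit(rnw_file, output = tex_file, quiet = TRUE)
print(file.exists(tex_file))
```

To go all the way to a PDF, knitr::knit2pdf() runs the same step and then calls LaTeX, so it needs a working LaTeX installation on the machine.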
Re: [R] How to reach the column names in a huge .RData file without loading it
On Wed, Mar 16, 2016 at 03:18:27PM -0400, Duncan Murdoch wrote:
> On 16/03/2016 1:40 PM, Jan Kim wrote:
> >Barry: that's an interesting hack.
> >
> >I do feel compelled to make two comments, though, regarding the
> >general issue rather than the scraping idea:
> >
> >(1) If your situation is that that image (.RData file) is the only
> >copy of the data, you'll need to rescue the data from that as soon as
> >possible anyway. Something like
> >
> >    load(".RData");
> >    write.csv(mydataframe, file = "mydata.csv");
> >
> >should do this trick. It will be slow, but you'll need to do it just
> >once, so you might as well enjoy your coffee while you wait. From that
> >point on, work with the mydata.csv file for getting at the colnames
> >(and anything else as well).
> >
> >(2) If there's any chance / risk that scraping data off images is not
> >a one-off, the time to prevent that from catching on is now. If data is
> >of any value at all, it should be handled in a sane, portable, textual
> >format. For tabular data, csv is normally adequate or at least good
> >enough, but .RData images are never a good idea.
>
> I agree with the sentiment, but not with the choice of .csv as a
> "sane, portable, textual format". CSV has no type information
> included, so strings that contain only digits can turn into numbers
> (and get rounded in the process), things that look like
> dates can get converted to different formats, etc.

I entirely agree. In hindsight, I should have stated that the .RData files, as well as the R code to load and extract stuff from them, should be stored permanently and documented.

> The .RData format has the disadvantages of being hard to use outside
> R, but at least it is usable in R.

Yes -- that's why I thought it's a good idea to use R to pluck out the valuable data, so (1) they can still be accessed even if the .RData format changes and (2) they're in their own file, separated from the (potentially humongous, see my P.S.) amount of other stuff caught up in the image.

But to reiterate, the .RData file should be secured as well if that's the only remaining primary / original source of the data.

> I don't know what I'd recommend if I wanted a portable textual
> format. JSON is close, but it can't handle the full
> range of data that R can handle (e.g. no Inf). dput() on a
> dataframe is text, but nothing but R can read it.

Yes, that's the problem with "JSON": it's JavaScript but not really an object notation, as it doesn't store class structure metadata.

So again, the best bet is to secure multiple levels: the .RData image to preserve the R types, the R script to be able to identify the relevant variable(s), and the text version to avoid depending on the availability of R / an R version still able to read the image format.

Best regards, Jan

> Duncan Murdoch
>
> >
> >Best regards, Jan
> >
> >P.S.: I've seen .RData images containing many months worth of interactive
> >work, and multiple variants of data frames in variables with more or less
> >similar names, so the set of strings scraped off these will be rather more
> >bewildering than in Barry's clean example.
> >
> >
> >On Wed, Mar 16, 2016 at 05:17:25PM +, Barry Rowlingson wrote:
> >> You *might* be able to get them from the raw file...
> >>
> >> First, I don't quite know what "colnames" of an .RData file means.
> >> "colnames" are the column names of a matrix (or data frame), so I'll
> >> assume your .RData file contains exactly one data frame and you want
> >> to column names of it.
> >>
> >> So let's create one of those:
> >>
> >>
> >> mydataframe = data.frame(mylongnamehere=runif(3),
> >> anotherlongname=runif(3), z=runif(3), y=runif(3),
> >> aasdkjhasdkjhaskdj=runif(3))
> >> save(mydataframe, file="./test.RData")
> >>
> >> Now I'm going to use some Unix utilities to see if there's any
> >> identifiable strings in the file. .RData files are by default
> >> compressed using `gzip`, so I'll `gunzip` them and pipe it into
> >> `strings`:
> >>
> >> $ gunzip -c test.RData | strings -t d
> >> 0 RDX2
> >> 35 mydataframe
> >> 230 names
> >> 251 mylongnamehere
> >> 273 anotherlongname
> >> 314 aasdkjhasdkjhaskdj
> >> 347 row.names
> >> 389 class
> >> 410 data.frame
> >>
> >>
> >> - thats found the object name (mydataframe) and most of the column
> >> names except the short
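Duncan's point about CSV carrying no type information can be seen in a small round trip. A minimal sketch with a made-up column of digit-only strings: written to CSV and read back with the defaults, the strings come back as numbers, and the leading zeros only survive if the column class is stated when reading.

```r
# Hypothetical data: identifiers that happen to consist of digits only.
original <- data.frame(id = c("00123", "00456"), stringsAsFactors = FALSE)

csv_file <- tempfile(fileext = ".csv")
write.csv(original, csv_file, row.names = FALSE)

# Default read: the column is converted to a number and the zeros are lost.
guessed <- read.csv(csv_file)

# Stating the class on read keeps the strings intact.
declared <- read.csv(csv_file, colClasses = c(id = "character"))

print(str(guessed))
print(str(declared))
```

This is the practical reason for keeping the R script that reads the text file alongside the file itself: the script is where the type information lives.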
Re: [R] How to reach the column names in a huge .RData file without loading it
Barry: that's an interesting hack.

I do feel compelled to make two comments, though, regarding the general issue rather than the scraping idea:

(1) If your situation is that that image (.RData file) is the only copy of the data, you'll need to rescue the data from that as soon as possible anyway. Something like

    load(".RData");
    write.csv(mydataframe, file = "mydata.csv");

should do this trick. It will be slow, but you'll need to do it just once, so you might as well enjoy your coffee while you wait. From that point on, work with the mydata.csv file for getting at the colnames (and anything else as well).

(2) If there's any chance / risk that scraping data off images is not a one-off, the time to prevent that from catching on is now. If data is of any value at all, it should be handled in a sane, portable, textual format. For tabular data, csv is normally adequate or at least good enough, but .RData images are never a good idea.

Best regards, Jan

P.S.: I've seen .RData images containing many months worth of interactive work, and multiple variants of data frames in variables with more or less similar names, so the set of strings scraped off these will be rather more bewildering than in Barry's clean example.

On Wed, Mar 16, 2016 at 05:17:25PM +, Barry Rowlingson wrote:
> You *might* be able to get them from the raw file...
>
> First, I don't quite know what "colnames" of an .RData file means.
> "colnames" are the column names of a matrix (or data frame), so I'll
> assume your .RData file contains exactly one data frame and you want
> to column names of it.
>
> So let's create one of those:
>
>
> mydataframe = data.frame(mylongnamehere=runif(3),
> anotherlongname=runif(3), z=runif(3), y=runif(3),
> aasdkjhasdkjhaskdj=runif(3))
> save(mydataframe, file="./test.RData")
>
> Now I'm going to use some Unix utilities to see if there's any
> identifiable strings in the file. .RData files are by default
> compressed using `gzip`, so I'll `gunzip` them and pipe it into
> `strings`:
>
> $ gunzip -c test.RData | strings -t d
> 0 RDX2
> 35 mydataframe
> 230 names
> 251 mylongnamehere
> 273 anotherlongname
> 314 aasdkjhasdkjhaskdj
> 347 row.names
> 389 class
> 410 data.frame
>
>
> - thats found the object name (mydataframe) and most of the column
> names except the short ones, which are too short for `strings` to
> recognise. But if your names are long enough (4 or more chars, I
> think) they'll show up.
>
> Of course you'll have to filter them out from all the other string
> output, but they should all appear shortly after the word "names",
> since the colnames of a data frame are the "names" attribute of the
> data.
>
> If you don't have a Unix or Mac machine handy you can get these
> utilities on Windows via Cygwin but that's another story...
>
> Barry
>
>
> On Wed, Mar 16, 2016 at 3:59 PM, Lida Zeighami <lid.z...@gmail.com> wrote:
> > Hi,
> > I have a huge .RData file and I need just to get the colnames of it. so is
> > there any way to reach the column names without loading or reading the
> > whole file?
> > Since the file is so big and I need to repeat this process several times,
> > so it takes so long to load the file first and then take the colnames!
> >
> > Thanks

-- 
 +- Jan T. Kim ---+
 | email: jtt...@gmail.com |
 | WWW: http://www.jtkim.dreamhosters.com/ |
 *-=< hierarchical systems are for files, not for humans >=-*
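The "load once, export, then read only the header" workflow from comment (1) can be sketched as follows. The data frame is a small stand-in built like Barry's example, the temporary files replace the real paths, and loading into a separate environment is my addition so that the image cannot overwrite objects in the workspace.

```r
# Stand-in for the huge image: one data frame saved to an .RData file.
mydataframe <- data.frame(mylongnamehere = runif(3),
                          anotherlongname = runif(3),
                          z = runif(3))
rdata_file <- tempfile(fileext = ".RData")
save(mydataframe, file = rdata_file)

# One-off slow step: load into its own environment and export as CSV.
env <- new.env()
loaded <- load(rdata_file, envir = env)   # returns the names of the restored objects
csv_file <- tempfile(fileext = ".csv")
write.csv(env[[loaded[1]]], file = csv_file, row.names = FALSE)

# Every later request for the column names reads only the header and one row.
header <- names(read.csv(csv_file, nrows = 1))
print(header)
```

Because load() returns the object names, the same sketch also answers the question of what is inside an unfamiliar image before deciding which object to export.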
[R] treating integer(0) and NULL in conditions and loops
Hello,

I have the following problem in loops. It has occurred to me multiple times; below is an example. Inside if() I sometimes have a function f(x) which may return integer(0). If I test f(x)>1 when f(x) is integer(0), I get an error. Maybe it can be solved more elegantly without a loop or switches. I don't know.

Example:

    a=c("ab","abc","abcd","abcde","abcdefghjk") # vector from which new strings will be constructed
    svec=NULL # vector of strings
    rz=NULL   # string
    for (i in 1:10){
      if (nchar(rz)>6){
        svec[i]=rz
        rz=NULL
      }
      if ((nchar(a[i])+nchar(rz))<6){
        rz=paste(rz,a[i])
      }
      if (nchar(rz)+nchar(a[i+1])>6){
        svec[i]=rz
        rz=NULL
      }
    }

I'm not interested in how to treat the nchar() function in particular, but a general function. One solution that comes to mind is to redefine the function, for example nchar(), like this:

    new.nchar=function(x){
      if (length(nchar(x))==0){z=0}
      if (length(nchar(x))>0){z=nchar(x)}
      return(z)
    }

Is this a correct way of doing it, or is there a better way without the need to define a new function?
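The error arises because if() needs a condition of length one, and a comparison against integer(0) has length zero. A minimal sketch of three general ways around it that do not depend on nchar() in particular; the helper name or_default is hypothetical.

```r
rz <- NULL

# 1. Wrap the comparison in isTRUE(): a zero-length result counts as FALSE.
too_long <- isTRUE(nchar(rz) > 6)

# 2. Check the length first; && short-circuits, so the comparison never runs on NULL.
too_long_2 <- length(rz) > 0 && nchar(rz) > 6

# 3. Hypothetical helper: replace an empty result with a default value.
or_default <- function(value, default = 0L) {
  if (length(value) == 0) default else value
}
n <- or_default(nchar(rz))

# Often simplest of all: start from "" instead of NULL, so nchar() returns 0.
rz_empty <- ""
n_empty <- nchar(rz_empty)

print(c(too_long, too_long_2))
print(c(n, n_empty))
```

The helper takes the already computed value rather than the function, so it works with any f(x) and does not require redefining each function separately.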
[R] assign a vector to list sequence
Hello,

I would like to assign a vector to a sequence of list elements. I'm trying the code below, but the output is not what I intended.

    # my code
    mls=vector(mode="list") # my list
    cseq=c(1:3) # my vector
    mls[cseq]=cseq

I get the following:

    [[1]]
    [1] 1

    [[2]]
    [1] 2

    [[3]]
    [1] 3

What I need is this:

    [[1]]
    [1] 1 2 3

    [[2]]
    [1] 1 2 3

    [[3]]
    [1] 1 2 3
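With single-bracket assignment the right-hand side is spread element by element over the selected positions, which is why each list element received one number. A minimal sketch of the usual fix: wrap the vector in list() so that it is a single element, which R then recycles across all selected positions.

```r
mls <- vector(mode = "list")   # empty list
cseq <- 1:3                    # vector to store in every element

# list(cseq) has length one, so it is recycled to positions 1, 2 and 3.
mls[cseq] <- list(cseq)
str(mls)

# Equivalent construction without a pre-existing list.
mls2 <- rep(list(cseq), length(cseq))
```

Double brackets, as in mls[[1]] <- cseq, would also store the whole vector, but only in one element at a time.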