I think a machine-specific input to the UUID, such as the MAC address, is essential. S+ used to make a seed for the random number generator based on the current time and process ID. A customer complained that all machines in his cluster generated the same random number stream. The machines were rebooted each night, simultaneously, and S+ was started during the boot process, so the times and process IDs were identical across machines and hence the seeds were identical too.
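To make the failure concrete, here is a minimal sketch. It is not the actual S+ code: `string_to_seed()` and `make_seed()` are hypothetical helpers written for this illustration, the boot time and PID are made-up values, and the host name stands in for the MAC address because, as far as I know, base R has no portable way to read a MAC address.

```r
# Hypothetical helper: fold a string into a positive integer seed with a
# simple polynomial hash (illustrative only, not cryptographic).
string_to_seed <- function(s) {
  h <- 0
  for (code in utf8ToInt(s)) {
    # h < 2^31, so h * 31 + code is far below 2^53 and exact in a double
    h <- (h * 31 + code) %% 2147483647
  }
  as.integer(h)
}

# Hypothetical seed constructor: time and PID, plus an optional
# machine-specific component.
make_seed <- function(time, pid, machine = "") {
  key <- paste(format(as.numeric(time), digits = 15), pid, machine, sep = "|")
  string_to_seed(key)
}

# Two cluster nodes rebooted at the same instant, starting the program at
# the same point in the boot sequence, so they share time and PID.
boot_time <- as.POSIXct("2019-05-21 03:00:00", tz = "UTC")
pid <- 1234L

seed_a <- make_seed(boot_time, pid)
seed_b <- make_seed(boot_time, pid)
identical(seed_a, seed_b)        # TRUE: both nodes get the same stream

# Adding a machine-specific input separates the two nodes.
seed_a_fixed <- make_seed(boot_time, pid, machine = "node-a")
seed_b_fixed <- make_seed(boot_time, pid, machine = "node-b")
seed_a_fixed != seed_b_fixed     # TRUE
```

In a real session one could pass `Sys.info()[["nodename"]]` as the `machine` argument, on the assumption that host names are distinct within the cluster.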
Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, May 20, 2019 at 4:48 PM Henrik Bengtsson <henrik.bengts...@gmail.com> wrote:

> # Proposal
>
> Provide a built-in mechanism for obtaining an identifier for the
> current R session, e.g.
>
>   > Sys.info()[["session_uuid"]]
>   [1] "4258db4d-d4fb-46b3-a214-8c762b99a443"
>
> The identifier should be "unique" in the sense that the probability
> for two R sessions(*) having the same identifier should be extremely
> small.  There's no need for reproducibility, i.e. the algorithm for
> producing the identifier may be changed at any time.
>
> (*) Two R sessions running at different times (seconds, minutes, days,
> years, ...) or on different machines (locally or anywhere in the
> world).
>
>
> # Use cases
>
> In parallel-processing workflows, R objects may be "exported"
> (serialized) to background R processes ("workers") for further
> processing.  In other workflows, objects may be saved to file to be
> reloaded in a future R session.  However, certain types of objects in
> R may only be relevant, or valid, in the R session that created
> them.  Attempts to use them in other R processes may give an obscure
> error or, in the worst case, produce garbage results.
>
> Having an identifier that is unique to each R process will make it
> possible to detect when an object is used in the wrong context.  This
> can be done by attaching the session identifier to the object.  For
> example,
>
>   obj <- 42L
>   attr(obj, "owner") <- Sys.info()[["session_uuid"]]
>
> With this, it is easy to validate the "ownership" later:
>
>   stopifnot(identical(attr(obj, "owner"), Sys.info()[["session_uuid"]]))
>
> I argue that such an identifier should be part of base R for easy
> access and to avoid each developer having to roll their own.
>
>
> # Possible implementation
>
> One proposal would be to bring Simon Urbanek's 'uuid' package
> (https://cran.r-project.org/package=uuid) into base R.
> This package provides:
>
>   > uuid::UUIDgenerate()
>   [1] "b7de6182-c9c1-47a8-b5cd-e5c8307a8efb"
>
> based on Theodore Ts'o's libuuid
> (https://mirrors.edge.kernel.org/pub/linux/utils/util-linux/).  From
> 'man uuid_generate':
>
> "The uuid_generate function creates a new universally unique
> identifier (UUID).  The uuid will be generated based on high-quality
> randomness from /dev/urandom, if available.  If it is not available,
> then uuid_generate will use an alternative algorithm which uses the
> current time, the local ethernet MAC address (if available), and
> random data generated using a pseudo-random generator.
> [...]
> The UUID is 16 bytes (128 bits) long, which gives approximately
> 3.4x10^38 unique values (there are approximately 10^80 elementary
> particles in the universe according to Carl Sagan's Cosmos).  The new
> UUID can reasonably be considered unique among all UUIDs created on
> the local system, and among UUIDs created on other systems in the past
> and in the future."
>
> An alternative that does not require adding a dependency on the
> libuuid library would be to roll a poor man's version based on a set
> of semi-unique attributes, e.g.
>
>   ## Hash an arbitrary set of R objects: serialize them to a temporary
>   ## file and return that file's MD5 checksum
>   make_id <- function(...) {
>     args <- list(...)
>     saveRDS(args, file = f <- tempfile())
>     on.exit(file.remove(f))
>     unname(tools::md5sum(f))
>   }
>
>   ## Compute the id on first call, then return the cached value
>   session_id <- local({
>     id <- NULL
>     function() {
>       if (is.null(id)) {
>         id <<- make_id(
>           info = Sys.info(),
>           pid = Sys.getpid(),
>           tempdir = tempdir(),
>           time = Sys.time(),
>           random = sample.int(.Machine$integer.max, size = 1L)
>         )
>       }
>       id
>     }
>   })
>
> Example:
>
>   > session_id()
>   [1] "8d00b17384e69e7c9ecee47e0426b2a5"
>
>   > session_id()
>   [1] "8d00b17384e69e7c9ecee47e0426b2a5"
>
> /Henrik
>
> PS. Having a built-in make_id() function would be handy too, e.g. when
> creating object-specific identifiers for other purposes.
>
> PPS.
> It would be neat if there was an object, or connection, interface
> for tools::md5sum(), which currently only operates on files sitting on
> the file system.  The digest package provides this functionality.
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel