Re: [Rd] Choices to remove `srcref` (and its buddies) when serializing objects
> Date: Wed, 17 Jan 2024 11:35:02 -0500
> From: Dipterix Wang
> To: Lionel Henry, Tomas Kalibera
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Choices to remove `srcref` (and its buddies) when
>     serializing objects
> Message-ID: <3cf4ca2d-9f72-4c7b-90aa-4d2e9f745...@gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> > On Wed, Jan 17, 2024 at 10:32 AM Tomas Kalibera wrote:
> >
> > I think one could implement hashing on the fly without any
> > serialization, similarly to how identical works, but I am not aware of
> > any existing implementation. Again, if that wasn't clear: I don't think
> > trying to compute a hash of an object from its serialized representation
> > is a good idea - it is of course convenient, but has problems like the
> > one you have run into.
> >
> > In some applications it may still be good enough: if by various tweaks,
> > such as ensuring source references are off in your case, you achieve a
> > state when false alarms are rare (identical objects have different
> > hashes), and hence say unnecessary re-computation is rare, maybe it is
> > good enough.
>
> I really appreciate you answering my questions and solving my puzzles. I
> went back and read the R internal code for `serialize` and totally agree
> that serialization is not a good basis for digesting R objects,
> especially environments, expressions, and functions.
>
> What I want is a function that produces the same, stable hash for
> identical objects. However, to the best of our knowledge, no function on
> the market does this. `digest::digest` and `rlang::hash` are the first
> functions that come to mind. Both are widely used, but they use
> serialize. The author of `digest` said:
>
> > "As you know, digest takes and (ahem) "digests" what serialize gives it,
> > so you would have to look into what serialize lets you do."
>
> vctrs:::obj_hash is probably the closest to the implementation of
> `identical`, but the above examples give different results for identical
> objects.
>
> The existence of digest::digest and rlang::hash shows that there is a huge
> demand for this "ideal" hash function. However, I bet most people are using
> digest/hash "incorrectly".

Please read the full discussion on this old bug report:

https://bugs.r-project.org/show_bug.cgi?id=18178

Quoting briefly:

    Serialization is not intended to be used this way. What serialization
    tries to provide is that x and unserialize(serialize(x, NULL)) will be
    identical() while preserving internal representation where possible.
    Two objects that are considered identical() can have very different
    internal representations, and their serializations will reflect this.

You will see that it is not as simple as just removing the srcref or the
bytecode from functions.

The issue with the `identical()` function in that context was eventually
patched, but the comment by R-Core that serialization is not intended to be
used to produce a reliable hash stands.

Neither `identical()` nor `serialize()` is designed to guarantee the same
object to hash in terms of bytes. This is echoed by Tomas' comment above.
But we note that it is 'good enough' in most cases.

Fwiw `nanonext::sha256()` and family hash character strings and raw objects
directly, but use the same approach as `digest::digest()` elsewhere. So if
someone comes up with a canonical binary representation of R objects, it
will be able to hash it reliably.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
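
The gap between `identical()` and the serialized bytes discussed in this message can be seen in a few lines of base R. This is only a minimal sketch: the function text and the variable names are made up for illustration, and the second comparison depends on how a given R version stores and serializes integer sequences, so its result is left unstated.

```r
# Two functions that differ only in source whitespace. keep.source = TRUE
# attaches a srcref to each, as happens by default in interactive sessions.
f1 <- eval(parse(text = "function(x) x + 1", keep.source = TRUE))
f2 <- eval(parse(text = "function(x)   x + 1", keep.source = TRUE))

# identical() ignores srcref by default (ignore.srcref = TRUE) ...
same_object <- identical(f1, f2)

# ... but the serialized bytes include the srcref and its source file,
# so any hash computed from them differs as well.
same_bytes <- identical(serialize(f1, NULL), serialize(f2, NULL))

# A case with no srcref involved: a sequence created by `:` and a vector
# built with c() are identical(), yet they need not share an internal
# representation, and the serialization may reflect that.
x <- 1:3
y <- c(1L, 2L, 3L)
same_vector <- identical(x, y)
same_vector_bytes <- identical(serialize(x, NULL), serialize(y, NULL))
```

Here `same_object` and `same_vector` are `TRUE` while `same_bytes` is `FALSE`, which is the kind of false alarm described above for any hash built on `serialize()`.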
Re: [Rd] Bug report: parLapply with capture.output(type="message") produces an error
Hi Travers,

This is an implementation detail for background workers in general, in that
there must be some robust way for them to exit (either upon a signal from
the main session, or if the main session ends / socket disconnects). As
these are background workers, their error messages are usually not seen,
and hence it has been deemed good enough that they exit in this case
through error. However, you do see them in your case because you have
diverted the message stream, as Henrik has highlighted. This may be
inconvenient, but can safely be ignored.

If, however, clean output is important in your use case, there is a new
solution that has only just become available. This is a direct outcome of
the R Project Sprint in Warwick a month ago: Luke Tierney has opened up the
`parallel` package to allow other packages to provide alternative
communications backends. This is only possible with R-devel, but as of
yesterday a new version of the `mirai` package was released to CRAN that
provides one such backend. You would simply replace your `makeCluster()`
call with `mirai::make_cluster()`. That's the only change.

As this is the R-devel mailing list, I will not go into the details of this
particular implementation, but it seems useful for users of `parallel` to
know that this is now possible. As I am the author of `mirai`, please reach
out to me directly with questions on the package rather than replying on
the list.

I just want to highlight one other possibility: if you remove
`capture.output()` in your evaluation and call
`mirai::make_cluster(2, output = TRUE)` instead, you will then be able to
see all the messages from the background workers in your main process.
It's probably not what you're after, but just in case.

Thanks,

Charlie

6 October 2023 at 12:04, Travers Ching wrote:

> Hi Henrik,
>
> Thanks for the detailed technical explanation! I ended up using the
> withCallingHandlers solution to achieve what I needed (thanks to Stack
> Overflow).
>
> If this is not technically a bug I think it is unintuitive and
> unexpected behavior from a user perspective. So take this as a feature
> request rather than a bug report.
>
> The error message at the end of the script doesn't inform the user what
> part of the script is wrong (using sink or capture.output in parallel). It
> is difficult to understand what's going on.
>
> The "correct" solution using withCallingHandlers is esoteric, and I think
> most users would not code that up naturally, much less understand what it
> is doing. Could capture.output(type="message") be rewritten using this
> approach?
>
> Lastly, the help file for stopCluster says
>
> "the workers will terminate themselves once the socket on which they are
> listening for commands becomes unavailable, which it should if the master R
> session is completed"
>
> To me, this implies that I shouldn't need to call stopCluster and that the
> workers are automatically stopped at the end. The place where I first saw
> the error was using future_lapply, and following the vignette there's no
> call to stopCluster there either.
>
> Best,
> Travers
>
> On Thu, Oct 5, 2023 at 6:15 PM Henrik Bengtsson wrote:
>
> > This is actually not a bug. If we really want to identify a bug, then
> > it's actually a bug in your code. We'll get to that at the very end.
> > Either way, it's an interesting report that reveals a lot of things.
> >
> > First, here's a slightly simpler version of your example:
> >
> > $ Rscript --vanilla -e 'library(parallel); cl <- makeCluster(1); x <-
> > clusterEvalQ(cl, { capture.output(NULL, type = "message") })'
> > Error in unserialize(node$con) : error reading from connection
> > Calls: ... doTryCatch -> recvData -> recvData.SOCKnode ->
> > unserialize
> > Execution halted
> >
> > There are lots of things going on here, but before we get to the
> > answer, the most important take-home message here is:
> >
> > Never ever use capture.output(..., type = "message") in R.
> >
> > Second comment is:
> >
> > No, really, do not do that!
> >
> > Now, towards what is going on in your example. First, I don't think
> > help("capture.output") is too "kind" here, when it says:
> >
> > 'Messages sent to stderr() (including those from message, warning and
> > stop) are captured by type = "message". Note that this can be "unsafe"
> > and should only be used with care.'
> >
> > To understand why you shouldn't do this, you have to know that
> > capture.output() uses sink() internally, and its help page says:
> >
> > "Sink-ing the messages stream should be done only with great care. For
> > that stream file must be an already open connection, and there is no
> > stack of connections."
> >
> > The "[When] Sink-ing the messages stream ... there is no stack of
> > connections" is the reason for the problem you're experiencing.
> > What happens is that, the background workers that you launch with
> > parallel::makeCluster() will use sink(...,
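
The `withCallingHandlers` approach mentioned in this thread can be sketched as follows. This is not the code Travers ended up with, which is not shown in the thread; the helper name `capture_messages` is hypothetical. It collects messages through a calling handler and never touches `sink()` or the message stream.

```r
# Hypothetical helper: evaluate an expression, collect any message()
# conditions it signals, and return both the value and the messages.
capture_messages <- function(expr) {
  msgs <- character()
  value <- withCallingHandlers(
    expr,  # the promise is forced here, inside the handler's scope
    message = function(m) {
      msgs <<- c(msgs, conditionMessage(m))
      # Stop the message from also being printed to stderr()
      invokeRestart("muffleMessage")
    }
  )
  list(value = value, messages = msgs)
}

res <- capture_messages({
  message("starting work")
  42
})
```

After this runs, `res$value` holds the result and `res$messages` holds the message text (including the trailing newline that `message()` appends). To use such a helper on `parallel` workers it would have to be defined inside the function sent to the workers, or exported to them first, for example with `clusterExport()`.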
Re: [Rd] Question on non-blocking socket
> Date: Wed, 15 Feb 2023 01:24:26 +0100
> From: Ben Engbers
> To: r-devel@r-project.org
> Subject: [Rd] Question on non-blocking socket
> Message-ID: <68ce63b0-7e91-6372-6926-59f3fcfff...@be-logical.nl>
> Content-Type: text/plain; charset="utf-8"; Format="flowed"
>
> Hi,
>
> On December 27, 2021, I started a thread asking for help troubleshooting
> non-blocking sockets.
> While developing the RBaseX client, I had issues with the authentication
> process. It eventually turned out that a short break had to be inserted
> in this process between sending the credentials to the server and
> requesting the status. Tomas Kalibera put me on the right track by
> drawing my attention to the 'socketSelect' function. I don't know
> exactly what the purpose of this function is (the function itself is
> documented, but I can't find any information on which situations it
> should be called in), but it sufficed to call this function once
> between sending and requesting.
>
> I have two questions.
> The first is where I can find R documentation on the proper use of
> non-blocking sockets and of the socketSelect function.
>
> The second question is more focused on using non-blocking sockets in
> general. Is it allowed to execute a read and a receive command
> immediately after each other, or must a short waiting loop be built in?
> I'm asking this because I'm running into the same problems in a C++
> project as I did with RBaseX.
>
> Ben Engbers

Hi Ben,

For an easier experience with sockets, you may wish to have a look at the
`nanonext` package. This wraps 'NNG' and is generally used for messaging
over its own protocols (req/rep, pub/sub etc.), although you can also use
it for HTTP and websockets.

In any case, a low-level stream interface allows connecting to arbitrary
sockets. You would use something like
`s <- stream(dial = "tcp://0.0.0.0:")`, substituting in the actual address.
This would allow you greater flexibility in sending and receiving over the
bytestream without worrying so much about order and timing, as per your
current experience.

For example, a common pattern this allows for is doing an async receive
`r <- recv_aio(s)` before sending a request `send(s, "some request")`, and
then querying the receive result afterwards at `r$data`.

I won't go into too much detail here, but as it is my own package, please
feel free to reach out separately via email or GitHub etc.

Thanks,

Charlie
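
Independently of `nanonext`, the role of `socketSelect()` that Ben asks about can be illustrated in base R: it waits until a socket is ready, so a non-blocking read does not run before the data has arrived. The following is only a minimal sketch. It assumes a version of R that provides `serverSocket()` and `socketAccept()`, a loopback connection standing in for the real server, and an arbitrary port in the 50001-60000 range that happens to be free.

```r
# Arbitrary local port (an assumption; pick another if it is in use)
port <- 50000L + sample.int(10000L, 1L)

srv    <- serverSocket(port)                      # listening socket
client <- socketConnection("localhost", port,
                           blocking = FALSE, open = "r+b")
peer   <- socketAccept(srv, blocking = FALSE, open = "r+b")

# One side sends a request ...
writeBin(charToRaw("ping"), client)

# ... and the other waits, for at most 5 seconds, until there is
# something to read, instead of reading immediately or sleeping.
ready <- socketSelect(list(peer), write = FALSE, timeout = 5)

reply <- if (ready) rawToChar(readBin(peer, "raw", n = 4L)) else NA_character_

close(client)
close(peer)
close(srv)
```

Calling `socketSelect()` between the send and the read plays the part of the "short break" described in the question, but it returns as soon as data is available rather than after a fixed delay.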
Re: [Rd] R_GetCurrentEnv() not working as intended
Hi Lionel,

I had indeed seen your Bugzilla report, but must have misread the R source,
as I thought the patch had already been adopted.

Thanks for sharing the workaround as well, it is interesting. As I can pass
`environment()` to my `.Call()`, I suspect there is not much difference
given the call to `Rf_eval()` at the end of the workaround.

Let's hope your patch gets reviewed and adopted.

Thanks,

Charlie

November 14, 2022 8:55 AM, "Lionel Henry" wrote:

> Hello,
>
> This function currently does not work when called from `.Call()`.
> This is reported with a patch at
> https://bugs.r-project.org/show_bug.cgi?id=17839
>
> In the meantime, you can use this stopgap implementation:
>
> https://github.com/tidyverse/purrr/blob/55c9a8ab8788d878ce9e8e80b867139e46d15395/src/conditions.c#L6-L34
>
> Best,
> Lionel
>
> On 11/13/22, Charlie Gao via R-devel wrote:
>
>> Perhaps my original question was too complicated, so I will just ask: is
>> anyone using R_GetCurrentEnv() in their C code? If so, I would be grateful
>> if you could point me to an example where it is working for you.
>>
>> I have searched GitHub and only come across a couple of trivial uses as an
>> argument to Rf_eval(), where it probably returns the global environment,
>> with the result being indistinguishable in normal use.
>>
>> Thanks,
>>
>> Charlie
>>
>> October 22, 2022 12:52 AM, "Charlie Gao" wrote:
>>
>>> Dear all,
>>>
>>> I am attempting to use `R_GetCurrentEnv()` to return the current
>>> environment within C code, but it seems to always return the global
>>> environment.
>>>
>>> Specifically, I would like to use it as an argument to R_NewEnv() so it is
>>> created with the correct enclosing environment. I also have functions in
>>> the environment that reference symbols in the closure and I would also
>>> like to use `R_GetCurrentEnv()` as an argument to `SET_CLOENV()`.
>>>
>>> My workaround at the moment is to pass `environment()` as one of the
>>> arguments to the `.Call()`.
>>>
>>> For the actual code I am referring to:
>>>
>>> https://github.com/shikokuchuo/nanonext/blob/main/src/aio.c#L516-L535
>>>
>>> where I am currently passing `environment()` as 'clo' whereas ideally I
>>> would be able to use `R_GetCurrentEnv()` instead.
>>>
>>> There is an open Bugzilla report from 2020 that says `R_GetCurrentEnv()`
>>> only returns the base namespace from within a `.Call()`, however I see
>>> that the proposed patch has already been adopted in the R source.
>>>
>>> It seems that the function was introduced (fairly) recently in R 3.6,
>>> presumably for such uses. I would like to know if this is not the case
>>> or else confirmation that this is an outstanding bug.
>>>
>>> Thanks,
>>>
>>> Charlie
Re: [Rd] R_GetCurrentEnv() not working as intended
Perhaps my original question was too complicated, so I will just ask: is
anyone using R_GetCurrentEnv() in their C code? If so, I would be grateful
if you could point me to an example where it is working for you.

I have searched GitHub and only come across a couple of trivial uses as an
argument to Rf_eval(), where it probably returns the global environment,
with the result being indistinguishable in normal use.

Thanks,

Charlie

October 22, 2022 12:52 AM, "Charlie Gao" wrote:

> Dear all,
>
> I am attempting to use `R_GetCurrentEnv()` to return the current
> environment within C code, but it seems to always return the global
> environment.
>
> Specifically, I would like to use it as an argument to R_NewEnv() so it is
> created with the correct enclosing environment. I also have functions in
> the environment that reference symbols in the closure and I would also
> like to use `R_GetCurrentEnv()` as an argument to `SET_CLOENV()`.
>
> My workaround at the moment is to pass `environment()` as one of the
> arguments to the `.Call()`.
>
> For the actual code I am referring to:
>
> https://github.com/shikokuchuo/nanonext/blob/main/src/aio.c#L516-L535
>
> where I am currently passing `environment()` as 'clo' whereas ideally I
> would be able to use `R_GetCurrentEnv()` instead.
>
> There is an open Bugzilla report from 2020 that says `R_GetCurrentEnv()`
> only returns the base namespace from within a `.Call()`, however I see
> that the proposed patch has already been adopted in the R source.
>
> It seems that the function was introduced (fairly) recently in R 3.6,
> presumably for such uses. I would like to know if this is not the case
> or else confirmation that this is an outstanding bug.
>
> Thanks,
>
> Charlie
[Rd] R_GetCurrentEnv() not working as intended
Dear all,

I am attempting to use `R_GetCurrentEnv()` to return the current
environment within C code, but it seems to always return the global
environment.

Specifically, I would like to use it as an argument to R_NewEnv() so it is
created with the correct enclosing environment. I also have functions in
the environment that reference symbols in the closure and I would also like
to use `R_GetCurrentEnv()` as an argument to `SET_CLOENV()`.

My workaround at the moment is to pass `environment()` as one of the
arguments to the `.Call()`.

For the actual code I am referring to:

https://github.com/shikokuchuo/nanonext/blob/main/src/aio.c#L516-L535

where I am currently passing `environment()` as 'clo' whereas ideally I
would be able to use `R_GetCurrentEnv()` instead.

There is an open Bugzilla report from 2020 that says `R_GetCurrentEnv()`
only returns the base namespace from within a `.Call()`, however I see that
the proposed patch has already been adopted in the R source.

It seems that the function was introduced (fairly) recently in R 3.6,
presumably for such uses. I would like to know if this is not the case or
else confirmation that this is an outstanding bug.

Thanks,

Charlie
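
What the workaround achieves can be mimicked in pure R, which may make the intent clearer. This is not the `nanonext` code linked above: the names `template_getter`, `make_aio` and `payload` are hypothetical, and `new.env()` and `environment<-` merely stand in for what the C code does with `R_NewEnv()` and `SET_CLOENV()` once the calling environment has been passed in as 'clo'.

```r
# Defined at top level, so its enclosing environment starts out as globalenv()
template_getter <- function() payload

make_aio <- function() {
  payload <- "result"
  clo <- environment()           # what the R wrapper would pass to .Call() as 'clo'
  env <- new.env(parent = clo)   # stands in for R_NewEnv() with 'clo' as enclosure
  getter <- template_getter
  environment(getter) <- clo     # stands in for SET_CLOENV(getter, clo)
  assign("get", getter, envir = env)
  env
}

aio <- make_aio()
value <- aio$get()               # 'payload' is found through 'clo'
```

If the global environment were used in place of `clo`, which is the behaviour reported for `R_GetCurrentEnv()` in this message, the call `aio$get()` would fail to find `payload`.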