Re: [R] round and trailing zero

2024-07-30 Thread Göran Broström




On 2024-07-30 at 17:09, Jorgen Harmse wrote:

> Duncan Murdoch answered your question, but I have another. Are you
> going to do some computation with the rounded numbers,

Wouldn't dream of it.

> or are they just for display?

Yes.

G,

> (One thing I like about Excel is that I can change
> the display format of a cell without changing answers that depend on
> that cell.) In the latter case, why stash them in a variable? For
> more control of the display, consider sprintf (or a wrapper that
> combines sprintf with cat).
>
> Regards,
>
> Jorgen Harmse.



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] round and trailing zero

2024-07-30 Thread Jorgen Harmse via R-help
Duncan Murdoch answered your question, but I have another. Are you going to do 
some computation with the rounded numbers, or are they just for display? (One 
thing I like about Excel is that I can change the display format of a cell 
without changing answers that depend on that cell.) In the latter case, why 
stash them in a variable? For more control of the display, consider sprintf (or 
a wrapper that combines sprintf with cat).
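
A minimal sketch of the display-only approach suggested above, assuming the four hazards from the original post are held in a named numeric vector. The vector is a stand-in built from the posted values, and `catf` is a hypothetical helper name, not a base R function.

```r
# Values copied from the original post; the vector itself is a stand-in.
hazards <- c("(60, 70]" = 0.046612937, "(70, 80]" = 0.115643783,
             "(80, 90]" = 0.273613266, "(90, 100]" = 0.450127975)

# sprintf() returns character strings, so trailing zeros are kept
# and the stored numbers are left untouched.
shown <- sprintf("%.3f", hazards)
names(shown) <- names(hazards)
print(shown, quote = FALSE)

# Hypothetical wrapper combining sprintf() with cat(): one line per element.
catf <- function(fmt, ...) cat(sprintf(fmt, ...), sep = "\n")
catf("%-10s %.3f", names(hazards), hazards)
```

Because only the strings are rounded, any later computation can still use the full-precision `hazards`.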

Regards,
Jorgen Harmse.




Re: [R] round and trailing zero

2024-07-29 Thread Göran Broström
Ah, thanks,

Göran

> On 29 July 2024 at 16:23, Duncan Murdoch wrote:
> 
> On 2024-07-29 10:06 a.m., Göran Broström wrote:
>> I have a "result":
>>  > hazards
>>         (60, 70]    (70, 80]    (80, 90]   (90, 100]
>> [1,] 0.046612937 0.115643783 0.273613266 0.450127975
>> Two issues: (i) Too many decimals, and (ii) it seems to be an 1x4
>> matrix, I only need the first row. (i):
>>  > haz <- round(hazards, 3)
>>  > haz
>>      (60, 70] (70, 80] (80, 90] (90, 100]
>> [1,]    0.047    0.116    0.274      0.45
>> As expected, the fourth element lost a trailing zero. I'll deal with
>> that, but first (ii):
>>  > haz[1, ]
>>  (60, 70]  (70, 80]  (80, 90] (90, 100]
>>     0.047     0.116     0.274     0.450
>> And the trailing zero is mysteriously recovered!
>> Is there some general rule governing this behaviour?
> 
> R uses the same format for every element in each column when printing a 
> matrix or dataframe, and for every element in a vector.
> 
> Your first example had only one element per column.  If you had printed 
> t(haz) you'd get numbers displayed like the second version, where haz[1,] 
> converts that row to a vector.
> 
> Duncan Murdoch
> 



Re: [R] round and trailing zero

2024-07-29 Thread Duncan Murdoch

On 2024-07-29 10:06 a.m., Göran Broström wrote:

> I have a "result":
>
>   > hazards
>         (60, 70]    (70, 80]    (80, 90]   (90, 100]
> [1,] 0.046612937 0.115643783 0.273613266 0.450127975
>
> Two issues: (i) Too many decimals, and (ii) it seems to be an 1x4
> matrix, I only need the first row. (i):
>
>   > haz <- round(hazards, 3)
>   > haz
>      (60, 70] (70, 80] (80, 90] (90, 100]
> [1,]    0.047    0.116    0.274      0.45
>
> As expected, the fourth element lost a trailing zero. I'll deal with
> that, but first (ii):
>
>   > haz[1, ]
>  (60, 70]  (70, 80]  (80, 90] (90, 100]
>     0.047     0.116     0.274     0.450
>
> And the trailing zero is mysteriously recovered!
>
> Is there some general rule governing this behaviour?


R uses the same format for every element in each column when printing a 
matrix or dataframe, and for every element in a vector.


Your first example had only one element per column.  If you had printed 
t(haz) you'd get numbers displayed like the second version, where 
haz[1,] converts that row to a vector.
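
The rule above can be reproduced with a small sketch. It assumes a 1x4 matrix rebuilt from the values posted in the question; the construction of `hazards` here is a stand-in, since the original object was not shared.

```r
# Stand-in for the poster's object, rebuilt from the printed values.
hazards <- matrix(c(0.046612937, 0.115643783, 0.273613266, 0.450127975),
                  nrow = 1,
                  dimnames = list(NULL, c("(60, 70]", "(70, 80]",
                                          "(80, 90]", "(90, 100]")))
haz <- round(hazards, 3)

print(haz)        # one element per column: each column gets its own format
print(t(haz))     # all four values share one column, hence one common format
print(haz[1, ])   # a plain vector is formatted as a whole

# format() makes the common format of the vector explicit as strings.
fmt <- format(haz[1, ])
fmt
```

The stored value of the fourth element is 0.45 in every case; only the printed representation differs.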


Duncan Murdoch



[R] round and trailing zero

2024-07-29 Thread Göran Broström

I have a "result":

> hazards
        (60, 70]    (70, 80]    (80, 90]   (90, 100]
[1,] 0.046612937 0.115643783 0.273613266 0.450127975

Two issues: (i) Too many decimals, and (ii) it seems to be an 1x4 
matrix, I only need the first row. (i):


> haz <- round(hazards, 3)
> haz
     (60, 70] (70, 80] (80, 90] (90, 100]
[1,]    0.047    0.116    0.274      0.45

As expected, the fourth element lost a trailing zero. I'll deal with 
that, but first (ii):


> haz[1, ]
 (60, 70]  (70, 80]  (80, 90] (90, 100]
    0.047     0.116     0.274     0.450

And the trailing zero is mysteriously recovered!

Is there some general rule governing this behaviour?

Thanks, Göran



[R] C API - no NULL pointer guarantee?

2024-07-29 Thread Erez Shomron
Hello,

I'm working on bindings for R's C API (for Zig), and was wondering whether the 
API guarantees that it won't return null pointers.
The only functions I found in the "Writing R Extensions" manual where this is 
not the case are `R_tryEval` and `R_tryEvalSilent`.
Otherwise it's unclear.

The reason I care about this is syntax. Because I don't know whether SEXPs can 
be NULL, I wrap each SEXP in an optional type, and the burden is on the user to 
either check or assert every time they want to handle an optional SEXP.
It's not too bad and provides null safety, but can get excessive.

See an example from one of my test functions (the question mark unwraps the 
optional, asserting it is not null):
```
export fn testAsScalarVector() Robject {
    const results = rzig.vec.allocVector(.List, 23).?.protect();
    defer rzig.gc.protect_stack.unprotectAll();

    results.?.setListObj(0, rzig.vec.asScalarVector(@as(f32, 1.32456e+32)));
    results.?.setListObj(1, rzig.vec.asScalarVector(@as(f32, -9.87123e-32)));
    results.?.setListObj(2, rzig.vec.asScalarVector(math.inf(f32)));
    results.?.setListObj(3, rzig.vec.asScalarVector(-math.inf(f32)));
    results.?.setListObj(4, rzig.vec.asScalarVector(math.nan(f32)));

    results.?.setListObj(5, rzig.vec.asScalarVector(@as(f64, -9.1e+300)));
    results.?.setListObj(6, rzig.vec.asScalarVector(@as(f64, 1.2e-300)));
    results.?.setListObj(7, rzig.vec.asScalarVector(math.inf(f64)));
    results.?.setListObj(8, rzig.vec.asScalarVector(-math.inf(f64)));
    results.?.setListObj(9, rzig.vec.asScalarVector(math.nan(f64)));

    results.?.setListObj(10, rzig.vec.asScalarVector(-9.1e+307));
    results.?.setListObj(11, rzig.vec.asScalarVector(1.2e-307));
    results.?.setListObj(12, rzig.vec.asScalarVector(1.0e+500)); // Inf
    results.?.setListObj(13, rzig.vec.asScalarVector(-1.0e+500)); // -Inf

    results.?.setListObj(14, rzig.vec.asScalarVector(5));
    results.?.setListObj(15, rzig.vec.asScalarVector(-5));
    results.?.setListObj(16, rzig.vec.asScalarVector(@as(u32, 4)));
    results.?.setListObj(17, rzig.vec.asScalarVector(@as(i32, -4)));
    results.?.setListObj(18, rzig.vec.asScalarVector(@as(u0, 0)));
    results.?.setListObj(19, rzig.vec.asScalarVector(@as(u150, 2_000_000_000)));
    results.?.setListObj(20, rzig.vec.asScalarVector(@as(i150, -2_000_000_000)));
    results.?.setListObj(21, rzig.vec.asScalarVector(true));
    results.?.setListObj(22, rzig.vec.asScalarVector(false));

    return results;
}
```

It would be nice to be able to drop the question mark, but only if R guarantees 
null safety at the API level. Then I would declare optional return types only 
for documented cases like `R_tryEval`.

Appreciate your time and help!
Thanks,
- Erez



Re: [R] Using optim() function to find MLE

2024-07-29 Thread Ivan Krylov via R-help
On Mon, 29 Jul 2024 09:52:22 +0530
Christofer Bogaso wrote:

> LL = function(b0, b1)

help(optim) documents that the function to be optimised takes a single
argument, a vector containing the parameters. Here's how your LL
function can be adapted to this interface:

LL <- function(par) {
 b0 <- par[1]
 b1 <- par[2]
 sum(apply(as.matrix(dat[, c('PurchasedProb', 'Age')]), 1,
  function(iROw) iROw['PurchasedProb'] * log(1 / (1 + exp(-1 * (b0 + b1
  * iROw['Age'])))) + (1 - iROw['PurchasedProb']) * log(1 - 1 / (1 +
  exp(-1 * (b0 + b1 * iROw['Age']))))))
}

Furthermore, LL(c(0, 1)) results in -Inf. All the methods supported by
optim() require at least the initial parameters to result in finite
values, and L-BFGS-B requires all evaluations to be finite. You're also
maximising the function, and optim() defaults to minimisation, so you
need an additional parameter to adjust that (or rewrite the LL function
further):

result1 <- optim(
 par = c(0, 0), fn = LL, method = "L-BFGS-B",
 control = list(fnscale = -1)
)

> coef(result1)

help(optim) documents the return value of optim() as not having a
class or a $coefficients field. You can use result1$par to access the
parameters.
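
The points above can be put together in a self-contained sketch. It uses simulated stand-in data rather than the poster's `dat`, and it writes the log-likelihood with `plogis(..., log.p = TRUE)`, which is a substitution of mine and not the formulation used in the thread; the variable names `age`, `y` and `fit` are likewise hypothetical.

```r
set.seed(1)
# Simulated stand-in data (hypothetical; not the poster's 'dat').
age <- runif(200, 20, 60)
y   <- rbinom(200, 1, plogis(-4 + 0.1 * age))

# Log-likelihood taking a single parameter vector, as optim() expects.
# plogis(eta, log.p = TRUE) is log(1 / (1 + exp(-eta))); it stays finite
# where the naive log(1 - 1 / (1 + exp(-eta))) would evaluate to -Inf.
LL <- function(par) {
  eta <- par[1] + par[2] * age
  sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
}

# fnscale = -1 makes optim() maximise instead of minimise.
fit <- optim(par = c(0, 0), fn = LL, method = "BFGS",
             control = list(fnscale = -1))
fit$par                                  # estimates live in $par, not coef()
coef(glm(y ~ age, family = binomial))    # reference fit for comparison
```

With this formulation LL(c(0, 1)) is finite on the simulated data, so the choice of starting values becomes less delicate.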

> Is there any way to force optim() function to use Newton-CG algorithm?

I'm assuming you mean the method documented in
.
optim() doesn't support the truncated (line search) Newton-CG method.
See the 'optimx' and 'nloptr' packages for an implementation of a
truncated Newton method (not necessarily exactly the same one).

-- 
Best regards,
Ivan



[R] Using optim() function to find MLE

2024-07-28 Thread Christofer Bogaso
Hi,

I am trying to fit a GLM to the data below. While R does provide direct
estimation, I wanted to go with a manual calculation, as below:

dat = structure(list(PurchasedProb = c(0.37212389963679, 0.572853363351896,
0.908207789994776, 0.201681931037456, 0.898389684967697, 0.944675268605351,
0.660797792486846, 0.62911404389888, 0.0617862704675645, 0.205974574899301,
0.176556752528995, 0.687022846657783, 0.384103718213737, 0.769841419998556,
0.497699242085218, 0.717618508264422, 0.991906094830483, 0.380035179434344,
0.777445221319795, 0.934705231105909, 0.212142521282658, 0.651673766085878,
0.12095961317, 0.267220668727532, 0.386114092543721, 0.0133903331588954,
0.382387957070023, 0.86969084572047, 0.34034899668768, 0.482080115471035,
0.599565825425088, 0.493541307048872, 0.186217601411045, 0.827373318606988,
0.668466738192365, 0.79423986072652, 0.107943625887856, 0.723710946040228,
0.411274429643527, 0.820946294115856, 0.647060193819925, 0.78293276228942,
0.553036311641335, 0.529719580197707, 0.789356231689453, 0.023331202333793,
0.477230065036565, 0.7323137386702, 0.692731556482613, 0.477619622135535,
0.8612094768323, 0.438097107224166, 0.244797277031466, 0.0706790471449494,
0.0994661601725966, 0.31627170718275, 0.518634263193235, 0.6620050764177,
0.406830187188461, 0.912875924259424, 0.293603372760117, 0.459065726259723,
0.332394674187526, 0.65087046707049, 0.258016780717298, 0.478545248275623,
0.766310670645908, 0.0842469143681228, 0.875321330036968, 0.339072937844321,
0.839440350187942, 0.34668348915875, 0.333774930797517, 0.476351245073602,
0.892198335845023, 0.864339470630512, 0.389989543473348, 0.777320698834956,
0.960617997217923, 0.434659484773874, 0.712514678714797, 0.34368897751,
0.325352151878178, 0.757087148027495, 0.202692255144939, 0.711121222469956,
0.121691921027377, 0.245488513959572, 0.14330437942408, 0.239629415096715,
0.0589343772735447, 0.642288258532062, 0.876269212691113, 0.778914677444845,
0.79730882588774, 0.455274453619495, 0.410084082046524, 0.810870242770761,
0.604933290276676, 0.654723928077146, 0.353197271935642, 0.270260145887733,
0.99268406117335, 0.633493264438584, 0.213208135217428, 0.129372348077595,
0.478118034312502, 0.924074469832703, 0.59876096714288, 0.976170694921166,
0.731792511884123, 0.356726912083104, 0.431473690550774, 0.148211560677737,
0.0130775754805654, 0.715566066093743, 0.103184235747904, 0.446284348610789,
0.640101045137271, 0.991838620044291, 0.495593577856198, 0.484349524369463,
0.173442334868014, 0.754820944508538, 0.453895489219576, 0.511169783771038,
0.207545113284141, 0.228658142732456, 0.595711996313184, 0.57487219828181,
0.0770643802825361, 0.0355405795853585, 0.642795492196456, 0.928615199634805,
0.598092422354966, 0.560900748008862, 0.526027723914012, 0.985095223877579,
0.507641822332516, 0.682788078673184, 0.601541217649356, 0.238868677755818,
0.258165926672518, 0.729309623362496, 0.452570831403136, 0.175126768415794,
0.746698269620538, 0.104987640399486, 0.864544949028641, 0.614644971676171,
0.557159538846463, 0.328777319053188, 0.453131445450708, 0.500440972624347,
0.180866361130029, 0.529630602803081, 0.0752757457084954, 0.277755932649598,
0.212699519237503, 0.284790480975062, 0.895094102947041, 0.4462353233248,
0.779984889784828, 0.880619034869596, 0.413124209502712, 0.0638084805104882,
0.335487491684034, 0.723725946620107, 0.337615333497524, 0.630414122482762,
0.840614554006606, 0.856131664710119, 0.39135928102769, 0.380493885604665,
0.895445425994694, 0.644315762910992, 0.741078648716211, 0.605303446529433,
0.903081611497328, 0.293730155099183, 0.19126010988839, 0.886450943304226,
0.503339485730976, 0.877057543024421, 0.189193622441962, 0.758103052387014,
0.724498892668635, 0.943724818294868, 0.547646587016061, 0.711743867723271,
0.388905099825934, 0.100873126182705, 0.927302088588476, 0.283232500310987,
0.59057315881364, 0.110360604943708, 0.840507032116875, 0.317963684443384,
0.782851336989552, 0.267508207354695, 0.218645284883678, 0.516796836396679,
0.268950592027977, 0.181168327340856, 0.518576137488708, 0.562782935798168,
0.129156854469329, 0.256367604015395, 0.717935275984928, 0.961409936426207,
0.100140846567228, 0.763222689507529, 0.947966354666278, 0.818634688388556,
0.308292330708355, 0.649579460499808, 0.953355451114476, 0.953732650028542,
0.339979203417897, 0.262474110117182, 0.165453933179379, 0.322168056620285,
0.510125206550583, 0.923968471353874, 0.510959698352963, 0.257621260825545,
0.0464608869515359, 0.41785625834018, 0.854001502273604, 0.347230677725747,
0.131442320533097, 0.374486864544451, 0.631420228397474, 0.390078933676705,
0.689627848798409, 0.689413412474096, 0.554900623159483, 0.429624407785013,
0.452720062807202, 0.306443258887157, 0.578353944001719, 0.910370304249227,
0.142604082124308, 0.415047625312582, 0.210925750667229, 0.428750370861962,
0.132689975202084, 0.460096445865929, 0.942957059247419, 0.761973861604929,
0.932909828843549, 0.470678497571498, 0.603588067693636, 0.484989680582657,

[ESS] LSP Help for Downloaded Libraries in R

2024-07-28 Thread ThePsychoBuck via ESS-help
Hello Everyone,

I am a new Spacemacs user and don't have much experience with Emacs. I recently 
added R support to it according to the Spacemacs docs 
(https://develop.spacemacs.org/layers/+lang/ess/README.html). It is working 
wonderfully, with just one small problem: the LSP isn't showing help, 
autocompletion, and other things for packages other than the core packages. 
I tried adding

```
(setq ess-r-package-library-paths "path")
```
to user-config in .spacemacs, but it didn't help.

Can you please help me?

Sent from Proton Mail Android

__
ESS-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help


Re: [R] please help generate a square correlation matrix

2024-07-28 Thread Richard O'Keefe
It sounds as though you have null hypothesis "x records independent
Bernoulli trials with the same (unknown) success probability p_x, y
records independent Bernoulli trials with the same (unknown) success
probability p_y, x and y are independent" and alternative hypothesis
"x and y succeed less often than they would under the null
hypothesis".  The obvious way to do that is to fit \hat{p_x} =
sum(x)/length(x), \hat{p_y} = sum(y)/length(y), and then compute the
lower tail Pr(number of times x and y succeed <= sum(xy) | p_x =
\hat{p_x} \& p_y = \hat{p_y}), and unless I am completely off my head
with sleepiness, this is just

pbinom(sum(x*y), length(x), mean(x)*mean(y))

So I don't quite see why you wanted correlations.
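
As a rough illustration of that one-liner, here is a sketch on simulated 0/1 indicators. The data, the sample size and the names `both`, `p_both` and `p_lower` are assumptions made for the example, not values from the thread.

```r
set.seed(42)
# Hypothetical 0/1 indicators for two mutations across 1000 samples.
x <- rbinom(1000, 1, 0.05)
y <- rbinom(1000, 1, 0.04)

both   <- sum(x * y)          # samples carrying both mutations
p_both <- mean(x) * mean(y)   # co-occurrence rate expected under independence

# Lower-tail probability of seeing this few (or fewer) co-occurrences.
p_lower <- pbinom(both, length(x), p_both)
p_lower
```

A small value of `p_lower` would suggest the two mutations co-occur less often than independence predicts; here the indicators are simulated independently, so no such signal is expected.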

Since you say that "WE measured ..." the various Signor-Lipps-like
scenarios I was thinking of probably don't apply.  There are other
threats to validity:
- the presence of two (or more) mutations may be hard for your
equipment to detect
- patients with multiple mutations may die faster so may be less
likely to be captured for your study
- cell division rates decrease with age
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6789572/ so mutations
whose likelihood  depends on *rate* might tend to occur earlier in
life while mutations that depend on *accumulated* error might tend to
occur later in life, so "x occurs SOME time in a patient's life" and
"y occurs SOME time in a patient's life" might be independent while "x
and y occur at the SAME time in a patient's life" might be unlikely.
It would be interesting to check whether the frequency of each
mutation is independent of patient age, because you might want to
stratify the pbinom test by age in that case.  Exposure to
environmental mutagens is also likely to vary with age.

Looking at supermarket data in the past primed me to expect rates to
vary with age.  Sunscreen and cough mixture are negatively associated
(:-).

On Sun, 28 Jul 2024 at 12:40, Yuan Chun Ding  wrote:
>
> HI Bert,
>
>
>
> Thank you for extra help!!
>
> Yes, exactly, your interpretation is perfectly correct and your R code is 
> what I should look for.
>
> after generated all those negative values of correlation,
>
> I thought about the extremely small p values associated with those negative 
> correlation, which is not meaningful as I truncated my data.
>
>
>
> When examining the exclusiveness of mutation pairs, what I first thought 
> about is correlation, so stepped into a more complicated correlation journey.
>
> However, what Richard share is very helpful to explain why I got negative 
> correlation values for all pairs.
>
> In my case, we measured all mutations for all 1000 samples using an exactly 
> same sequencing method, so no issue of never-reporting.
>
> I am  very grateful for help and comments from Rui, Richard and Bert!!
>
>
>
> Ding
>
> From: Bert Gunter 
> Sent: Saturday, July 27, 2024 4:50 PM
> To: Yuan Chun Ding 
> Cc: Richard O'Keefe ; r-help@r-project.org
> Subject: Re: [R] please help generate a square correlation matrix
>
>
>
> Your expanded explanation helps clarify your intent. Herewith some
>
> comments. Of course, feel free to ignore and not respond. And, as
>
> always, my apologies if I have failed to comprehend your intent.
>
>
>
> 1. I would avoid any notion of "statistical significance" like the
>
> plague. This is a purely exploratory exercise.
>
>
>
> 2. My understanding is that you want to know the proportion of rows in
>
> a pair of columns/vectors in which only 1 values of the pair is 1 out
>
> of the number of pairs where 1 or 2 values is 1.  In R syntax, this is
>
> simply:
>
>
>
> sum(xor(x, y)) / sum(x | y)  ,
>
> where x and y are two columns of 1's and 0's
>
>
>
> Better yet might be to report both this *and* sum(x|y) to help you
>
> judge "meaningfulness".
>
> Here is a simple function that does this
>
>
>
> ## first, define a function that does above calculation:
>
> assoc <- \(z){
>
>x <- z[,1]; y <- z[,2]
>
>n <- sum(x|y)
>
>c(prop = sum(xor(x, y))/n, N = n)
>
> }
>
>
>
> ## Now a function that uses it for the various combinations:
>
>
>
> somecor <- function(dat, func = assoc){
>
>dat <- as.matrix(dat)
>
>indx <- seq_len(ncol(dat))
>
>rbind(w <- combn(indx,2),
>
>  combn(indx, 2, FUN = \(m)func(dat[,m]) )) |>
>
>  t()  |> round(digits =2) |>
>
>   'dimnames<-'(list(rep.int('',ncol(w)), c("","", "prop","N")))
>
> }
>
>
>
> # Now apply it to your example data:
>
>
>
> somecor(dat)
>
> ## which gives
>
>  prop N
>
>  1 2 0.67 6
>
>  1 3 0.60 5
>
>  1 4 0.57 7
>
>  2 3 0.60 5
>
>  2 4 0.33 6
>
>  3 4 0.71 7
>
>
>
> This seems more interpretable and directly useful to me. Bigger values
>
> of prop for bigger N are the more interesting, 

Re: [R] help on date objects...

2024-07-28 Thread Rui Barradas

At 05:23 on 28/07/2024, akshay kulkarni wrote:

> Dear members,
> Why is the following code returning NA instead of the date?
>
> as.Date("2022-01-02", origin = "1900-01-01",  format = "%y%d%m")
> [1] NA
>
> Thanking you,
> Yours sincerely,
> AKSHAY M KULKARNI


Hello,

There are several reasons for your result.

1. You have a 4-digit year but the format uses %y (lower case = 2-digit year). It 
should be %Y.

2. Your date has '-' as separator but your format doesn't have a separator.

Also, though less important:

1. You don't need argument origin. This is only needed with numeric to 
date coercion.

2. Are you sure the format is YYYY-DD-MM, year-day-month?


as.Date("2022-01-02", format = "%Y-%d-%m")
#> [1] "2022-02-01"

# note the origin is not your posted origin date,
# see the examples on Windows and Excel
# dates in help("as.Date")
as.Date(19024, origin = "1970-01-01")
#> [1] "2022-02-01"


Hope this helps,

Rui Barradas




Re: [R] ts_regular....in tsbox

2024-07-28 Thread Eric Berger
Did you try converting the date column to class Date? Does that work?
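
A base-R sketch of what the conversion and the NA filling amount to, assuming a data frame with hypothetical columns `time` (character dates) and `value`. It uses `merge()` against a complete monthly sequence instead of tsbox, so it only illustrates the idea; once the column is of class Date, `ts_regular()` can be tried on the data frame directly.

```r
# Hypothetical stand-in for the poster's data frame; March is missing.
vesselB <- data.frame(time  = c("2024-01-01", "2024-02-01", "2024-04-01"),
                      value = c(10, 12, 9),
                      stringsAsFactors = FALSE)

# Convert the character column to class Date first.
vesselB$time <- as.Date(vesselB$time)

# Build the complete monthly sequence and left-join the data onto it,
# so that months without an observation get NA.
full     <- data.frame(time = seq(min(vesselB$time), max(vesselB$time),
                                  by = "month"))
vesselBR <- merge(full, vesselB, by = "time", all.x = TRUE)
vesselBR
```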


On Sun, Jul 28, 2024 at 7:40 AM akshay kulkarni 
wrote:

> dear members,
> I have a data frame which contains, among
> others, a date object of monthly frequency which is not regular, i.e some
> months are omitted, and the main variable to be forecast, among others. Its
> name is vesselB.
>
> I did the following code:
>
> vesselBR <- ts_regular(vesselB)
>
> but the missing months are not filled with NA. What should I do to insert
> NAs into the missing months? THe date column is of character class; should
> I make it Date class? Any other trick?
>
> THanking you,
> Yours sincerely,
> AKSHAY M KULKARNI
>
>



Re: [R] help on date objects...

2024-07-28 Thread Eric Berger
as.Date("2022-01-02", origin="1900-01-01", format="%Y-%d-%m")

On Sun, Jul 28, 2024 at 7:24 AM akshay kulkarni 
wrote:

> Dear members,
>  WHy is the following code returning NA
> instead of the date?
>
>
> > as.Date("2022-01-02", origin = "1900-01-01",  format = "%y%d%m")
> [1] NA
>
>
> Thanking you,
> Yours sincerely,
> AKSHAY M KULKARNI
>
>



[R] ts_regular....in tsbox

2024-07-27 Thread akshay kulkarni
Dear members,
I have a data frame, vesselB, which contains, among other columns, a date 
column of monthly frequency that is not regular (i.e. some months are omitted) 
and the main variable to be forecast.

I ran the following code:

vesselBR <- ts_regular(vesselB)

but the missing months are not filled with NA. What should I do to insert NAs 
into the missing months? The date column is of character class; should I make 
it Date class? Any other trick?

Thanking you,
Yours sincerely,
AKSHAY M KULKARNI



[R] help on date objects...

2024-07-27 Thread akshay kulkarni
Dear members,
Why is the following code returning NA instead of the date?


> as.Date("2022-01-02", origin = "1900-01-01",  format = "%y%d%m")
[1] NA


Thanking you,
Yours sincerely,
AKSHAY M KULKARNI



Re: [R] please help generate a square correlation matrix

2024-07-27 Thread Yuan Chun Ding via R-help
Hi Bert,

Thank you for the extra help!
Yes, exactly: your interpretation is correct, and your R code is what I was 
looking for.
After generating all those negative correlation values, I thought about the 
extremely small p-values associated with them, which are not meaningful because 
I truncated my data.

When examining the exclusiveness of mutation pairs, correlation was the first 
thing I thought of, so I set off on a more complicated correlation journey.
However, what Richard shared is very helpful in explaining why I got negative 
correlation values for all pairs.
In our case, we measured all mutations for all 1000 samples using exactly the 
same sequencing method, so there is no issue of never-reporting.
I am very grateful for the help and comments from Rui, Richard and Bert!

Ding



From: Bert Gunter 
Sent: Saturday, July 27, 2024 4:50 PM
To: Yuan Chun Ding 
Cc: Richard O'Keefe ; r-help@r-project.org
Subject: Re: [R] please help generate a square correlation matrix



Your expanded explanation helps clarify your intent. Herewith some

comments. Of course, feel free to ignore and not respond. And, as

always, my apologies if I have failed to comprehend your intent.



1. I would avoid any notion of "statistical significance" like the

plague. This is a purely exploratory exercise.



2. My understanding is that you want to know the proportion of rows in

a pair of columns/vectors in which only 1 values of the pair is 1 out

of the number of pairs where 1 or 2 values is 1.  In R syntax, this is

simply:



sum(xor(x, y)) / sum(x | y)  ,

where x and y are two columns of 1's and 0's



Better yet might be to report both this *and* sum(x|y) to help you

judge "meaningfulness".

Here is a simple function that does this



## first, define a function that does above calculation:

assoc <- \(z){

   x <- z[,1]; y <- z[,2]

   n <- sum(x|y)

   c(prop = sum(xor(x, y))/n, N = n)

}



## Now a function that uses it for the various combinations:



somecor <- function(dat, func = assoc){

   dat <- as.matrix(dat)

   indx <- seq_len(ncol(dat))

   rbind(w <- combn(indx,2),

 combn(indx, 2, FUN = \(m)func(dat[,m]) )) |>

 t()  |> round(digits =2) |>

  'dimnames<-'(list(rep.int('',ncol(w)), c("","", "prop","N")))

}



# Now apply it to your example data:



somecor(dat)

## which gives

 prop N

 1 2 0.67 6

 1 3 0.60 5

 1 4 0.57 7

 2 3 0.60 5

 2 4 0.33 6

 3 4 0.71 7



This seems more interpretable and directly useful to me. Bigger values

of prop for bigger N are the more interesting, assuming I have

interpreted you correctly.



Cheers,

Bert





On Sat, Jul 27, 2024 at 12:54 PM Yuan Chun Ding wrote:

>

> Hi Richard,

>

>

>

> Nice to know you had similar experience.

>

> Yes, your understanding is right.  all correlations are negative after 
> removing double-zero rows.

>

> It is consistent with a heatmap we generated.

>

> 1 is for a cancer patient with a specific mutation.  0 is no mutation for the 
> same mutation type in a patient.

>

> a pair of mutation type (two different mutations) are exclusive for most of 
> patients in heatmap or oncoplots.

>

>  If we include all 1000 patients, 900 of patients with no mutations in both 
> mutation types, then the correlation is not significant at all.

>

> But eyeball the heatmap (oncoplots) for mutation (row) by patient (column), 
> mutations are exclusive for most of patients,

>

> so I want to measure how strong the exclusiveness between two specific 
> mutation types across those patients with at least one mutation type.

>

> Then put the pair of mutations with strong negative mutations on the top rows 
> by order of negative mutation values.

>

>

>

> Regarding a final application,  maybe there are some usage for my case.

>

>  If one develops two drugs specific to the two negative correlated mutations, 
> the drug treatment for cancer patients is usually only for those patients 
> carrying the specific mutation,

>

> then it is informative to know how strong the negative correlation when 
> considering different combination of treatment strategies.

>

>

>

> Ding

>

> From: R-help On Behalf Of Richard O'Keefe

> Sent: Saturday, July 27, 2024 4:47 AM

> To: Bert Gunter

> Cc: r-help@r-project.org

> Subject: Re: [R] please help generate a square correlation matrix

>

>

>

> Curses, my laptop is hallucinating again. Hope I can get through this. So 
> we're talking about correlations between binary variables. Suppose we have 
> two 0-1-valued variables, x and y. Let A <- sum(x*y) # number of cases where 
> x 

Re: [R] please help generate a square correlation matrix

2024-07-27 Thread Bert Gunter
Your expanded explanation helps clarify your intent. Herewith some
comments. Of course, feel free to ignore and not respond. And, as
always, my apologies if I have failed to comprehend your intent.

1. I would avoid any notion of "statistical significance" like the
plague. This is a purely exploratory exercise.

2. My understanding is that, for a pair of columns/vectors, you want
the proportion of rows in which exactly one value of the pair is 1,
out of the rows in which at least one value is 1.  In R syntax, this is
simply:

sum(xor(x, y)) / sum(x | y)  ,
where x and y are two columns of 1's and 0's

Better yet might be to report both this *and* sum(x|y) to help you
judge "meaningfulness".
Here is a simple function that does this

## first, define a function that does above calculation:
assoc <- \(z){
   x <- z[,1]; y <- z[,2]
   n <- sum(x|y)
   c(prop = sum(xor(x, y))/n, N = n)
}

## Now a function that uses it for the various combinations:

somecor <- function(dat, func = assoc){
   dat <- as.matrix(dat)
   indx <- seq_len(ncol(dat))
   rbind(w <- combn(indx, 2),
         combn(indx, 2, FUN = \(m) func(dat[, m]))) |>
      t() |> round(digits = 2) |>
      'dimnames<-'(list(rep.int('', ncol(w)), c("", "", "prop", "N")))
}

# Now apply it to your example data:

somecor(dat)
## which gives
     prop N
 1 2 0.67 6
 1 3 0.60 5
 1 4 0.57 7
 2 3 0.60 5
 2 4 0.33 6
 3 4 0.71 7

This seems more interpretable and directly useful to me. Bigger values
of prop for bigger N are the more interesting, assuming I have
interpreted you correctly.
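
Since the thread asks for a square matrix, the same proportion can also be laid out symmetrically. What follows is only a sketch under stated assumptions: the helper name excl_matrix is hypothetical, the input must consist of 0/1 columns, and the data are just the columns g1 to g3 of the example data frame from the original post.

```r
# Sketch: square matrix of sum(xor(x, y)) / sum(x | y) for every column pair.
# 'excl_matrix' is a hypothetical helper name; columns must be coded 0/1.
excl_matrix <- function(dat) {
  m <- as.matrix(dat)
  both   <- crossprod(m)                    # rows where both columns are 1
  ones   <- colSums(m)                      # number of 1's in each column
  either <- outer(ones, ones, "+") - both   # rows where at least one is 1
  prop   <- (either - both) / either        # exactly one is 1, out of 'either'
  diag(prop) <- NA                          # a column paired with itself is not meaningful
  list(prop = prop, N = either)
}

# First three columns of the example data from the original post
dat3 <- data.frame(g1 = c(1,0,0,1,1,1,0,0,0),
                   g2 = c(0,1,0,1,0,1,1,0,0),
                   g3 = c(1,1,0,0,0,1,0,0,0))
res <- excl_matrix(dat3)
round(res$prop, 2)
res$N
```

For these three columns the off-diagonal entries agree with the rows for pairs 1 2, 1 3 and 2 3 in the output above (0.67, 0.60 and 0.60, with N of 6, 5 and 5). A pair in which neither column ever contains a 1 gives NaN.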

Cheers,
Bert


On Sat, Jul 27, 2024 at 12:54 PM Yuan Chun Ding  wrote:
>
> Hi Richard,
>
>
>
> Nice to know you had similar experience.
>
> Yes, your understanding is right.  all correlations are negative after 
> removing double-zero rows.
>
> It is consistent with a heatmap we generated.
>
> 1 is for a cancer patient with a specific mutation.  0 is no mutation for the 
> same mutation type in a patient.
>
> a pair of mutation type (two different mutations) are exclusive for most of 
> patients in heatmap or oncoplots.
>
>  If we include all 1000 patients, 900 of patients with no mutations in both 
> mutation types, then the correlation is not significant at all.
>
> But eyeball the heatmap (oncoplots) for mutation (row) by patient (column), 
> mutations are exclusive for most of patients,
>
> so I want to measure how strong the exclusiveness between two specific 
> mutation types across those patients with at least one mutation type.
>
> Then put the pair of mutations with strong negative mutations on the top rows 
> by order of negative mutation values.
>
>
>
> Regarding a final application,  maybe there are some usage for my case.
>
>  If one develops two drugs specific to the two negative correlated mutations, 
> the drug treatment for cancer patients is usually only for those patients 
> carrying the specific mutation,
>
> then it is informative to know how strong the negative correlation when 
> considering different combination of treatment strategies.
>
>
>
> Ding
>
>
> From: R-help  On Behalf Of Richard O'Keefe
> Sent: Saturday, July 27, 2024 4:47 AM
> To: Bert Gunter 
> Cc: r-help@r-project.org
> Subject: Re: [R] please help generate a square correlation matrix
>
>
>
> Curses, my laptop is hallucinating again.  Hope I can get through this.
>
> So we're talking about correlations between binary variables.
>
> Suppose we have two 0-1-valued variables, x and y.
>
> Let A <- sum(x*y)  # number of cases where x and y are both 1.
>
> Let B <- sum(x)-A  # number of cases where x is 1 and y is 0
>
> Let C <- sum(y)-A # number of cases where y is 1 and x is 0
>
> Let D <- sum(!x * !y) # number of cases where x and y are both 0.
>
> (also D = length(x)-A-B-C)
>
>
>
> All the information is summarised in the 2-by-2 contingency table.
>
> Some years ago, Nathan Rountree and I supervised Yung-Sing Koh's
>
> data-mining PhD.
>
> She surveyed the data mining literature and found some 37 different
>
> "interestingness measures" for two-variable associations  -- if I
>
> remember correctly; there were a lot of them.  They fell into a much
>
> smaller number of qualitatively similar groups.
>
> At any rate, the Pearson correlation between x and y is
>
> (A*D - B*C)/sqrt((A+B)*(C+D)*(A+C)*(B+D))
>
>
>
> So what happens when we delete the rows where x = 0 and y = 0?
>
> Right, it forces D to 0, leaving A B C unchanged.
>
> And looking at the numerator,
>
>   If you delete rows with x = 0 y = 0 you MUST get a negative correlation.
>
>
>
> Quite a modest "true" correlation (based on all the data) like -0.2
>
> can masquerade as quite a strong "zero-suppressed" correlation like
>
> -0.6.  Even +0.2 can turn into -0.4.   (These 

Re: [R] please help generate a square correlation matrix

2024-07-27 Thread Yuan Chun Ding via R-help
Hi Richard,

Nice to know you had a similar experience.
Yes, your understanding is right: all correlations are negative after removing
double-zero rows.
That is consistent with a heatmap we generated.
1 means a cancer patient carries a specific mutation; 0 means the patient has
no mutation of that type.
A pair of mutation types (two different mutations) is mutually exclusive in
most patients in the heatmap or oncoplots.
If we include all 1000 patients, 900 of whom have no mutation of either type,
then the correlation is not significant at all.
But eyeballing the heatmap (oncoplot) of mutation (row) by patient (column),
the mutations are mutually exclusive in most patients,
so I want to measure how strong the exclusiveness is between two specific
mutation types across those patients with at least one of the two mutation types.
Then I would put the pairs of mutations with strong negative correlations in
the top rows, ordered by the negative correlation values.

Regarding a final application, there may be some use for my case.
If one develops two drugs specific to the two negatively correlated mutations,
drug treatment for cancer patients is usually given only to those patients
carrying the specific mutation,
so it is informative to know how strong the negative correlation is when
considering different combinations of treatment strategies.

Ding





From: R-help  On Behalf Of Richard O'Keefe
Sent: Saturday, July 27, 2024 4:47 AM
To: Bert Gunter 
Cc: r-help@r-project.org
Subject: Re: [R] please help generate a square correlation matrix


Curses, my laptop is hallucinating again.  Hope I can get through this.

So we're talking about correlations between binary variables.

Suppose we have two 0-1-valued variables, x and y.

Let A <- sum(x*y)  # number of cases where x and y are both 1.

Let B <- sum(x)-A  # number of cases where x is 1 and y is 0

Let C <- sum(y)-A # number of cases where y is 1 and x is 0

Let D <- sum(!x * !y) # number of cases where x and y are both 0.

(also D = length(x)-A-B-C)



All the information is summarised in the 2-by-2 contingency table.

Some years ago, Nathan Rountree and I supervised Yung-Sing Koh's

data-mining PhD.

She surveyed the data mining literature and found some 37 different

"interestingness measures" for two-variable associations  -- if I

remember correctly; there were a lot of them.  They fell into a much

smaller number of qualitatively similar groups.

At any rate, the Pearson correlation between x and y is

(A*D - B*C)/sqrt((A+B)*(C+D)*(A+C)*(B+D))



So what happens when we delete the rows where x = 0 and y = 0?

Right, it forces D to 0, leaving A B C unchanged.

And looking at the numerator,

  If you delete rows with x = 0 y = 0 you MUST get a negative correlation.



Quite a modest "true" correlation (based on all the data) like -0.2

can masquerade as quite a strong "zero-suppressed" correlation like

-0.6.  Even +0.2 can turn into -0.4.   (These figures are from a

particular simulation run and may not apply in your case.)



Now one of the reasons why Yun-Sing Koh, Nathan Rountree, and I were

interested in interestingness measures is perhaps coincidentally

related to the file drawer/underreporting problem: it's quite common

for rows where x = 0 and y = 0 never to have been reported to you, so

we were hoping there were measures immune to that.  I have argued for

years that "till record analysis" for supermarkets  is badly flawed

by two facts: (a) it is hard to measure how much of a product people

WOULD have bought if only you had offered it for sale (although you

can make educated guesses) and (b) till records provide no evidence on

what the people who walked out without buying anything wanted (was the

price too high?  could they not find it?).  Problem (a) leads to a

commercial variant of the Signor-Lipps effect: "when x and/or y were

available for purchase" is not the same as "the period for which data

were recorded", thus inflating D, perhaps massively.  Methods

developed for handling the Signor-Lipps effect in paleontology can be

used to estimate when x and y were available helping you to recover a

more realistic N=A+B+C+D.  I really should have published that.



All of which is a long-winded way of saying that

- Pearson correlations on binary columns can be computed very efficiently

- the rows with x=0 and y=0 may be very informative, even essential for analysis

- delete them at your peril.

- really, delete them at your peril.



On Sat, 27 Jul 2024 at 23:07, Richard O'Keefe 
mailto:rao...@gmail.com>> wrote:

>

> Let's go back to the original posting.

>

> > >

> > >> in each column, less than 10% values are 1, most of them are 0;

> > >

> > >

> > >

> > >> so I want to remove a  row with value of zero in both 

Re: [R] plotting nnet function....

2024-07-27 Thread Ivan Krylov via R-help
On Sat, 27 Jul 2024 11:00:34 +
akshay kulkarni  wrote:

> My question is : how to plot the final model on the actual data
> points?

Have you been able to obtain the predictions? What happens if you call
predict() on the model object returned to you by train()?

Once you have both the data and the prediction, it should be as simple
as plot(traindata$predictor_column, traindata$regressor_column);
lines(traindata$predictor_column, previously_returned_predictions). (Or
an equivalent with your favourite plotting system for R.)
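
As a rough illustration of that recipe, here is a sketch with made-up data. loess() from base R stands in for the caret model only so that the example runs without extra packages; the assumption is that predict() on the object returned by train() can be used in the same way. The names traindata, predictor_column and response_column are placeholders.

```r
set.seed(42)
# Made-up data: one predictor, one noisy nonlinear response
traindata <- data.frame(predictor_column = runif(200, 0, 10))
traindata$response_column <- sin(traindata$predictor_column) + rnorm(200, sd = 0.3)

# Stand-in nonlinear model; replace by the object returned by caret::train()
fit <- loess(response_column ~ predictor_column, data = traindata)

# Predict on a sorted grid so that lines() draws a curve, not a zig-zag
grid <- data.frame(predictor_column = seq(min(traindata$predictor_column),
                                          max(traindata$predictor_column),
                                          length.out = 200))
grid$pred <- predict(fit, newdata = grid)

plot(traindata$predictor_column, traindata$response_column,
     xlab = "predictor", ylab = "response")
lines(grid$predictor_column, grid$pred, col = 2, lwd = 2)
```

The sorted grid is the design choice that matters here: lines() joins the points in the order given, so predictions taken at the unsorted training values would need to be ordered by the predictor first.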

Try following the vignette from the 'caret' package:
https://cran.r-project.org/package=caret/vignettes/caret.html

If you do encounter an error on your way or get stuck not knowing how
exactly to continue, please ask a more specific question.

-- 
Best regards,
Ivan



Re: [R] Automatic Knot selection in Piecewise linear splines

2024-07-27 Thread Anupam Tyagi
Thanks!

For some reason I am getting an error when I run your code with my
variables. It works fine with Martin's x and y variables.
So far as I know variable lengths are equal.
> o <-selgmented(lnCpc, ~lnGdpc, Kmax=20, type="bic", msg=TRUE)
Error in model.frame.default(formula = y ~ x, drop.unused.levels = TRUE) :
variable lengths differ (found for 'x')
> length(lnCpc)
[1] 2726
> length(lnGdpc)
[1] 2726

On Fri, 26 Jul 2024 at 20:16, Vito Muggeo  wrote:

> dear all,
> I apologize for my delay in replying you. Here my contribution, maybe
> just for completeness:
>
> Similar to "earth", "segmented" also fits piecewise linear relationships
> with the number of breakpoints being selected by the AIC or BIC
> (recommended).
>
> #code (example and code from Martin Maechler previous email)
>
> library(segmented)
> o<-selgmented(y, ~x, Kmax=20, type="bic", msg=TRUE)
> plot(o, add=TRUE)
> lines(o, col=2) #the approx CI for the breakpoints
>
> confint(o) #the estimated breakpoints (with CI's)
> slope(o) #the estimated slopes (with CI's)
>
>
> However segmented appears to be less efficient than earth (although with
> reasonable running times), it does NOT work with multivariate responses
> neither products between piecewise linear terms.
>
> kind regards,
> Vito
>
>
>
> On 16/07/2024 11:22, Martin Maechler wrote:
> >> Anupam Tyagi
> >>  on Tue, 9 Jul 2024 16:16:43 +0530 writes:
> >
> >  > How can I do automatic knot selection while fitting piecewise
> linear
> >  > splines to two variables x and y? Which package to use to do it
> simply? I
> >  > also want to visualize the splines (and the scatter plot) with a
> graph.
> >
> >  > Anupam
> >
> > NB: linear splines, i.e. piecewise linear continuous functions.
> > Given the knots, use  approx() or approxfun() however, the
> > automatic knots selection does not happen in the base R packages.
> >
> > I'm sure there are several R packages doing this.
> > The best such package in my opinion is "earth" which does a
> > re-implementation (and extensive  *generalization*) of the
> > famous  MARS algorithm of Friedman.
> > ==>
> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
> >
> > Note that their strengths and power is that  they do their work
> > for multivariate x (MARS := Multivariate Adaptive Regression
> > Splines), but indeed do work for the simple 1D case.
> >
> > In the following example, we always get 11 final knots,
> > but I'm sure one can tweak the many tuning paramters of earth()
> > to get more:
> >
> > ## Can we do  knot-selection  for simple (x,y) splines?  === Yes, via
> earth() {using MARS}!
> >
> > x <- (0:800)/8
> >
> > f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64
> > curve(f(x), 0, 100, n = 1000, col=2, lwd=2)
> >
> > set.seed(11)
> > y <- f(x) + 10*rnorm(x)
> >
> > m.sspl <- smooth.spline(x,y) # base line "standard smoother"
> >
> > require(earth)
> > fm1 <- earth(x, y) # default settings
> > summary(fm1, style = "pmax") #-- got  10 knots (x = 44 "used twice")
> below
> > ## Call: earth(x=x, y=y)
> >
> > ## y =
> > ##   175.9612
> > ##   -   10.6744 * pmax(0,  x -  4.625)
> > ##   +  9.928496 * pmax(0,  x - 10.875)
> > ##   -  5.940857 * pmax(0,  x -  20.25)
> > ##   +  3.438948 * pmax(0,  x - 27.125)
> > ##   -  3.828159 * pmax(0, 44 -  x)
> > ##   +  4.207046 * pmax(0,  x - 44)
> > ##   +  2.573822 * pmax(0,  x -   76.5)
> > ##   -  10.99073 * pmax(0,  x - 87.125)
> > ##   +  10.97592 * pmax(0,  x - 90.875)
> > ##   +  9.331949 * pmax(0,  x - 94)
> > ##   -   8.48575 * pmax(0,  x -   96.5)
> >
> > ## Selected 12 of 12 terms, and 1 of 1 predictors
> > ## Termination condition: Reached nk 21
> > ## Importance: x
> > ## Number of terms at each degree of interaction: 1 11 (additive model)
> > ## GCV 108.6592RSS 82109.44GRSq 0.861423RSq 0.86894
> >
> >
> > fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass)
> > summary(fm2)
> > all.equal(fm1, fm2)# they are identical (apart from 'call'):
> > fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive
> forward pass; *no* pruning
> > ## still no change: fm3 "==" fm1
> > all.equal(predict(fm1, xx), predict(fm3, xx))
> >
> > ## BTW: The chosen knots and coefficients are
> > mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients)))
> >
> > ## Plots : fine grid for visualization: instead of   xx <- seq(x[1],
> x[length(x)], length.out = 1024)
> > rnx <- extendrange(x) ## to extrapolate a bit
> > xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))
> >
> > cbind(f = f(xx),
> >sspl = predict(m.sspl, xx)$y,
> >mars = predict(fm1, xx)) -> fits
> >
> > plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
> > cols <- c(adjustcolor(2, 1/3),
> >adjustcolor(4, 2/3),
> >adjustcolor("orange4", 2/3))
> > lwds <- c(3, 2, 2)
> > matlines(xx, fits, col = cols, lwd = lwds, lty=1)
> > 

Re: [R] please help generate a square correlation matrix

2024-07-27 Thread Richard O'Keefe
Curses, my laptop is hallucinating again.  Hope I can get through this.
So we're talking about correlations between binary variables.
Suppose we have two 0-1-valued variables, x and y.
Let A <- sum(x*y)  # number of cases where x and y are both 1.
Let B <- sum(x)-A  # number of cases where x is 1 and y is 0
Let C <- sum(y)-A # number of cases where y is 1 and x is 0
Let D <- sum(!x * !y) # number of cases where x and y are both 0.
(also D = length(x)-A-B-C)

All the information is summarised in the 2-by-2 contingency table.
Some years ago, Nathan Rountree and I supervised Yun-Sing Koh's
data-mining PhD.
She surveyed the data mining literature and found some 37 different
"interestingness measures" for two-variable associations  -- if I
remember correctly; there were a lot of them.  They fell into a much
smaller number of qualitatively similar groups.
At any rate, the Pearson correlation between x and y is
(A*D - B*C)/sqrt((A+B)*(C+D)*(A+C)*(B+D))

So what happens when we delete the rows where x = 0 and y = 0?
Right, it forces D to 0, leaving A B C unchanged.
And looking at the numerator,
  If you delete rows with x = 0 y = 0 you MUST get a negative correlation.

Quite a modest "true" correlation (based on all the data) like -0.2
can masquerade as quite a strong "zero-suppressed" correlation like
-0.6.  Even +0.2 can turn into -0.4.   (These figures are from a
particular simulation run and may not apply in your case.)
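
A small sketch of the formula above, with made-up counts (not anyone's real data), may make the effect concrete. With D forced to 0 the numerator is -B*C, so the result can never be positive, and if B or C is also 0 the ratio is 0/0 and so undefined. The function name phi is just a label chosen for this illustration.

```r
# Pearson (phi) correlation from the 2-by-2 counts A, B, C, D defined above
phi <- function(A, B, C, D) {
  (A * D - B * C) / sqrt((A + B) * (C + D) * (A + C) * (B + D))
}

# Made-up counts: 1000 cases, 910 of them double zeros
A <- 10; B <- 40; C <- 40; D <- 910

# The same table expanded into two 0-1 vectors, to compare with cor()
x <- rep(c(1, 1, 0, 0), times = c(A, B, C, D))
y <- rep(c(1, 0, 1, 0), times = c(A, B, C, D))

phi(A, B, C, D)      # all rows: about +0.16, the same as cor(x, y)

keep <- (x | y)      # drop the double-zero rows
phi(A, B, C, 0)      # zero-suppressed: -0.8, the same as cor(x[keep], y[keep])
```

So in this made-up table a mildly positive correlation on all the data turns into a strongly negative one once the double-zero rows are removed.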

Now one of the reasons why Yun-Sing Koh, Nathan Rountree, and I were
interested in interestingness measures is perhaps coincidentally
related to the file drawer/underreporting problem: it's quite common
for rows where x = 0 and y = 0 never to have been reported to you, so
we were hoping there were measures immune to that.  I have argued for
years that "till record analysis" for supermarkets  is badly flawed
by two facts: (a) it is hard to measure how much of a product people
WOULD have bought if only you had offered it for sale (although you
can make educated guesses) and (b) till records provide no evidence on
what the people who walked out without buying anything wanted (was the
price too high?  could they not find it?).  Problem (a) leads to a
commercial variant of the Signor-Lipps effect: "when x and/or y were
available for purchase" is not the same as "the period for which data
were recorded", thus inflating D, perhaps massively.  Methods
developed for handling the Signor-Lipps effect in paleontology can be
used to estimate when x and y were available helping you to recover a
more realistic N=A+B+C+D.  I really should have published that.

All of which is a long-winded way of saying that
- Pearson correlations on binary columns can be computed very efficiently
- the rows with x=0 and y=0 may be very informative, even essential for analysis
- delete them at your peril.
- really, delete them at your peril.

On Sat, 27 Jul 2024 at 23:07, Richard O'Keefe  wrote:
>
> Let's go back to the original posting.
>
> > >
> > >> in each column, less than 10% values are 1, most of them are 0;
> > >
> > >
> > >
> > >> so I want to remove a  row with value of zero in both columns when 
> > >> calculate correlation between two columns.
> > >
>
> So we're talking about correlations between binary variables.
> Suppose we have two 0-1-valued variables, x and y.
> Let A <- sum(x*y)  # number of cases where x and y are both 1.
> Let B <- sum(x)-A  # number of cases where x is 1 and y is 0
> Let C <- sum(y)-A # number of cases where y is 1 and x is 0
> Let D <- sum(!x * !y) # number of cases where x and y are both 0.
>
> N
>
> On Fri, 26 Jul 2024 at 12:07, Bert Gunter  wrote:
> >
> > If I have understood the request, I'm not sure that omitting all 0
> > pairs for each pair of columns makes much sense, but be that as it
> > may, here's another way to do it by using the 'FUN' argument of combn
> > to encapsulate any calculations that you do. I just use cor() as the
> > calculation -- you can use anything you like that takes two vectors of
> > 0's and 1's and produces fixed length numeric results (or from which
> > you can extract such).
> >
> > I encapsulated it all in a little function. Note that I first
> > converted the data frame to a matrix. Because of their generality,
> > data frames carry a lot of extra baggage that can slow purely numeric
> > manipulations down.
> >
> > Anyway, here's the function, 'somecors' (I'm a bad name picker :(  ! )
> >
> >somecors <- function(dat, func = cor){
> >   dat <- as.matrix(dat)
> >   indx <- seq_len(ncol(dat))
> >  combn(indx, 2, FUN = \(z) {
> > i <- z[1]; j <- z[2]
> > k <- dat[, i ] | dat[, j ]
> > c(z,func(dat[k,i ], dat[k,j ]))
> >  })
> >}
> >
> > Results come out as a matrix with combn(ncol(dat),2) columns, the
> > first 2 rows giving the pair of column numbers for each column,and
> > then 1 or more rows (possibly extracted) from whatever func you use.
> > Here's the results for your data 

Re: [R] please help generate a square correlation matrix

2024-07-27 Thread Richard O'Keefe
Let's go back to the original posting.

> >
> >> in each column, less than 10% values are 1, most of them are 0;
> >
> >
> >
> >> so I want to remove a  row with value of zero in both columns when 
> >> calculate correlation between two columns.
> >

So we're talking about correlations between binary variables.
Suppose we have two 0-1-valued variables, x and y.
Let A <- sum(x*y)  # number of cases where x and y are both 1.
Let B <- sum(x)-A  # number of cases where x is 1 and y is 0
Let C <- sum(y)-A # number of cases where y is 1 and x is 0
Let D <- sum(!x * !y) # number of cases where x and y are both 0.

N

On Fri, 26 Jul 2024 at 12:07, Bert Gunter  wrote:
>
> If I have understood the request, I'm not sure that omitting all 0
> pairs for each pair of columns makes much sense, but be that as it
> may, here's another way to do it by using the 'FUN' argument of combn
> to encapsulate any calculations that you do. I just use cor() as the
> calculation -- you can use anything you like that takes two vectors of
> 0's and 1's and produces fixed length numeric results (or from which
> you can extract such).
>
> I encapsulated it all in a little function. Note that I first
> converted the data frame to a matrix. Because of their generality,
> data frames carry a lot of extra baggage that can slow purely numeric
> manipulations down.
>
> Anyway, here's the function, 'somecors' (I'm a bad name picker :(  ! )
>
> somecors <- function(dat, func = cor){
>    dat <- as.matrix(dat)
>    indx <- seq_len(ncol(dat))
>    combn(indx, 2, FUN = \(z) {
>       i <- z[1]; j <- z[2]
>       k <- dat[, i] | dat[, j]   # rows where at least one of the pair is 1
>       c(z, func(dat[k, i], dat[k, j]))
>    })
> }
>
> Results come out as a matrix with choose(ncol(dat), 2) columns, the
> first 2 rows giving the pair of column numbers for each column, and
> then 1 or more rows (possibly extracted) from whatever func you use.
> Here's the results for your data formatted to 2 decimal places:
>
> > round(somecors(dat),2)
>      [,1]  [,2]  [,3]  [,4] [,5]  [,6]
> [1,]  1.0  1.00  1.00  2.00    2  3.00
> [2,]  2.0  3.00  4.00  3.00    4  4.00
> [3,] -0.5 -0.41 -0.35 -0.41   NA -0.47
> Warning message:
> In func(dat[k, i], dat[k, j]) : the standard deviation is zero
>
> The NA and warning comes in the 2,4 pair of columns because after
> removing all zero rows in the pair, dat[,4] is all 1's, giving a zero
> in the denominator of the cor() calculation -- again, assuming I have
> correctly understood your request. If so, this might be something you
> need to worry about.
>
> Again, feel free to ignore if I have misinterpreted or this does not suit.
>
> Cheers,
> Bert
>
>
> On Thu, Jul 25, 2024 at 2:01 PM Rui Barradas  wrote:
> >
> > At 20:47 on 25/07/2024, Yuan Chun Ding wrote:
> > > Hi Rui,
> > >
> > > You are always very helpful!! Thank you,
> > >
> > > I just modified your R codes to remove a row with zero values in both 
> > > column pair as below for my real data.
> > >
> > > Ding
> > >
> > > dat <- gene22mut.coded
> > > r <- P <- matrix(NA, nrow = 22L, ncol = 22L,
> > >                  dimnames = list(names(dat), names(dat)))
> > >
> > > for(i in 1:22) {
> > >   #i=1
> > >   x <- dat[[i]]
> > >   for(j in (1:22)) {
> > >     #j=2
> > >     if(i == j) {
> > >       # there's nothing to test, assign correlation 1
> > >       r[i, j] <- 1
> > >     } else {
> > >       tmp <- cbind(x, dat[[j]])
> > >       row0 <- rowSums(tmp)
> > >       tem2 <- tmp[row0 != 0, ]
> > >       tmp3 <- cor.test(tem2[, 1], tem2[, 2])
> > >       r[i, j] <- tmp3$estimate
> > >       P[i, j] <- tmp3$p.value
> > >     }
> > >   }
> > > }
> > > r <- as.data.frame(r)
> > > P <- as.data.frame(P)
> > >
> > > From: R-help  On Behalf Of Yuan Chun Ding 
> > > via R-help
> > > Sent: Thursday, July 25, 2024 11:26 AM
> > > To: Rui Barradas ; r-help@r-project.org
> > > Subject: Re: [R] please help generate a square correlation matrix
> > >
> > >
> > > HI Rui,
> > >
> > >
> > >
> > > Thank you for the  help!
> > >
> > >
> > >
> > > You did not remove a row if zero values exist in both column pair, right?
> > >
> > >
> > >
> > > Ding
> > >
> > >
> > >
> > > From: Rui Barradas mailto:ruipbarra...@sapo.pt>>
> > >
> > > Sent: Thursday, July 25, 2024 11:15 AM
> > >
> > > To: Yuan Chun Ding mailto:ycd...@coh.org>>; 
> > > r-help@r-project.org
> > >
> > > Subject: Re: [R] please help generate a square correlation matrix
> > >
> > >
> > >
> > > At 17:39 on 25/07/2024, Yuan Chun Ding via R-help wrote:
> > > > Hi R users,
> > > >
> > > > I generated a square correlation matrix for the dat dataframe below;
> > > > dat<-data.frame(g1=c(1,0,0,1,1,1,0,0,0),
> > > > g2=c(0,1,0,1,0,1,1,0,0),
> > > > g3=c(1,1,0,0,0,1,0,0,0),
> > >
> > >
> 

[R] correction.....

2024-07-27 Thread akshay kulkarni
dear members,
  I want to mention that I am using the neural 
network model in caret. I forgot to mention it in the previous mail to you 
people

Thanking you,
Yours sincerely,
AKSHAY M KULKARNI



[R] plotting nnet function....

2024-07-27 Thread akshay kulkarni
Dear members,
 I am using caret for modelling my data. It is a 
regression problem. My question is: how do I plot the final model on the actual 
data points? The output of the model will be a nonlinear function built from the 
activation function; I want to plot it over the data points, like drawing the 
fitted line on the original data points for a linear model. I have searched the 
web but to no avail.

Thanking you,
Yours sincerely
AKSHAY M KULKARNI



Re: [R] Automatic Knot selection in Piecewise linear splines

2024-07-26 Thread Vito Muggeo via R-help

dear all,
I apologize for my delay in replying to you. Here is my contribution, maybe 
just for completeness:


Similar to "earth", "segmented" also fits piecewise linear relationships 
with the number of breakpoints being selected by the AIC or BIC 
(recommended).


#code (example and code from Martin Maechler previous email)

library(segmented)
o<-selgmented(y, ~x, Kmax=20, type="bic", msg=TRUE)
plot(o, add=TRUE)
lines(o, col=2) #the approx CI for the breakpoints

confint(o) #the estimated breakpoints (with CI's)
slope(o) #the estimated slopes (with CI's)


However, segmented appears to be less efficient than earth (although with 
reasonable running times), and it does NOT work with multivariate responses 
or with products between piecewise linear terms.


kind regards,
Vito



On 16/07/2024 11:22, Martin Maechler wrote:

Anupam Tyagi
 on Tue, 9 Jul 2024 16:16:43 +0530 writes:


 > How can I do automatic knot selection while fitting piecewise linear
 > splines to two variables x and y? Which package to use to do it simply? I
 > also want to visualize the splines (and the scatter plot) with a graph.

 > Anupam

NB: linear splines, i.e. piecewise linear continuous functions.
Given the knots, use  approx() or approxfun() however, the
automatic knots selection does not happen in the base R packages.
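
To make the "given the knots" case concrete, here is a minimal sketch. The knot positions and the function values at the knots are made up for illustration; only approxfun() itself comes from base R.

```r
# Hypothetical knots and function values at the knots (made up for illustration)
knots  <- c(0, 2, 5, 10)
values <- c(1, 3, 2, 6)

# approxfun() returns the continuous piecewise linear function through the points
pl <- approxfun(knots, values)   # method = "linear" is the default

pl(c(1, 3.5, 10))   # 2.0 2.5 6.0
pl(11)              # NA: outside the knot range unless rule = 2 is given
```

This only interpolates between knots that are already known; choosing the knots from the data is what the add-on packages discussed in this thread are for.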

I'm sure there are several R packages doing this.
The best such package in my opinion is "earth" which does a
re-implementation (and extensive  *generalization*) of the
famous  MARS algorithm of Friedman.
==> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines

Note that their strengths and power is that  they do their work
for multivariate x (MARS := Multivariate Adaptive Regression
Splines), but indeed do work for the simple 1D case.

In the following example, we always get 11 final knots,
but I'm sure one can tweak the many tuning parameters of earth()
to get more:

## Can we do  knot-selection  for simple (x,y) splines?  === Yes, via  earth() {using MARS}!

x <- (0:800)/8

f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64
curve(f(x), 0, 100, n = 1000, col=2, lwd=2)

set.seed(11)
y <- f(x) + 10*rnorm(x)

m.sspl <- smooth.spline(x,y) # base line "standard smoother"

require(earth)
fm1 <- earth(x, y) # default settings
summary(fm1, style = "pmax") #-- got  10 knots (x = 44 "used twice") below
## Call: earth(x=x, y=y)

## y =
##   175.9612
##   -   10.6744 * pmax(0,  x -  4.625)
##   +  9.928496 * pmax(0,  x - 10.875)
##   -  5.940857 * pmax(0,  x -  20.25)
##   +  3.438948 * pmax(0,  x - 27.125)
##   -  3.828159 * pmax(0, 44 -  x)
##   +  4.207046 * pmax(0,  x - 44)
##   +  2.573822 * pmax(0,  x -   76.5)
##   -  10.99073 * pmax(0,  x - 87.125)
##   +  10.97592 * pmax(0,  x - 90.875)
##   +  9.331949 * pmax(0,  x - 94)
##   -   8.48575 * pmax(0,  x -   96.5)

## Selected 12 of 12 terms, and 1 of 1 predictors
## Termination condition: Reached nk 21
## Importance: x
## Number of terms at each degree of interaction: 1 11 (additive model)
## GCV 108.6592    RSS 82109.44    GRSq 0.861423    RSq 0.86894


fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass)
summary(fm2)
all.equal(fm1, fm2)# they are identical (apart from 'call'):
fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive forward pass; *no* pruning

## BTW: The chosen knots and coefficients are
mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients)))

## Plots : fine grid for visualization: instead of   xx <- seq(x[1], x[length(x)], length.out = 1024)
rnx <- extendrange(x) ## to extrapolate a bit
xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))

## still no change: fm3 "==" fm1  (compared on the grid xx defined just above)
all.equal(predict(fm1, xx), predict(fm3, xx))

cbind(f = f(xx),
   sspl = predict(m.sspl, xx)$y,
   mars = predict(fm1, xx)) -> fits

plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
cols <- c(adjustcolor(2, 1/3),
   adjustcolor(4, 2/3),
   adjustcolor("orange4", 2/3))
lwds <- c(3, 2, 2)
matlines(xx, fits, col = cols, lwd = lwds, lty=1)
legend("topleft", c("true f(x)", "smooth.spline()", "earth()"),
col=cols, lwd=lwds, bty = "n")
title(paste("earth() linear spline vs. smooth.spline();  n =", length(x)))
mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))



--
=
Vito M.R. Muggeo, PhD
Professor of Statistics
Dip.to Sc Econom, Az e Statistiche
Università di Palermo
viale delle Scienze, edificio 13
90128 Palermo - ITALY
tel: 091 23895240; fax: 091 485726
http://www.unipa.it/persone/docenti/m/vito.muggeo
Associate Editor: Statistical 

Re: [R] OFF TOPIC: Nature article on File Drawer Problem in Reserach

2024-07-26 Thread Michael Dewey
Just to remark that the R community has been active in developing 
software in this area. The CRAN Task View has almost twenty packages 
claiming to address the problem under various names.


https://cran.r-project.org/view=MetaAnalysis

Whether the methods work has also been a topic of discussion on the 
mailing list dedicated to meta-analysis, and interested readers may want 
to search its archives:


https://stat.ethz.ch/pipermail/r-sig-meta-analysis/

Michael




Re: [R] please help generate a square correlation matrix

2024-07-25 Thread Bert Gunter
If I have understood the request, I'm not sure that omitting the rows
that are 0 in both columns of each pair makes much sense, but be that
as it may, here's another way to do it by using the 'FUN' argument of
combn to encapsulate any calculations that you do. I just use cor() as
the calculation -- you can use anything you like that takes two vectors
of 0's and 1's and produces fixed length numeric results (or from which
you can extract such).

I encapsulated it all in a little function. Note that I first
converted the data frame to a matrix. Because of their generality,
data frames carry a lot of extra baggage that can slow purely numeric
manipulations down.

Anyway, here's the function, 'somecors' (I'm a bad name picker :(  ! )

   somecors <- function(dat, func = cor) {
      dat <- as.matrix(dat)
      indx <- seq_len(ncol(dat))
      combn(indx, 2, FUN = \(z) {
         i <- z[1]; j <- z[2]
         k <- dat[, i] | dat[, j]
         c(z, func(dat[k, i], dat[k, j]))
      })
   }

Results come out as a matrix with choose(ncol(dat), 2) columns, the
first 2 rows giving the pair of column numbers for each column, and
then 1 or more rows (possibly extracted) from whatever func you use.
Here are the results for your data formatted to 2 decimal places:

> round(somecors(dat),2)
     [,1]  [,2]  [,3]  [,4] [,5]  [,6]
[1,]  1.0  1.00  1.00  2.00    2  3.00
[2,]  2.0  3.00  4.00  3.00    4  4.00
[3,] -0.5 -0.41 -0.35 -0.41   NA -0.47
Warning message:
In func(dat[k, i], dat[k, j]) : the standard deviation is zero

The NA and warning come from the 2,4 pair of columns because, after
removing the rows that are zero in both columns of the pair, dat[,4] is
all 1's, giving a zero in the denominator of the cor() calculation --
again, assuming I have correctly understood your request. If so, this
might be something you need to worry about.
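
Since the original request was for a square matrix, the three-row result can be folded back into one. What follows is only a minimal sketch under stated assumptions: it repeats the toy data and somecors() so that it runs on its own, uses function(z) in place of the \(z) shorthand, and the name 'sq' is a placeholder of mine, not anything from the thread.

```r
# Toy data from the original post
dat <- data.frame(g1 = c(1,0,0,1,1,1,0,0,0),
                  g2 = c(0,1,0,1,0,1,1,0,0),
                  g3 = c(1,1,0,0,0,1,0,0,0),
                  g4 = c(0,1,0,1,1,1,1,1,0))

# Same logic as somecors(), written with function(z) so that it also
# runs on R versions older than 4.1
somecors <- function(dat, func = cor) {
  dat <- as.matrix(dat)
  combn(seq_len(ncol(dat)), 2, FUN = function(z) {
    k <- dat[, z[1]] | dat[, z[2]]      # keep rows with a 1 in either column
    c(z, func(dat[k, z[1]], dat[k, z[2]]))
  })
}

res <- suppressWarnings(somecors(dat))  # zero-SD pairs give NA plus a warning

# Fill a symmetric matrix: rows 1-2 of res hold the column indices,
# row 3 holds the value for that pair
sq <- diag(1, ncol(dat))
dimnames(sq) <- list(names(dat), names(dat))
sq[cbind(res[1, ], res[2, ])] <- res[3, ]
sq[cbind(res[2, ], res[1, ])] <- res[3, ]
round(sq, 2)
```

The diagonal is set to 1 by convention here; the g2/g4 cell stays NA for the reason given above.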

Again, feel free to ignore if I have misinterpreted or this does not suit.

Cheers,
Bert



Re: [R] OFF TOPIC: Nature article on File Drawer Problem in Reserach

2024-07-25 Thread Robert Baer

Chapter 9 might be of interest:

https://bookdown.org/MathiasHarrer/Doing_Meta_Analysis_in_R/

And specifically, for funnel plots in R:
https://wviechtb.github.io/metafor/reference/funnel.html

Best,
Rob



Re: [R] please help generate a square correlation matrix

2024-07-25 Thread Rui Barradas


Re: [R] please help generate a square correlation matrix

2024-07-25 Thread Yuan Chun Ding via R-help
Hi Rui,

You are always very helpful!! Thank you,

I just modified your R code to drop the rows that are zero in both columns 
of a pair, as below, for my real data.

Ding

dat <- gene22mut.coded
r <- P <- matrix(NA, nrow = 22L, ncol = 22L,
                 dimnames = list(names(dat), names(dat)))

for(i in 1:22) {
  #i=1
  x <- dat[[i]]
  for(j in (1:22)) {
    #j=2
    if(i == j) {
      # there's nothing to test, assign correlation 1
      r[i, j] <- 1
    } else {
      tmp <- cbind(x, dat[[j]])
      row0 <- rowSums(tmp)
      tem2 <- tmp[row0 != 0, ]
      tmp3 <- cor.test(tem2[, 1], tem2[, 2])
      r[i, j] <- tmp3$estimate
      P[i, j] <- tmp3$p.value
    }
  }
}
r <- as.data.frame(r)
P <- as.data.frame(P)
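
The loop above hard-codes 22 columns and calls cor.test() on every pair. With sparse 0/1 data, a filtered column can easily be constant (cor.test() then returns NA with a warning), and with fewer than three remaining rows cor.test() stops with an error. The following is a hedged sketch of the same calculation with those two cases guarded; the function name pairwise_nonzero_cor() is hypothetical, and the 4-gene toy data from the start of the thread stands in for gene22mut.coded, which is not shown.

```r
# Hypothetical helper: pairwise correlation after dropping rows that are
# zero in both columns of the pair. Pairs that cannot be tested stay NA.
pairwise_nonzero_cor <- function(dat) {
  m <- as.matrix(dat)
  n <- ncol(m)
  r <- P <- matrix(NA_real_, n, n,
                   dimnames = list(colnames(m), colnames(m)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i == j) {            # nothing to test on the diagonal
        r[i, j] <- 1
        next
      }
      keep <- m[, i] != 0 | m[, j] != 0
      x <- m[keep, i]
      y <- m[keep, j]
      # cor.test() needs at least 3 rows and non-constant input
      if (length(x) < 3 || sd(x) == 0 || sd(y) == 0) next
      ct <- cor.test(x, y)
      r[i, j] <- ct$estimate
      P[i, j] <- ct$p.value
    }
  }
  list(r = r, P = P)
}

# Toy data from the original post
dat <- data.frame(g1 = c(1,0,0,1,1,1,0,0,0),
                  g2 = c(0,1,0,1,0,1,1,0,0),
                  g3 = c(1,1,0,0,0,1,0,0,0),
                  g4 = c(0,1,0,1,1,1,1,1,0))
res <- pairwise_nonzero_cor(dat)
round(res$r, 2)
```

Leaving untestable pairs as NA, rather than forcing them to 0 or 1, is a design choice of this sketch; whether that suits the real data is for the analyst to decide.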


Re: [R] please help generate a square correlation matrix

2024-07-25 Thread Yuan Chun Ding via R-help
Hi Rui,

Thank you for the help!

Your code did not remove the rows that are zero in both columns of a pair, right?

Ding


Re: [R] please help generate a square correlation matrix

2024-07-25 Thread Rui Barradas


Hello,

You are complicating the code, there's no need for as.symbol/eval, the 
column numbers do exactly the same.


# create the two results matrices beforehand
r <- P <- matrix(NA, nrow = 4L, ncol = 4L,
                 dimnames = list(names(dat), names(dat)))


for(i in 1:4) {
  x <- dat[[i]]
  for(j in (1:4)) {
    if(i == j) {
      # there's nothing to test, assign correlation 1
      r[i, j] <- 1
    } else {
      tmp <- cor.test(x, dat[[j]])
      r[i, j] <- tmp$estimate
      P[i, j] <- tmp$p.value
    }
  }
}

# these two results are equal up to floating-point precision
dat.rcorr$r
#>           g1        g2        g3        g4
#> g1 1.0000000 0.1000000 0.3162278 0.1581139
#> g2 0.1000000 1.0000000 0.3162278 0.6324555
#> g3 0.3162278 0.3162278 1.0000000 0.0000000
#> g4 0.1581139 0.6324555 0.0000000 1.0000000
r
#>           g1        g2           g3           g4
#> g1 1.0000000 0.1000000 3.162278e-01 1.581139e-01
#> g2 0.1000000 1.0000000 3.162278e-01 6.324555e-01
#> g3 0.3162278 0.3162278 1.000000e+00 1.355253e-20
#> g4 0.1581139 0.6324555 1.355253e-20 1.000000e+00

# these two results are equal up to floating-point precision
dat.rcorr$P
#>           g1         g2        g3         g4
#> g1        NA 0.79797170 0.4070838 0.68452834
#> g2 0.7979717         NA 0.4070838 0.06758329
#> g3 0.4070838 0.40708382        NA 1.00000000
#> g4 0.6845283 0.06758329 1.0000000         NA
P
#>           g1         g2        g3         g4
#> g1        NA 0.79797170 0.4070838 0.68452834
#> g2 0.7979717         NA 0.4070838 0.06758329
#> g3 

[R] please help generate a square correlation matrix

2024-07-25 Thread Yuan Chun Ding via R-help
Hi R users,

I generated a square correlation matrix for the dat dataframe below;
dat <- data.frame(g1 = c(1,0,0,1,1,1,0,0,0),
                  g2 = c(0,1,0,1,0,1,1,0,0),
                  g3 = c(1,1,0,0,0,1,0,0,0),
                  g4 = c(0,1,0,1,1,1,1,1,0))
library("Hmisc")
dat.rcorr = rcorr(as.matrix(dat))
dat.r <-round(dat.rcorr$r,2)

However, I want to modify this correlation calculation.
My dat has more than 1000 rows and 22 columns;
in each column, fewer than 10% of the values are 1 and most are 0.
So, when calculating the correlation between two columns, I want to remove 
any row that has a value of zero in both columns.
I just want to check whether the values of 1 are correlated between the two 
columns.
Please look at my code below:

cor.4gene <- matrix(0, nrow = 4*4, ncol = 4)
for (i in 1:4) {
  #i=1
  for (j in 1:4) {
    #j=1
    d <- dat[, c(i, j)] %>%
      filter(eval(as.symbol(colnames(dat)[i])) != 0 |
             eval(as.symbol(colnames(dat)[j])) != 0)
    c <- cor.test(d[, 1], d[, 2])
    cor.4gene[i*j, ] <- c(colnames(dat)[i], colnames(dat)[j],
                          c$estimate, c$p.value)
  }
}
cor.4gene <- as.data.frame(cor.4gene) %>% filter(V1 != 0)
colnames(cor.4gene) <- c("gene1", "gene2", "cor", "P")

Can you tell me what mistakes I made?
First, why is cor NA when calculating the correlation of g1 with g1? I thought it 
should be 1.

cor.4gene$cor[is.na(cor.4gene$cor)]<-1
cor.4gene$cor[is.na(cor.4gene$P)]<-0
cor.4gene.sq <-pivot_wider(cor.4gene, names_from = gene1, values_from = cor)

Then the line of code above did not generate a square matrix like the one the Hmisc 
library produced.
How can I fix my code?

Thank you,

Ding
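
[Editorial sketch, not part of the original post.] Two of the symptoms described above can be reproduced in isolation with base R. The snippet below is only an illustration under stated assumptions; the names 'idx' and 'g1_kept' are placeholders introduced here. It shows (a) that the row index i*j is not unique across the 16 (i, j) pairs, so later pairs overwrite earlier ones, and (b) that filtering g1 against itself leaves a constant vector, for which the correlation is undefined.

```r
dat <- data.frame(g1 = c(1,0,0,1,1,1,0,0,0),
                  g2 = c(0,1,0,1,0,1,1,0,0),
                  g3 = c(1,1,0,0,0,1,0,0,0),
                  g4 = c(0,1,0,1,1,1,1,1,0))

# (a) i * j collides: e.g. (1,4), (2,2) and (4,1) all write to row 4
idx <- outer(1:4, 1:4)                 # i * j for every pair
n_rows_used <- length(unique(c(idx)))  # 9 distinct rows, not 16
# A unique row per pair would be (i - 1) * 4 + j
n_rows_fixed <- length(unique(c(outer(1:4, 1:4,
                                      function(i, j) (i - 1) * 4 + j))))

# (b) keeping only rows where g1 != 0 leaves a vector of 1's
g1_kept <- dat$g1[dat$g1 != 0]
sd(g1_kept)                                        # 0
self_cor <- suppressWarnings(cor(g1_kept, g1_kept)) # NA: zero variance
```

It also seems likely that pivot_wider() treats the leftover P column as an identifier alongside gene2, which would explain the non-square result, and that c() mixing names with numbers turns the whole matrix into character; both points are worth checking against the tidyr and base R documentation.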




Re: [R] OFF TOPIC: Nature article on File Drawer Problem in Reserach

2024-07-25 Thread Richard O'Keefe
I know you didn't want to stimulate discussion, but the problem is not
confined to publication.  "Adverse reaction to medication" monitoring
programs are plagued by a similarly massive under-reporting problem:
adverse reactions are seldom reported unless they are particularly bad
or surprising.  (The Ministry of Health in my country estimates 90% of
cases are never reported.)  Remembering to check for possible bias
from unreported cases is a human problem for analysts.  Which, if any,
R packages have proven useful for detecting the existence of a
systematic under-reporting problem might well be an appropriate topic
for this list.

On Thu, 25 Jul 2024 at 02:44, Bert Gunter  wrote:
>
> Again, this is off topic, not about statistics or R, but I think of
> interest to many on this list. The title is:
>
> "So you got a null result. Will anyone publish it?"
>
> https://www.nature.com/articles/d41586-024-02383-9
>
> Best to all,
> Bert
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] OFF TOPIC: Nature article on File Drawer Problem in Research

2024-07-24 Thread Ebert,Timothy Aaron
Here is one response: https://doi.org/10.1093/jisesa/iew092
Or paraphrased: yes.

Regards,
Tim

-Original Message-
From: R-help  On Behalf Of Bert Gunter
Sent: Wednesday, July 24, 2024 10:44 AM
To: R-help 
Subject: [R] OFF TOPIC: Nature article on File Drawer Problem in Research

[External Email]

Again, this is off topic, not about statistics or R, but I think of interest to 
many on this list. The title is:

"So you got a null result. Will anyone publish it?"

https://www.nature.com/articles/d41586-024-02383-9

Best to all,
Bert

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] OFF TOPIC: Nature article on File Drawer Problem in Research

2024-07-24 Thread avi.e.gross
Bert,

Although the article was interesting, I have to wonder how much publishing there 
has been in formal journals related to R, especially recently, that is of a 
research variety.

I am thinking of an example: suppose we picked something that is often 
re-implemented by many parties, such as ways of doing graphics or of making and 
manipulating things like variants of data.frames. If someone came up with some 
new design, such as tibbles or data.table and hypothesized they would be better 
in some ways, then that could be the basis for doing some serious testing and 
perhaps publishing results. Sometimes it could just be a comparison of all 
kinds of cases and a discussion of when one or the other might be better, but 
other times, the hypothesis might be determined in advance to be looking for a 
specific outcome and if wrong, publishing it fairly would let people know that 
your guess was wrong!

I once had a chance to get an Erdős number of two as my adviser later published 
several times with Paul Erdős, but I ended up proving the opposite of my 
hypothesis, which was not really worthy of publishing! LOL!

I am curious where people go to see research papers in various aspects of 
Computer Science, or in ones dedicated to specific languages or systems. Is 
there the same publish or perish aspect in academia as for some older sciences 
or is the field different in many ways and perhaps often seen as a helper in 
other disciplines so you can publish elsewhere?

Some areas of the field like aspects of AI, might still be considered quite 
active but perhaps other areas are seen as less worthy of much further analysis 
and refinement. Unless actively being changed, where do programming languages 
like R fit in?


-Original Message-
From: R-help  On Behalf Of Bert Gunter
Sent: Wednesday, July 24, 2024 10:44 AM
To: R-help 
Subject: [R] OFF TOPIC: Nature article on File Drawer Problem in Research

Again, this is off topic, not about statistics or R, but I think of
interest to many on this list. The title is:

"So you got a null result. Will anyone publish it?"

https://www.nature.com/articles/d41586-024-02383-9

Best to all,
Bert

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] OFF TOPIC: Nature article on File Drawer Problem in Research

2024-07-24 Thread Bert Gunter
Thank you, but I should have said that this post was not meant to
provoke on-list discussion, as it *is* off topic. My apologies. Please
keep any further discussion private to me only.

-- Bert

On Wed, Jul 24, 2024 at 8:22 AM Ogbos Okike  wrote:
>
> Dear Bert,
>
> You have made my day!! Your post is a great help and very useful in my field.
>
> The paper is not among the off-the-shelf research output. Some of us, who get 
> into unenviable conflict and disputation with some reverenced authorities in 
> our field, understand the weight of the article. I had not even consumed half 
> of it before I decided to thank you. I will quickly go back to see how it 
> goes. I am sure it is going to be one of my best collections for the year!!
>
> Thank you very much. I often follow your posts, even if the topic does not 
> concern me. Your tough comments always make sense to me. Today, I have gained 
> something vital I would have lost if I had overlooked it because of the 
> subject heading: "OFF TOPIC"
>
>
> Please accept my warmest regards
> Ogbos
>
> On Wed, Jul 24, 2024 at 3:44 PM Bert Gunter  wrote:
>>
>> Again, this is off topic, not about statistics or R, but I think of
>> interest to many on this list. The title is:
>>
>> "So you got a null result. Will anyone publish it?"
>>
>> https://www.nature.com/articles/d41586-024-02383-9
>>
>> Best to all,
>> Bert
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.



Re: [R] OFF TOPIC: Nature article on File Drawer Problem in Research

2024-07-24 Thread Ogbos Okike
Dear Bert,

You have made my day!! Your post is a great help and very useful in my
field.

The paper is not among the off-the-shelf research output. Some of us, who
get into unenviable conflict and disputation with some reverenced
authorities in our field, understand the weight of the article. I had not
even consumed half of it before I decided to thank you. I will quickly go
back to see how it goes. I am sure it is going to be one of my best
collections for the year!!

Thank you very much. I often follow your posts, even if the topic does not
concern me. Your tough comments always make sense to me. Today, I have
gained something vital I would have lost if I had overlooked it because of
the subject heading: "OFF TOPIC"

Please accept my warmest regards
Ogbos

On Wed, Jul 24, 2024 at 3:44 PM Bert Gunter  wrote:

> Again, this is off topic, not about statistics or R, but I think of
> interest to many on this list. The title is:
>
> "So you got a null result. Will anyone publish it?"
>
> https://www.nature.com/articles/d41586-024-02383-9
>
> Best to all,
> Bert
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



[R] OFF TOPIC: Nature article on File Drawer Problem in Research

2024-07-24 Thread Bert Gunter
Again, this is off topic, not about statistics or R, but I think of
interest to many on this list. The title is:

"So you got a null result. Will anyone publish it?"

https://www.nature.com/articles/d41586-024-02383-9

Best to all,
Bert



Re: [R] Extract

2024-07-22 Thread CALUM POLWART
But have we lured you to the dark side with the tidyverse yet ;-)



On Mon, 22 Jul 2024, 15:22 Bert Gunter,  wrote:

> Thanks.
>
> I found this to be quite informative and a nice example of how useful
> R-Help can be as a resource for R users.
>
> Best,
> Bert
>
> On Mon, Jul 22, 2024 at 4:50 AM Gabor Grothendieck
>  wrote:
> >
> > Base R. Regarding code improvements:
> >
> > 1. Personally I find (\(...) ...)() notation hard to read (although by
> > placing (\(x), the body and )() on 3 separate lines it can be improved
> > somewhat). Instead let us use a named function. The name of the
> > function can also serve to self document the code.
> >
> > 2. The use of dat both at the start of the pipeline and then again
> > within a later step of the pipeline goes against a strict left to
> > right flow. In general if this occurs it is either a sign that we need
> > to break the pipeline into two or that we need to find another
> > approach which is what we do here.
> >
> > We can use the base R code below. Note that the column names produced
> > by transform(S = read.table(...)) are S.V1, S.V2, etc. so to fix the
> > column names remove .V from all column names as in the fix_colnames
> > function shown. It does no harm to apply that to all column names
> > since the remaining column names will not match.
> >
> >   fix_colnames <- function(x) {
> > setNames(x, sub("\\.V", "", names(x)))
> >   }
> >
> >   dat |>
> >  transform(S = read.table(text = string,
> >header = FALSE, fill = TRUE, na.strings = "")) |>
> >fix_colnames()
> >
> > Another way to write this which does not use a separate defined
> > function nor the anonymous function notation is to box the output of
> > transform:
> >
> >   dat |>
> >  transform(S = read.table(text = string,
> >header = FALSE, fill = TRUE, na.strings = "")) |>
> >list(x = _) |>
> >with( setNames(x, sub("\\.V", "", names(x))) )
> >
> > dplyr. Alternately use dplyr in which case we can make use of
> > rename_with . In this case read.table(...) creates column names V1,
> > V2, etc. and mutate does not change them so simply replacing V with S
> > at the start of each column name in the output of read.table will do.
> > Also we can pipe the read.table output directly to rename_with using a
> > nested pipeline, i.e. the second pipe is entirely within mutate rather
> > than after it) since mutate won't change the column names. The win
> > here is because, unlike transform, mutate does not require the S= that
> > is needed with transform (although it allows it had we wanted it).
> >
> >   library(dplyr)
> >
> >   dat |>
> >  mutate(read.table(text = string,
> >header = FALSE, fill = TRUE, na.strings = "")  |>
> >   rename_with(~ sub("^V", "S", .x))
> > )
> >
> >
> > On Sun, Jul 21, 2024 at 3:08 PM Bert Gunter 
> wrote:
> > >
> > > As always, good point.
> > > Here's a piped version of your code for those who are pipe
> > > aficionados. As I'm not very skilled with pipes, it might certainly
> > > be improved.
> > > dat <-
> > >   dat$string |>
> > >  read.table( text = _, fill = TRUE, header = FALSE, na.strings = "")  |>
> > >  (\(x)'names<-'(x,paste0("s", seq_along(x))))() |>
> > >  (\(x)cbind(dat, x))()
> > >
> > > -- Bert
> > >
> > >
> > > On Sun, Jul 21, 2024 at 11:30 AM Gabor Grothendieck
> > >  wrote:
> > > >
> > > > Fixing col.names=paste0("S", 1:5) assumes that there will be 5
> columns and
> > > > we may not want to do that.  If there are only 3 fields in string,
> at the most,
> > > > we may wish to generate only 3 columns.
> > > >
> > > > On Sun, Jul 21, 2024 at 2:20 PM Bert Gunter 
> wrote:
> > > > >
> > > > > Nice! -- Let read.table do the work of handling the NA's.
> > > > > However, even simpler is to use the 'col.names' argument of
> > > > > read.table() for the column names, no?
> > > > >
> > > > >   string <- read.table(text = dat$string, fill = TRUE, header =
> > > > > FALSE, na.strings = "",
> > > > > col.names = paste0("s", 1:5))
> > > > >   dat <- cbind(dat, string)
> > > > >
> > > > > -- Bert
> > > > >
> > > > > On Sun, Jul 21, 2024 at 10:16 AM Gabor Grothendieck
> > > > >  wrote:
> > > > > >
> > > > > > We can use read.table for a base R solution
> > > > > >
> > > > > > string <- read.table(text = dat$string, fill = TRUE, header =
> FALSE,
> > > > > > na.strings = "")
> > > > > > names(string) <- paste0("S", seq_along(string))
> > > > > > cbind(dat[-3], string)
> > > > > >
> > > > > > On Fri, Jul 19, 2024 at 12:52 PM Val  wrote:
> > > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I want to extract new variables from a string and add it to
> the dataframe.
> > > > > > > Sample data is csv file.
> > > > > > >
> > > > > > > dat<-read.csv(text="Year, Sex,string
> > > > > > > 2002,F,15 xc Ab
> > > > > > > 2003,F,14
> > > > > > > 2004,M,18 xb 25 35 21
> > > > > > > 2005,M,13 25
> > > > > > > 2006,M,14 ac 256 AV 35
> > > > > > > 

Re: [R] Extract

2024-07-22 Thread Gabor Grothendieck
I had missed that one can pass fix.empty.names = TRUE to transform. If
we do that, then we can put an unnamed data.frame in transform, just as
we can with mutate. Making that change, we have the following base R
solution, where there is an inner nested pipeline within the outer
pipeline, as with the dplyr example.

  transform(dat,
    read.table(text = string, header = FALSE, na.strings = "", fill = TRUE),
    fix.empty.names = TRUE) |>
    list(x = _) |>
    with(setNames(x, sub("V", "S", names(x))))
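
For anyone who wants to try this end to end, here is a minimal sketch on the sample data from the original question. The header line is written without the stray space before Sex, `res` is only an illustrative name, and the pattern is anchored as "^V" on the assumption that we only want to touch names that start with V.

```r
# Sample data from the original post (header written without the stray space)
dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE)

# The unnamed data frame returned by read.table() is passed on to data.frame()
# by transform(); the named fix.empty.names argument travels along with it.
res <- transform(dat,
                 read.table(text = string, header = FALSE,
                            na.strings = "", fill = TRUE),
                 fix.empty.names = TRUE) |>
  list(x = _) |>
  with(setNames(x, sub("^V", "S", names(x))))

names(res)
```

The native pipe placeholder `_` needs R 4.2 or later.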


On Mon, Jul 22, 2024 at 7:49 AM Gabor Grothendieck
 wrote:
>
> Base R. Regarding code improvements:
>
> 1. Personally I find (\(...) ...)() notation hard to read (although by
> placing (\(x), the body and )() on 3 separate lines it can be improved
> somewhat). Instead let us use a named function. The name of the
> function can also serve to self document the code.
>
> 2. The use of dat both at the start of the pipeline and then again
> within a later step of the pipeline goes against a strict left to
> right flow. In general if this occurs it is either a sign that we need
> to break the pipeline into two or that we need to find another
> approach which is what we do here.
>
> We can use the base R code below. Note that the column names produced
> by transform(S = read.table(...)) are S.V1, S.V2, etc. so to fix the
> column names remove .V from all column names as in the fix_colnames
> function shown. It does no harm to apply that to all column names
> since the remaining column names will not match.
>
>   fix_colnames <- function(x) {
> setNames(x, sub("\\.V", "", names(x)))
>   }
>
>   dat |>
>  transform(S = read.table(text = string,
>header = FALSE, fill = TRUE, na.strings = "")) |>
>fix_colnames()
>
> Another way to write this which does not use a separate defined
> function nor the anonymous function notation is to box the output of
> transform:
>
>   dat |>
>  transform(S = read.table(text = string,
>header = FALSE, fill = TRUE, na.strings = "")) |>
>list(x = _) |>
>with( setNames(x, sub("\\.V", "", names(x))) )
>
> dplyr. Alternately use dplyr in which case we can make use of
> rename_with . In this case read.table(...) creates column names V1,
> V2, etc. and mutate does not change them so simply replacing V with S
> at the start of each column name in the output of read.table will do.
> Also we can pipe the read.table output directly to rename_with using a
> nested pipeline, i.e. the second pipe is entirely within mutate rather
> than after it) since mutate won't change the column names. The win
> here is because, unlike transform, mutate does not require the S= that
> is needed with transform (although it allows it had we wanted it).
>
>   library(dplyr)
>
>   dat |>
>  mutate(read.table(text = string,
>header = FALSE, fill = TRUE, na.strings = "")  |>
>   rename_with(~ sub("^V", "S", .x))
> )
>
>
> On Sun, Jul 21, 2024 at 3:08 PM Bert Gunter  wrote:
> >
> > As always, good point.
> > Here's a piped version of your code for those who are pipe
> > aficionados. As I'm not very skilled with pipes, it might certainly
> > be improved.
> > dat <-
> >   dat$string |>
> >  read.table( text = _, fill = TRUE, header = FALSE, na.strings = "")  |>
> >  (\(x)'names<-'(x,paste0("s", seq_along(x))))() |>
> >  (\(x)cbind(dat, x))()
> >
> > -- Bert
> >
> >
> > On Sun, Jul 21, 2024 at 11:30 AM Gabor Grothendieck
> >  wrote:
> > >
> > > Fixing col.names=paste0("S", 1:5) assumes that there will be 5 columns and
> > > we may not want to do that.  If there are only 3 fields in string, at the 
> > > most,
> > > we may wish to generate only 3 columns.
> > >
> > > On Sun, Jul 21, 2024 at 2:20 PM Bert Gunter  
> > > wrote:
> > > >
> > > > Nice! -- Let read.table do the work of handling the NA's.
> > > > However, even simpler is to use the 'col.names' argument of
> > > > read.table() for the column names, no?
> > > >
> > > >   string <- read.table(text = dat$string, fill = TRUE, header =
> > > > FALSE, na.strings = "",
> > > > col.names = paste0("s", 1:5))
> > > >   dat <- cbind(dat, string)
> > > >
> > > > -- Bert
> > > >
> > > > On Sun, Jul 21, 2024 at 10:16 AM Gabor Grothendieck
> > > >  wrote:
> > > > >
> > > > > We can use read.table for a base R solution
> > > > >
> > > > > string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
> > > > > na.strings = "")
> > > > > names(string) <- paste0("S", seq_along(string))
> > > > > cbind(dat[-3], string)
> > > > >
> > > > > On Fri, Jul 19, 2024 at 12:52 PM Val  wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I want to extract new variables from a string and add it to the 
> > > > > > dataframe.
> > > > > > Sample data is csv file.
> > > > > >
> > > > > > dat<-read.csv(text="Year, Sex,string
> > > > > > 2002,F,15 xc Ab
> > > > > > 2003,F,14
> > > > > > 2004,M,18 xb 25 35 21
> > > > > > 2005,M,13 25
> > > > > > 2006,M,14 ac 256 AV 35
> > 

Re: [R] Extract

2024-07-22 Thread Bert Gunter
Thanks.

I found this to be quite informative and a nice example of how useful
R-Help can be as a resource for R users.

Best,
Bert

On Mon, Jul 22, 2024 at 4:50 AM Gabor Grothendieck
 wrote:
>
> Base R. Regarding code improvements:
>
> 1. Personally I find (\(...) ...)() notation hard to read (although by
> placing (\(x), the body and )() on 3 separate lines it can be improved
> somewhat). Instead let us use a named function. The name of the
> function can also serve to self document the code.
>
> 2. The use of dat both at the start of the pipeline and then again
> within a later step of the pipeline goes against a strict left to
> right flow. In general if this occurs it is either a sign that we need
> to break the pipeline into two or that we need to find another
> approach which is what we do here.
>
> We can use the base R code below. Note that the column names produced
> by transform(S = read.table(...)) are S.V1, S.V2, etc. so to fix the
> column names remove .V from all column names as in the fix_colnames
> function shown. It does no harm to apply that to all column names
> since the remaining column names will not match.
>
>   fix_colnames <- function(x) {
> setNames(x, sub("\\.V", "", names(x)))
>   }
>
>   dat |>
>  transform(S = read.table(text = string,
>header = FALSE, fill = TRUE, na.strings = "")) |>
>fix_colnames()
>
> Another way to write this which does not use a separate defined
> function nor the anonymous function notation is to box the output of
> transform:
>
>   dat |>
>  transform(S = read.table(text = string,
>header = FALSE, fill = TRUE, na.strings = "")) |>
>list(x = _) |>
>with( setNames(x, sub("\\.V", "", names(x))) )
>
> dplyr. Alternately use dplyr in which case we can make use of
> rename_with . In this case read.table(...) creates column names V1,
> V2, etc. and mutate does not change them so simply replacing V with S
> at the start of each column name in the output of read.table will do.
> Also we can pipe the read.table output directly to rename_with using a
> nested pipeline, i.e. the second pipe is entirely within mutate rather
> than after it) since mutate won't change the column names. The win
> here is because, unlike transform, mutate does not require the S= that
> is needed with transform (although it allows it had we wanted it).
>
>   library(dplyr)
>
>   dat |>
>  mutate(read.table(text = string,
>header = FALSE, fill = TRUE, na.strings = "")  |>
>   rename_with(~ sub("^V", "S", .x))
> )
>
>
> On Sun, Jul 21, 2024 at 3:08 PM Bert Gunter  wrote:
> >
> > As always, good point.
> > Here's a piped version of your code for those who are pipe
> > aficionados. As I'm not very skilled with pipes, it might certainly
> > be improved.
> > dat <-
> >   dat$string |>
> >  read.table( text = _, fill = TRUE, header = FALSE, na.strings = "")  |>
> >  (\(x)'names<-'(x,paste0("s", seq_along(x))))() |>
> >  (\(x)cbind(dat, x))()
> >
> > -- Bert
> >
> >
> > On Sun, Jul 21, 2024 at 11:30 AM Gabor Grothendieck
> >  wrote:
> > >
> > > Fixing col.names=paste0("S", 1:5) assumes that there will be 5 columns and
> > > we may not want to do that.  If there are only 3 fields in string, at the 
> > > most,
> > > we may wish to generate only 3 columns.
> > >
> > > On Sun, Jul 21, 2024 at 2:20 PM Bert Gunter  
> > > wrote:
> > > >
> > > > Nice! -- Let read.table do the work of handling the NA's.
> > > > However, even simpler is to use the 'col.names' argument of
> > > > read.table() for the column names, no?
> > > >
> > > >   string <- read.table(text = dat$string, fill = TRUE, header =
> > > > FALSE, na.strings = "",
> > > > col.names = paste0("s", 1:5))
> > > >   dat <- cbind(dat, string)
> > > >
> > > > -- Bert
> > > >
> > > > On Sun, Jul 21, 2024 at 10:16 AM Gabor Grothendieck
> > > >  wrote:
> > > > >
> > > > > We can use read.table for a base R solution
> > > > >
> > > > > string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
> > > > > na.strings = "")
> > > > > names(string) <- paste0("S", seq_along(string))
> > > > > cbind(dat[-3], string)
> > > > >
> > > > > On Fri, Jul 19, 2024 at 12:52 PM Val  wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I want to extract new variables from a string and add it to the 
> > > > > > dataframe.
> > > > > > Sample data is csv file.
> > > > > >
> > > > > > dat<-read.csv(text="Year, Sex,string
> > > > > > 2002,F,15 xc Ab
> > > > > > 2003,F,14
> > > > > > 2004,M,18 xb 25 35 21
> > > > > > 2005,M,13 25
> > > > > > 2006,M,14 ac 256 AV 35
> > > > > > 2007,F,11",header=TRUE)
> > > > > >
> > > > > > The string column has  a maximum of five variables. Some rows have 
> > > > > > all
> > > > > > and others may not have all the five variables. If missing then  
> > > > > > fill
> > > > > > it with NA,
> > > > > > Desired result is shown below,
> > > > > >
> > > > > >
> > > > > > Year,Sex,string, S1, S2, S3 S4,S5
> > > > 

Re: [R] Extract

2024-07-22 Thread avi.e.gross
Excellent message, Gabor.

Many tools we use are quite flexible, and I just want to mention that dplyr does 
have ways to use something like mutate() to rename a column, albeit rename() is 
more specifically designed to do the job.

Here is an example of how mutate() can rename by making a new column and 
removing the old by using a sort of pipeline within mutate():

mydata <- data.frame(a=1, b=2)
mutate(mydata, 
   c=a, 
   a=NULL, 
   d=b, 
   b=NULL)

The result:

> mutate(mydata, c=a, a=NULL, d=b, b=NULL)
  c d
1 1 2

It is effectively the same as following up with a select as an alternative:

mydata |>
  mutate(c=a,
 d=b) |>
  select(c,d)
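
For comparison, the same renaming with rename() itself can be sketched as follows. This is only an illustration, assuming dplyr is installed; `renamed` is a made-up name.

```r
library(dplyr)

mydata <- data.frame(a = 1, b = 2)

# rename() takes new_name = old_name pairs and leaves all other columns alone
renamed <- mydata |>
  rename(c = a, d = b)

names(renamed)
```

Unlike the mutate() route, nothing has to be set to NULL or selected afterwards, since rename() keeps the column positions and only changes the names.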

What people may not quite have grasped is that pipes are not a panacea and can 
be used alongside all kinds of other methods. Much of dplyr, such as shown 
above, but also in things like the filter() verb, does a sort of internal 
pipelining and can apply successive transformations before returning a result 
suitable for another part of a pipeline. Part of the philosophy was to make 
more functions where the first argument was something like a data.frame object 
(but it could be other things) that could be passed along in a pipeline. Trying 
to shoehorn in other functions that want the item in other positions makes for 
less intuitive code using place markers like period or underscore.

Pipelines are seen by many as a linear construct but as you point out, with 
careful design, you can make bigger pipelines that are more like graphs with 
some regions being a sub-pipeline and do fairly complex things, albeit hard for 
people to read and understand.

Maybe later, we can discuss again why some people insist on some kind of purity 
of using the base of languages that are not really expected to stay still but 
to evolve.


-Original Message-
From: R-help  On Behalf Of Gabor Grothendieck
Sent: Monday, July 22, 2024 7:49 AM
To: Bert Gunter 
Cc: r-help@R-project.org (r-help@r-project.org) 
Subject: Re: [R] Extract

Base R. Regarding code improvements:

1. Personally I find (\(...) ...)() notation hard to read (although by
placing (\(x), the body and )() on 3 separate lines it can be improved
somewhat). Instead let us use a named function. The name of the
function can also serve to self document the code.

2. The use of dat both at the start of the pipeline and then again
within a later step of the pipeline goes against a strict left to
right flow. In general if this occurs it is either a sign that we need
to break the pipeline into two or that we need to find another
approach which is what we do here.

We can use the base R code below. Note that the column names produced
by transform(S = read.table(...)) are S.V1, S.V2, etc. so to fix the
column names remove .V from all column names as in the fix_colnames
function shown. It does no harm to apply that to all column names
since the remaining column names will not match.

  fix_colnames <- function(x) {
setNames(x, sub("\\.V", "", names(x)))
  }

  dat |>
 transform(S = read.table(text = string,
   header = FALSE, fill = TRUE, na.strings = "")) |>
   fix_colnames()

Another way to write this which does not use a separate defined
function nor the anonymous function notation is to box the output of
transform:

  dat |>
 transform(S = read.table(text = string,
   header = FALSE, fill = TRUE, na.strings = "")) |>
   list(x = _) |>
   with( setNames(x, sub("\\.V", "", names(x))) )

dplyr. Alternately use dplyr in which case we can make use of
rename_with . In this case read.table(...) creates column names V1,
V2, etc. and mutate does not change them so simply replacing V with S
at the start of each column name in the output of read.table will do.
Also we can pipe the read.table output directly to rename_with using a
nested pipeline, i.e. the second pipe is entirely within mutate rather
than after it) since mutate won't change the column names. The win
here is because, unlike transform, mutate does not require the S= that
is needed with transform (although it allows it had we wanted it).

  library(dplyr)

  dat |>
 mutate(read.table(text = string,
   header = FALSE, fill = TRUE, na.strings = "")  |>
  rename_with(~ sub("^V", "S", .x))
)


On Sun, Jul 21, 2024 at 3:08 PM Bert Gunter  wrote:
>
> As always, good point.
> Here's a piped version of your code for those who are pipe
> aficionados. As I'm not very skilled with pipes, it might certainly
> be improved.
> dat <-
>   dat$string |>
>  read.table( text = _, fill = TRUE, header = FALSE, na.strings = "")  |>
>  (\(x)'names<-'(x,paste0("s", seq_along(x))))() |>
>  (\(x)cbind(dat, x))()
>
> -- Bert
>
>
> On Sun, Jul 21, 2024 at 11:30 AM Gabor Grothendieck
>  wrote:
> >
> > Fixing col.names=paste0("S", 1:5) assumes that there will be 5 columns and
> > we may not want to do that.  If there are only 3 fields in string, at the 
> > most,
> > we may wish to generate only 3 columns.
> 

Re: [R] Extract

2024-07-22 Thread Gabor Grothendieck
Base R. Regarding code improvements:

1. Personally I find (\(...) ...)() notation hard to read (although by
placing (\(x), the body and )() on 3 separate lines it can be improved
somewhat). Instead let us use a named function. The name of the
function can also serve to self document the code.

2. The use of dat both at the start of the pipeline and then again
within a later step of the pipeline goes against a strict left to
right flow. In general if this occurs it is either a sign that we need
to break the pipeline into two or that we need to find another
approach which is what we do here.

We can use the base R code below. Note that the column names produced
by transform(S = read.table(...)) are S.V1, S.V2, etc. so to fix the
column names remove .V from all column names as in the fix_colnames
function shown. It does no harm to apply that to all column names
since the remaining column names will not match.

  fix_colnames <- function(x) {
    setNames(x, sub("\\.V", "", names(x)))
  }

  dat |>
 transform(S = read.table(text = string,
   header = FALSE, fill = TRUE, na.strings = "")) |>
   fix_colnames()
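
To see the intermediate S.V1, S.V2, ... names and the effect of fix_colnames, here is a small sketch on the sample data from the original question. The header line is written without the stray space before Sex, and `step1` and `res` are illustrative names only.

```r
# Sample data from the original post (header written without the stray space)
dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE)

# Drop the ".V" that transform(S = ...) leaves between the prefix and the index
fix_colnames <- function(x) {
  setNames(x, sub("\\.V", "", names(x)))
}

# Intermediate step: the read.table() columns are prefixed with "S."
step1 <- transform(dat, S = read.table(text = string,
                                       header = FALSE, fill = TRUE,
                                       na.strings = ""))
names(step1)

# After the fix the new columns are named S1, S2, ...
res <- fix_colnames(step1)
names(res)
```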

Another way to write this which does not use a separate defined
function nor the anonymous function notation is to box the output of
transform:

  dat |>
 transform(S = read.table(text = string,
   header = FALSE, fill = TRUE, na.strings = "")) |>
   list(x = _) |>
   with( setNames(x, sub("\\.V", "", names(x))) )
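
The boxing idiom is easier to see on a toy value. The sketch below is only an illustration with made-up data: list(x = _) wraps the piped value under the name x, and with() then lets the expression refer to x as many times as it needs.

```r
# The piped value is needed twice (as the object and as the source of its names),
# so box it under the name x and evaluate the expression with x in scope.
res <- c(a = 1, b = 2) |>
  list(x = _) |>
  with(setNames(x, toupper(names(x))))

res
```

The `_` placeholder must be passed as a named argument, which is why the list is built as list(x = _); it needs R 4.2 or later.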

dplyr. Alternatively, use dplyr, in which case we can make use of
rename_with. In this case read.table(...) creates column names V1,
V2, etc. and mutate does not change them, so simply replacing V with S
at the start of each column name in the output of read.table will do.
Also, we can pipe the read.table output directly to rename_with using a
nested pipeline (i.e. the second pipe is entirely within mutate rather
than after it), since mutate won't change the column names. The win
here is that, unlike transform, mutate does not require the S= that
is needed with transform (although it allows it had we wanted it).

  library(dplyr)

  dat |>
    mutate(read.table(text = string,
                      header = FALSE, fill = TRUE, na.strings = "") |>
             rename_with(~ sub("^V", "S", .x))
    )
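
Run against the sample data from the original question, the dplyr version can be sketched like this. It assumes dplyr is installed; the header line is written without the stray space before Sex, and `res` is an illustrative name.

```r
library(dplyr)

# Sample data from the original post (header written without the stray space)
dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE)

# The inner pipe renames V1, V2, ... before mutate() adds the columns to dat
res <- dat |>
  mutate(read.table(text = string,
                    header = FALSE, fill = TRUE, na.strings = "") |>
           rename_with(~ sub("^V", "S", .x)))

names(res)
```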


On Sun, Jul 21, 2024 at 3:08 PM Bert Gunter  wrote:
>
> As always, good point.
> Here's a piped version of your code for those who are pipe
> aficionados. As I'm not very skilled with pipes, it might certainly
> be improved.
> dat <-
>   dat$string |>
>  read.table( text = _, fill = TRUE, header = FALSE, na.strings = "")  |>
>  (\(x)'names<-'(x,paste0("s", seq_along(x))))() |>
>  (\(x)cbind(dat, x))()
>
> -- Bert
>
>
> On Sun, Jul 21, 2024 at 11:30 AM Gabor Grothendieck
>  wrote:
> >
> > Fixing col.names=paste0("S", 1:5) assumes that there will be 5 columns and
> > we may not want to do that.  If there are only 3 fields in string, at the 
> > most,
> > we may wish to generate only 3 columns.
> >
> > On Sun, Jul 21, 2024 at 2:20 PM Bert Gunter  wrote:
> > >
> > > Nice! -- Let read.table do the work of handling the NA's.
> > > However, even simpler is to use the 'col.names' argument of
> > > read.table() for the column names, no?
> > >
> > >   string <- read.table(text = dat$string, fill = TRUE, header =
> > > FALSE, na.strings = "",
> > > col.names = paste0("s", 1:5))
> > >   dat <- cbind(dat, string)
> > >
> > > -- Bert
> > >
> > > On Sun, Jul 21, 2024 at 10:16 AM Gabor Grothendieck
> > >  wrote:
> > > >
> > > > We can use read.table for a base R solution
> > > >
> > > > string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
> > > > na.strings = "")
> > > > names(string) <- paste0("S", seq_along(string))
> > > > cbind(dat[-3], string)
> > > >
> > > > On Fri, Jul 19, 2024 at 12:52 PM Val  wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I want to extract new variables from a string and add it to the 
> > > > > dataframe.
> > > > > Sample data is csv file.
> > > > >
> > > > > dat<-read.csv(text="Year, Sex,string
> > > > > 2002,F,15 xc Ab
> > > > > 2003,F,14
> > > > > 2004,M,18 xb 25 35 21
> > > > > 2005,M,13 25
> > > > > 2006,M,14 ac 256 AV 35
> > > > > 2007,F,11",header=TRUE)
> > > > >
> > > > > The string column has  a maximum of five variables. Some rows have all
> > > > > and others may not have all the five variables. If missing then  fill
> > > > > it with NA,
> > > > > Desired result is shown below,
> > > > >
> > > > >
> > > > > Year,Sex,string, S1, S2, S3 S4,S5
> > > > > 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> > > > > 2003,F,14, 14,NA,NA,NA,NA
> > > > > 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> > > > > 2005,M,13 25,13, 25,NA,NA,NA
> > > > > 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> > > > > 2007,F,11, 11,NA,NA,NA,NA
> > > > >
> > > > > Any help?
> > > > > Thank you in advance.
> > > > >
> > > > > __
> > > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > > 

Re: [R] Extract

2024-07-21 Thread Bert Gunter
As always, good point.
Here's a piped version of your code for those who are pipe
aficionados. As I'm not very skilled with pipes, it might certainly
be improved.
dat <-
  dat$string |>
 read.table( text = _, fill = TRUE, header = FALSE, na.strings = "")  |>
 (\(x)'names<-'(x,paste0("s", seq_along(x))))() |>
 (\(x)cbind(dat, x))()
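
The backquoted call 'names<-'(x, value) is the replacement function
behind names(x) <- value, called directly so that it returns the renamed
object. A sketch of the same pipeline using setNames, which does the same
job under a more readable name. It assumes R >= 4.1 for |> and \(x); the
small dat here is made up for the example and out is only an illustrative
name:

```r
# A cut-down stand-in for the data in the thread.
dat <- data.frame(Year = 2002:2004,
                  string = c("15 xc Ab", "14", "18 xb 25"))

out <- read.table(text = dat$string, fill = TRUE, header = FALSE,
                  na.strings = "") |>
  (\(x) setNames(x, paste0("s", seq_along(x))))() |>  # rename V1.. to s1..
  (\(x) cbind(dat, x))()                              # append to the original

names(out)
```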

-- Bert


On Sun, Jul 21, 2024 at 11:30 AM Gabor Grothendieck
 wrote:
>
> Fixing col.names=paste0("S", 1:5) assumes that there will be 5 columns and
> we may not want to do that.  If there are only 3 fields in string, at the 
> most,
> we may wish to generate only 3 columns.
>
> On Sun, Jul 21, 2024 at 2:20 PM Bert Gunter  wrote:
> >
> > Nice! -- Let read.table do the work of handling the NA's.
> > However, even simpler is to use the 'col.names' argument of
> > read.table() for the column names, no?
> >
> >   string <- read.table(text = dat$string, fill = TRUE, header =
> > FALSE, na.strings = "",
> > col.names = paste0("s", 1:5))
> >   dat <- cbind(dat, string)
> >
> > -- Bert
> >
> > On Sun, Jul 21, 2024 at 10:16 AM Gabor Grothendieck
> >  wrote:
> > >
> > > We can use read.table for a base R solution
> > >
> > > string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
> > > na.strings = "")
> > > names(string) <- paste0("S", seq_along(string))
> > > cbind(dat[-3], string)
> > >
> > > On Fri, Jul 19, 2024 at 12:52 PM Val  wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I want to extract new variables from a string and add it to the 
> > > > dataframe.
> > > > Sample data is csv file.
> > > >
> > > > dat<-read.csv(text="Year, Sex,string
> > > > 2002,F,15 xc Ab
> > > > 2003,F,14
> > > > 2004,M,18 xb 25 35 21
> > > > 2005,M,13 25
> > > > 2006,M,14 ac 256 AV 35
> > > > 2007,F,11",header=TRUE)
> > > >
> > > > The string column has  a maximum of five variables. Some rows have all
> > > > and others may not have all the five variables. If missing then  fill
> > > > it with NA,
> > > > Desired result is shown below,
> > > >
> > > >
> > > > Year,Sex,string, S1, S2, S3 S4,S5
> > > > 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> > > > 2003,F,14, 14,NA,NA,NA,NA
> > > > 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> > > > 2005,M,13 25,13, 25,NA,NA,NA
> > > > 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> > > > 2007,F,11, 11,NA,NA,NA,NA
> > > >
> > > > Any help?
> > > > Thank you in advance.
> > > >
> > > > __
> > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide 
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> > >
> > > --
> > > Statistics & Software Consulting
> > > GKX Group, GKX Associates Inc.
> > > tel: 1-877-GKX-GROUP
> > > email: ggrothendieck at gmail.com
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com



Re: [R] Extract

2024-07-21 Thread Gabor Grothendieck
Fixing col.names=paste0("S", 1:5) assumes that there will be 5 columns and
we may not want to do that.  If there are only 3 fields in string, at the most,
we may wish to generate only 3 columns.
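
One related caveat: as documented in ?read.table, when col.names is not
given the number of columns is determined from the first five lines of
input, so a longer row further down could be mishandled. A sketch that
counts the fields on every line first and then passes exactly that many
col.names, which avoids hard-coding 5. The vector strings stands in for
dat$string, and the names n and parts are only for illustration:

```r
strings <- c("15 xc Ab", "14", "18 xb 25 35 21",
             "13 25", "14 ac 256 AV 35", "11")

# Count whitespace-separated fields on every line, not just the first five.
con <- textConnection(strings)
n <- max(count.fields(con, sep = ""))
close(con)

# Supplying n column names makes read.table create exactly n columns.
parts <- read.table(text = strings, fill = TRUE, header = FALSE,
                    na.strings = "", col.names = paste0("S", seq_len(n)))
dim(parts)
```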

On Sun, Jul 21, 2024 at 2:20 PM Bert Gunter  wrote:
>
> Nice! -- Let read.table do the work of handling the NA's.
> However, even simpler is to use the 'col.names' argument of
> read.table() for the column names, no?
>
>   string <- read.table(text = dat$string, fill = TRUE, header =
> FALSE, na.strings = "",
> col.names = paste0("s", 1:5))
>   dat <- cbind(dat, string)
>
> -- Bert
>
> On Sun, Jul 21, 2024 at 10:16 AM Gabor Grothendieck
>  wrote:
> >
> > We can use read.table for a base R solution
> >
> > string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
> > na.strings = "")
> > names(string) <- paste0("S", seq_along(string))
> > cbind(dat[-3], string)
> >
> > On Fri, Jul 19, 2024 at 12:52 PM Val  wrote:
> > >
> > > Hi All,
> > >
> > > I want to extract new variables from a string and add it to the dataframe.
> > > Sample data is csv file.
> > >
> > > dat<-read.csv(text="Year, Sex,string
> > > 2002,F,15 xc Ab
> > > 2003,F,14
> > > 2004,M,18 xb 25 35 21
> > > 2005,M,13 25
> > > 2006,M,14 ac 256 AV 35
> > > 2007,F,11",header=TRUE)
> > >
> > > The string column has  a maximum of five variables. Some rows have all
> > > and others may not have all the five variables. If missing then  fill
> > > it with NA,
> > > Desired result is shown below,
> > >
> > >
> > > Year,Sex,string, S1, S2, S3 S4,S5
> > > 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> > > 2003,F,14, 14,NA,NA,NA,NA
> > > 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> > > 2005,M,13 25,13, 25,NA,NA,NA
> > > 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> > > 2007,F,11, 11,NA,NA,NA,NA
> > >
> > > Any help?
> > > Thank you in advance.
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide 
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > Statistics & Software Consulting
> > GKX Group, GKX Associates Inc.
> > tel: 1-877-GKX-GROUP
> > email: ggrothendieck at gmail.com
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Extract

2024-07-21 Thread Bert Gunter
Nice! -- Let read.table do the work of handling the NA's.
However, even simpler is to use the 'col.names' argument of
read.table() for the column names, no?

  string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
                       na.strings = "", col.names = paste0("s", 1:5))
  dat <- cbind(dat, string)

-- Bert

On Sun, Jul 21, 2024 at 10:16 AM Gabor Grothendieck
 wrote:
>
> We can use read.table for a base R solution
>
> string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
> na.strings = "")
> names(string) <- paste0("S", seq_along(string))
> cbind(dat[-3], string)
>
> On Fri, Jul 19, 2024 at 12:52 PM Val  wrote:
> >
> > Hi All,
> >
> > I want to extract new variables from a string and add it to the dataframe.
> > Sample data is csv file.
> >
> > dat<-read.csv(text="Year, Sex,string
> > 2002,F,15 xc Ab
> > 2003,F,14
> > 2004,M,18 xb 25 35 21
> > 2005,M,13 25
> > 2006,M,14 ac 256 AV 35
> > 2007,F,11",header=TRUE)
> >
> > The string column has  a maximum of five variables. Some rows have all
> > and others may not have all the five variables. If missing then  fill
> > it with NA,
> > Desired result is shown below,
> >
> >
> > Year,Sex,string, S1, S2, S3 S4,S5
> > 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> > 2003,F,14, 14,NA,NA,NA,NA
> > 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> > 2005,M,13 25,13, 25,NA,NA,NA
> > 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> > 2007,F,11, 11,NA,NA,NA,NA
> >
> > Any help?
> > Thank you in advance.
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



Re: [R] Extract

2024-07-21 Thread Bert Gunter
I get no error. Please show the entirety of the code you used that
produced the error. Also, are you using a current R version? I am, and
if you are not, there might have been changes from your version to
mine that caused the error.

However, as you were already given satisfactory solutions before, and
Gabor has provided you a simple non-piped base R version that is
probably better anyway, feel free to ignore my request as a waste of
your time.
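
For reference, the error Val reports is consistent with R 4.2.x: there
the _ placeholder had to be passed as a named argument, and as far as I
know its use at the head of an extraction such as _[4:8] or _$x only
arrived in R 4.3.0. A sketch that picks the idiom by version, shown on a
small made-up data frame z. The pipe form is kept in a string and parsed
at run time only so that older versions do not fail when reading the
file:

```r
z <- data.frame(a = 1:3, b = letters[1:3])

if (getRversion() >= "4.3.0") {
  # Placeholder at the head of an extraction; assumed to need R >= 4.3.0.
  eval(parse(text = 'z |> names() |> _[2] <- "foo"'))
} else {
  # Ordinary replacement syntax, which works on any R version.
  names(z)[2] <- "foo"
}

names(z)
```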

-- Bert

On Sun, Jul 21, 2024 at 9:36 AM Val  wrote:
>
> Thank   you Bert!
> However, the last line of the script.
>
> dat |> names() |> _[4:8] <- paste0("s", 1:5)
>
> is giving me an error as shown below
> Error: pipe placeholder can only be used as a named argument
>
> Thank you!
>
> On Sat, Jul 20, 2024 at 7:41 PM Bert Gunter  wrote:
> >
> > Val:
> > I wanted to add here a base R solution to your problem that I realize
> > you can happily ignore. However, in the course of puzzling over how to
> > do it using the R native pipe syntax ("|>") , I learned some new stuff
> > that I thought others might find useful, and it seemed sensible to
> > keep the code with this thread for comparison.
> >
> >  I want to acknowledge that in the course of my labor, I posted a
> > query to R-Help to which Iris Simmons posted a very clever answer that
> > I would never have figured out myself and that is used below at the
> > end to change a subset of the names of the modified data frame via a
> > pipe.
> >
> > Here's the whole solution starting from your (excellent!) example dat:
> >
> >    dat <- dat$string |>
> >       strsplit(" ") |>
> >       sapply(FUN = \(x)c(x, rep(NA, 5 - length(x)))) |>
> >       t() |> cbind(dat, ..2 = _)
> >
> >    ## And Iris's trick for changing a subset of attributes, i.e. the
> >    ## "names", in a pipe
> >    dat |> names() |> _[4:8] <- paste0("s", 1:5)
> >
> > ## and here's the result:
> > > dat
> >   Year Sex          string s1   s2   s3   s4   s5
> > 1 2002   F        15 xc Ab 15   xc   Ab <NA> <NA>
> > 2 2003   F              14 14 <NA> <NA> <NA> <NA>
> > 3 2004   M  18 xb 25 35 21 18   xb   25   35   21
> > 4 2005   M           13 25 13   25 <NA> <NA> <NA>
> > 5 2006   M 14 ac 256 AV 35 14   ac  256   AV   35
> > 6 2007   F              11 11 <NA> <NA> <NA> <NA>
> >
> > As I noted previously, all columns beyond Sex are character
> >
> > Cheers,
> > Bert
> >
> >
> > On Fri, Jul 19, 2024 at 12:26 PM Val  wrote:
> > >
> > > Thank you Jeff and Bert for your help!
> > > The components of the string could be mixed (i.e., numeric, character
> > > or date). Once it is split, it would be easy for me to format it
> > > accordingly.
> > >
> > > On Fri, Jul 19, 2024 at 2:10 PM Bert Gunter  
> > > wrote:
> > > >
> > > > I did not look closely at the solutions that you were offered, but
> > > > note that you did not specify in your post whether the numbers in your
> > > > string were to be character or numeric variables after they are broken
> > > > out into their own columns. I believe that they are character in the
> > > > solutions, but you should check this. If you want them as numeric,
> > > > e.g., for further processing, you will need to convert them. Or
> > > > vice-versa.
> > > >
> > > > Bert
> > > >
> > > >
> > > > On Fri, Jul 19, 2024 at 9:52 AM Val  wrote:
> > > > >
> > > > > Hi All,
> > > > >
> > > > > I want to extract new variables from a string and add it to the 
> > > > > dataframe.
> > > > > Sample data is csv file.
> > > > >
> > > > > dat<-read.csv(text="Year, Sex,string
> > > > > 2002,F,15 xc Ab
> > > > > 2003,F,14
> > > > > 2004,M,18 xb 25 35 21
> > > > > 2005,M,13 25
> > > > > 2006,M,14 ac 256 AV 35
> > > > > 2007,F,11",header=TRUE)
> > > > >
> > > > > The string column has  a maximum of five variables. Some rows have all
> > > > > and others may not have all the five variables. If missing then  fill
> > > > > it with NA,
> > > > > Desired result is shown below,
> > > > >
> > > > >
> > > > > Year,Sex,string, S1, S2, S3 S4,S5
> > > > > 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> > > > > 2003,F,14, 14,NA,NA,NA,NA
> > > > > 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> > > > > 2005,M,13 25,13, 25,NA,NA,NA
> > > > > 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> > > > > 2007,F,11, 11,NA,NA,NA,NA
> > > > >
> > > > > Any help?
> > > > > Thank you in advance.
> > > > >
> > > > > __
> > > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide 
> > > > > http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.



Re: [R] Extract

2024-07-21 Thread Gabor Grothendieck
We can use read.table for a base R solution

string <- read.table(text = dat$string, fill = TRUE, header = FALSE,
na.strings = "")
names(string) <- paste0("S", seq_along(string))
cbind(dat[-3], string)
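
On the earlier question of whether the new columns are character or
numeric: read.table converts each column separately, so with this data
some columns come back as integer and others as character. A sketch,
using a vector strings in place of dat$string; colClasses = "character"
is one way to keep everything as character if that is preferred, and
all_chr is only an illustrative name:

```r
strings <- c("15 xc Ab", "14", "18 xb 25 35 21",
             "13 25", "14 ac 256 AV 35", "11")

# Default behaviour: each column gets its own type.
string <- read.table(text = strings, fill = TRUE, header = FALSE,
                     na.strings = "")
names(string) <- paste0("S", seq_along(string))
sapply(string, class)

# Forcing every column to character instead.
all_chr <- read.table(text = strings, fill = TRUE, header = FALSE,
                      na.strings = "", colClasses = "character")
sapply(all_chr, class)
```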

On Fri, Jul 19, 2024 at 12:52 PM Val  wrote:
>
> Hi All,
>
> I want to extract new variables from a string and add it to the dataframe.
> Sample data is csv file.
>
> dat<-read.csv(text="Year, Sex,string
> 2002,F,15 xc Ab
> 2003,F,14
> 2004,M,18 xb 25 35 21
> 2005,M,13 25
> 2006,M,14 ac 256 AV 35
> 2007,F,11",header=TRUE)
>
> The string column has  a maximum of five variables. Some rows have all
> and others may not have all the five variables. If missing then  fill
> it with NA,
> Desired result is shown below,
>
>
> Year,Sex,string, S1, S2, S3 S4,S5
> 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> 2003,F,14, 14,NA,NA,NA,NA
> 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> 2005,M,13 25,13, 25,NA,NA,NA
> 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> 2007,F,11, 11,NA,NA,NA,NA
>
> Any help?
> Thank you in advance.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-21 Thread Gabor Grothendieck
That was supposed to be

  z |> list(x = _) |> within(names(x) <- replace(names(x), 2, "foo")) |> _$x

but I really see no advantage over

  z |> list(x = _) |> within(names(x)[2] <- "foo") |> _$x

Regarding the z |> names() |> _[2] <- "foo" idiom: while it is clever
and illustrates well what is possible with base R pipes that might not
be initially expected, I think it should be discouraged as not in the
spirit of pipes, which is a more functional approach to programming.
Such an approach ought to be non-destructive and should pass the
result on to the next step in the pipeline. The idiom satisfies
neither criterion.
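
Both criteria can also be met with a small named helper, at the cost of
having to define it first. A minimal sketch, assuming R >= 4.1 for |>;
rename_at_pos is a hypothetical name, not an existing function:

```r
# Rename the columns at positions i and return the modified copy.
rename_at_pos <- function(x, i, value) {
  names(x)[i] <- value  # changes the local copy only
  x
}

z <- data.frame(a = 1:3, b = letters[1:3])
z2 <- z |> rename_at_pos(2, "foo")

names(z)   # the original is left alone
names(z2)  # the renamed copy can be piped onward
```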

On Sun, Jul 21, 2024 at 11:17 AM Gabor Grothendieck
 wrote:
>
> If you object to names(x)[2]<- ... then use replace:
>
>   z |> list(x = _) |> within(replace(names(x), 2, "foo")) |> _$x
>
> On Sun, Jul 21, 2024 at 11:10 AM Bert Gunter  wrote:
> >
> > hmmm...
> > But note that you still used the nested assignment, names()[2] <-
> > "foo", to circumvent R's pipe limitations, which is exactly what
> > Iris's solution avoids. So I think I was overawed by your cleverness
> > ;-)
> >
> > Best,
> > Bert
> >
> >
> > On Sun, Jul 21, 2024 at 8:01 AM Bert Gunter  wrote:
> > >
> > > Wow!
> > > Yes, this is very clever -- way too clever for me -- and meets my
> > > criteria for a solution.
> > >
> > > I think it's also another piece of evidence of why piping in base R is
> > > not suited for complex/nested assignments, as discussed in Deepayan's
> > > response.
> > >
> > > Maybe someone could offer a better Tidydata piping solution just for
> > > completeness?
> > >
> > > Best,
> > > Bert
> > >
> > > On Sun, Jul 21, 2024 at 7:48 AM Gabor Grothendieck
> > >  wrote:
> > > >
> > > > This
> > > > - is non-destructive (does not change z)
> > > > - passes the renamed z onto further pipe legs
> > > > - does not use \(x)...
> > > >
> > > > It works by boxing z, operating on the boxed version and then unboxing 
> > > > it.
> > > >
> > > >   z <- data.frame(a = 1:3, b = letters[1:3])
> > > >   z |> list(x = _) |> within(names(x)[2] <- "foo") |> _$x
> > > >   ##   a foo
> > > >   ## 1 1   a
> > > >   ## 2 2   b
> > > >   ## 3 3   c
> > > >
> > > > On Sat, Jul 20, 2024 at 4:07 PM Bert Gunter  
> > > > wrote:
> > > > >
> > > > > This post is likely pretty useless;  it is motivated by a recent post
> > > > > from "Val" that was elegantly answered using Tidyverse constructs, but
> > > > > I wondered how to do it using base R only. Along the way, I ran into
> > > > > the following question to which I think my answer (below) is pretty
> > > > > awful. I would be interested in more elegant base R approaches. So...
> > > > >
> > > > > z <- data.frame(a = 1:3, b = letters[1:3])
> > > > > > z
> > > > > >   a b
> > > > > 1 1 a
> > > > > 2 2 b
> > > > > 3 3 c
> > > > >
> > > > > Suppose I want to change the name of the second column of z from 'b'
> > > > > to 'foo' . This is very easy using nested function syntax by:
> > > > >
> > > > > names(z)[2] <- "foo"
> > > > > > z
> > > > >   a foo
> > > > > 1 1   a
> > > > > 2 2   b
> > > > > 3 3   c
> > > > >
> > > > > Now suppose I wanted to do this using |> syntax, along the lines of:
> > > > >
> > > > > z |> names()[2] <- "foo"  ## throws an error
> > > > >
> > > > > Slightly fancier is:
> > > > >
> > > > > z |> (\(x)names(x)[2] <- "b")()
> > > > > ## does nothing, but does not throw an error.
> > > > >
> > > > > However, the following, which resulted from a more careful read of
> > > > > ?names works (after changing the name of the second column back to "b"
> > > > > of course):
> > > > >
> > > > > z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> > > > > >z
> > > > >   a foo
> > > > > 1 1   a
> > > > > 2 2   b
> > > > > 3 3   c
> > > > >
> > > > > This qualifies to me as "pretty awful." I'm sure there are better ways
> > > > > to do this using pipe syntax, so I would appreciate any better
> > > > > approaches.
> > > > >
> > > > > Best,
> > > > > Bert
> > > > >
> > > > > __
> > > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide 
> > > > > http://www.R-project.org/posting-guide.html
> > > > > and provide commented, minimal, self-contained, reproducible code.
> > > >
> > > >
> > > >
> > > > --
> > > > Statistics & Software Consulting
> > > > GKX Group, GKX Associates Inc.
> > > > tel: 1-877-GKX-GROUP
> > > > email: ggrothendieck at gmail.com
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [R] Extract

2024-07-21 Thread Val
Thank   you Bert!
However, the last line of the script.

dat |> names() |> _[4:8] <- paste0("s", 1:5)

is giving me an error as shown below
Error: pipe placeholder can only be used as a named argument

Thank you!

On Sat, Jul 20, 2024 at 7:41 PM Bert Gunter  wrote:
>
> Val:
> I wanted to add here a base R solution to your problem that I realize
> you can happily ignore. However, in the course of puzzling over how to
> do it using the R native pipe syntax ("|>") , I learned some new stuff
> that I thought others might find useful, and it seemed sensible to
> keep the code with this thread for comparison.
>
>  I want to acknowledge that in the course of my labor, I posted a
> query to R-Help to which Iris Simmons posted a very clever answer that
> I would never have figured out myself and that is used below at the
> end to change a subset of the names of the modified data frame via a
> pipe.
>
> Here's the whole solution starting from your (excellent!) example dat:
>
>    dat <- dat$string |>
>       strsplit(" ") |>
>       sapply(FUN = \(x)c(x, rep(NA, 5 - length(x)))) |>
>       t() |> cbind(dat, ..2 = _)
>
>    ## And Iris's trick for changing a subset of attributes, i.e. the
>    ## "names", in a pipe
>    dat |> names() |> _[4:8] <- paste0("s", 1:5)
>
> ## and here's the result:
> > dat
>   Year Sex          string s1   s2   s3   s4   s5
> 1 2002   F        15 xc Ab 15   xc   Ab <NA> <NA>
> 2 2003   F              14 14 <NA> <NA> <NA> <NA>
> 3 2004   M  18 xb 25 35 21 18   xb   25   35   21
> 4 2005   M           13 25 13   25 <NA> <NA> <NA>
> 5 2006   M 14 ac 256 AV 35 14   ac  256   AV   35
> 6 2007   F              11 11 <NA> <NA> <NA> <NA>
>
> As I noted previously, all columns beyond Sex are character
>
> Cheers,
> Bert
>
>
> On Fri, Jul 19, 2024 at 12:26 PM Val  wrote:
> >
> > Thank you Jeff and Bert for your help!
> > The components of the string could be mixed (i.e., numeric, character
> > or date). Once it is split, it would be easy for me to format it
> > accordingly.
> >
> > On Fri, Jul 19, 2024 at 2:10 PM Bert Gunter  wrote:
> > >
> > > I did not look closely at the solutions that you were offered, but
> > > note that you did not specify in your post whether the numbers in your
> > > string were to be character or numeric variables after they are broken
> > > out into their own columns. I believe that they are character in the
> > > solutions, but you should check this. If you want them as numeric,
> > > e.g., for further processing, you will need to convert them. Or
> > > vice-versa.
> > >
> > > Bert
> > >
> > >
> > > On Fri, Jul 19, 2024 at 9:52 AM Val  wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I want to extract new variables from a string and add it to the 
> > > > dataframe.
> > > > Sample data is csv file.
> > > >
> > > > dat<-read.csv(text="Year, Sex,string
> > > > 2002,F,15 xc Ab
> > > > 2003,F,14
> > > > 2004,M,18 xb 25 35 21
> > > > 2005,M,13 25
> > > > 2006,M,14 ac 256 AV 35
> > > > 2007,F,11",header=TRUE)
> > > >
> > > > The string column has  a maximum of five variables. Some rows have all
> > > > and others may not have all the five variables. If missing then  fill
> > > > it with NA,
> > > > Desired result is shown below,
> > > >
> > > >
> > > > Year,Sex,string, S1, S2, S3 S4,S5
> > > > 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> > > > 2003,F,14, 14,NA,NA,NA,NA
> > > > 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> > > > 2005,M,13 25,13, 25,NA,NA,NA
> > > > 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> > > > 2007,F,11, 11,NA,NA,NA,NA
> > > >
> > > > Any help?
> > > > Thank you in advance.
> > > >
> > > > __
> > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide 
> > > > http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.



Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-21 Thread CALUM POLWART
The tidy solution is rename

literally:

z |> rename(foo = 2)

Or you could do it with other functions

z |> select(1, foo = 2)

Or

z |> mutate(foo = b) |>  # not mutate(foo = 2): that sets the whole column to 2
  select(-2)

But that's akin to

z$foo <- z[[2]]
z[2] <- NULL

On Sun, 21 Jul 2024, 16:01 Bert Gunter,  wrote:

> Wow!
> Yes, this is very clever -- way too clever for me -- and meets my
> criteria for a solution.
>
> I think it's also another piece of evidence of why piping in base R is
> not suited for complex/nested assignments, as discussed in Deepayan's
> response.
>
> Maybe someone could offer a better Tidydata piping solution just for
> completeness?
>
> Best,
> Bert
>
> On Sun, Jul 21, 2024 at 7:48 AM Gabor Grothendieck
>  wrote:
> >
> > This
> > - is non-destructive (does not change z)
> > - passes the renamed z onto further pipe legs
> > - does not use \(x)...
> >
> > It works by boxing z, operating on the boxed version and then unboxing
> it.
> >
> >   z <- data.frame(a = 1:3, b = letters[1:3])
> >   z |> list(x = _) |> within(names(x)[2] <- "foo") |> _$x
> >   ##   a foo
> >   ## 1 1   a
> >   ## 2 2   b
> >   ## 3 3   c
> >
> > On Sat, Jul 20, 2024 at 4:07 PM Bert Gunter 
> wrote:
> > >
> > > This post is likely pretty useless;  it is motivated by a recent post
> > > from "Val" that was elegantly answered using Tidyverse constructs, but
> > > I wondered how to do it using base R only. Along the way, I ran into
> > > the following question to which I think my answer (below) is pretty
> > > awful. I would be interested in more elegant base R approaches. So...
> > >
> > > z <- data.frame(a = 1:3, b = letters[1:3])
> > > > z
> > > >   a b
> > > 1 1 a
> > > 2 2 b
> > > 3 3 c
> > >
> > > Suppose I want to change the name of the second column of z from 'b'
> > > to 'foo' . This is very easy using nested function syntax by:
> > >
> > > names(z)[2] <- "foo"
> > > > z
> > >   a foo
> > > 1 1   a
> > > 2 2   b
> > > 3 3   c
> > >
> > > Now suppose I wanted to do this using |> syntax, along the lines of:
> > >
> > > z |> names()[2] <- "foo"  ## throws an error
> > >
> > > Slightly fancier is:
> > >
> > > z |> (\(x)names(x)[2] <- "b")()
> > > ## does nothing, but does not throw an error.
> > >
> > > However, the following, which resulted from a more careful read of
> > > ?names works (after changing the name of the second column back to "b"
> > > of course):
> > >
> > > z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> > > >z
> > >   a foo
> > > 1 1   a
> > > 2 2   b
> > > 3 3   c
> > >
> > > This qualifies to me as "pretty awful." I'm sure there are better ways
> > > to do this using pipe syntax, so I would appreciate any better
> > > approaches.
> > >
> > > Best,
> > > Bert
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> > --
> > Statistics & Software Consulting
> > GKX Group, GKX Associates Inc.
> > tel: 1-877-GKX-GROUP
> > email: ggrothendieck at gmail.com
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-21 Thread Bert Gunter
Thanks, Calum.

That was exactly what Duncan Murdoch proposed earlier in this thread,
except, of course, he had to explicitly write the function first.

-- Bert

On Sun, Jul 21, 2024 at 8:12 AM CALUM POLWART  wrote:
>
> The tidy solution is rename
>
> literally:
>
> z |> rename(foo = 2)
>
> Or you could do it with other functions
>
> z |> select ( 1, foo = 2)
>
> Or
>
> z |> mutate(foo = b) |>  # not mutate(foo = 2): that sets the whole column to 2
>   select(-2)
>
> But that's akin to
>
> z$foo <- z[[2]]
> z[2] <- NULL


Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-21 Thread Gabor Grothendieck
If you object to names(x)[2]<- ... then use replace:

  z |> list(x = _) |> within(names(x) <- replace(names(x), 2, "foo")) |> _$x
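
A quick check of this variant, as a minimal sketch assuming R >= 4.3 (needed for `_$x` as the head of an extraction chain). within() on a list only writes back variables that were assigned inside the expression, so the value returned by replace() has to be assigned to names(x) for the rename to stick. The name `renamed` is only for illustration.

```r
z <- data.frame(a = 1:3, b = letters[1:3])

# Box z in a list, rename inside the box, then unbox.
# replace() only returns a modified copy of names(x); the assignment
# names(x) <- ... is what within() picks up and writes back.
renamed <- z |>
  list(x = _) |>
  within(names(x) <- replace(names(x), 2, "foo")) |>
  _$x

names(renamed)  # "a" "foo"
names(z)        # "a" "b" (z itself is untouched)
```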

On Sun, Jul 21, 2024 at 11:10 AM Bert Gunter  wrote:
>
> hmmm...
> But note that you still used the nested assignment, names()[2] <-
> "foo", to circumvent R's pipe limitations, which is exactly what
> Iris's solution avoids. So I think I was overawed by your cleverness
> ;-)
>
> Best,
> Bert



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-21 Thread Bert Gunter
hmmm...
But note that you still used the nested assignment, names()[2] <-
"foo", to circumvent R's pipe limitations, which is exactly what
Iris's solution avoids. So I think I was overawed by your cleverness
;-)

Best,
Bert




Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-21 Thread Bert Gunter
Wow!
Yes, this is very clever -- way too clever for me -- and meets my
criteria for a solution.

I think it's also another piece of evidence of why piping in base R is
not suited for complex/nested assignments, as discussed in Deepayan's
response.

Maybe someone could offer a better Tidydata piping solution just for
completeness?

Best,
Bert

On Sun, Jul 21, 2024 at 7:48 AM Gabor Grothendieck
 wrote:
>
> This
> - is non-destructive (does not change z)
> - passes the renamed z onto further pipe legs
> - does not use \(x)...
>
> It works by boxing z, operating on the boxed version and then unboxing it.
>
>   z <- data.frame(a = 1:3, b = letters[1:3])
>   z |> list(x = _) |> within(names(x)[2] <- "foo") |> _$x
>   ##   a foo
>   ## 1 1   a
>   ## 2 2   b
>   ## 3 3   c


Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-21 Thread Gabor Grothendieck
This
- is non-destructive (does not change z)
- passes the renamed z onto further pipe legs
- does not use \(x)...

It works by boxing z, operating on the boxed version and then unboxing it.

  z <- data.frame(a = 1:3, b = letters[1:3])
  z |> list(x = _) |> within(names(x)[2] <- "foo") |> _$x
  ##   a foo
  ## 1 1   a
  ## 2 2   b
  ## 3 3   c
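
For readers who find the one-liner dense, here is the same box / operate / unbox idea unrolled into separate steps. This is only an illustrative sketch; the intermediate names `boxed` and `renamed` are not from the post.

```r
z <- data.frame(a = 1:3, b = letters[1:3])

boxed <- list(x = z)                          # box: z becomes element x of a list
boxed <- within(boxed, names(x)[2] <- "foo")  # rename x inside the box
renamed <- boxed$x                            # unbox

names(renamed)  # "a" "foo"
names(z)        # "a" "b" (non-destructive: z is unchanged)
```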

On Sat, Jul 20, 2024 at 4:07 PM Bert Gunter  wrote:
>
> This post is likely pretty useless;  it is motivated by a recent post
> from "Val" that was elegantly answered using Tidyverse constructs, but
> I wondered how to do it using base R only. Along the way, I ran into
> the following question to which I think my answer (below) is pretty
> awful. I would be interested in more elegant base R approaches. So...
>
> z <- data.frame(a = 1:3, b = letters[1:3])
> > z
>   a b
> 1 1 a
> 2 2 b
> 3 3 c
>
> Suppose I want to change the name of the second column of z from 'b'
> to 'foo' . This is very easy using nested function syntax by:
>
> names(z)[2] <- "foo"
> > z
>   a foo
> 1 1   a
> 2 2   b
> 3 3   c
>
> Now suppose I wanted to do this using |> syntax, along the lines of:
>
> z |> names()[2] <- "foo"  ## throws an error
>
> Slightly fancier is:
>
> z |> (\(x)names(x)[2] <- "b")()
> ## does nothing, but does not throw an error.
>
> However, the following, which resulted from a more careful read of
> ?names works (after changing the name of the second column back to "b"
> of course):
>
> z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> >z
>   a foo
> 1 1   a
> 2 2   b
> 3 3   c
>
> This qualifies to me as "pretty awful." I'm sure there are better ways
> to do this using pipe syntax, so I would appreciate any better
> approaches.
>
> Best,
> Bert



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] [External] Using the pipe, |>, syntax with "names<-"

2024-07-21 Thread avi.e.gross
As an intellectual exercise, it can be reasonable to discuss ways to use a 
pipe even in places where nobody would normally think of using one.

In actual code, it is often better to not make overly cute constructions that 
others (or yourself a month later) will not understand.

Pipes were not part of R originally, and the versions created over the years 
have included many kinds of functionality that are not in the new official pipe 
and that might allow operations such as this. In some ways, the tidyverse 
evolved so that it did not require this game, as its normal method of 
changing the names of columns works inline by using verbs such as rename(). 
Other things you can often do inline are to reorder the columns or apply 
changes selectively to columns whose names or contents match your patterns. 

If you now wanted to use base R to make similar changes in a pipeline, the 
result may be that you end up reinventing extensions such as functions that do 
what you want in a pipeline, OR you realize that many things done in just base 
R should continue being done in more discrete lumps rather than one huge 
pipeline.

There may well already be one or more packages outside the tidyverse that 
provide such extensions but I am not so sure that using them will be as easy or 
convenient or readable until and unless they are as well-known. I note that 
even in the tidyverse, many things are often better done, especially while 
testing the code, without really long pipes so that it is easier to modify and 
rearrange things or see intermediate values. Similar arguments apply to 
something like using ggplot() with its own sort-of piping where it may make 
sense to use repeated invocations of "p <- p + function(args)" so each can 
clearly be documented with comments and sometimes steps can selectively be 
commented out or moved later in the "pipeline" if it seems the changes by one 
step are interfering with a later step by re-setting internal variables. Of 
course, if the code is completely done and not expected to change, you can 
always switch to piping if that is what you want.

Programming languages can have many purposes, including things like efficiency 
or compact representations or making it harder to make mistakes, but a major 
advantage of some is that programs can be READ easily without having to consult 
gurus or try to debug. Pipes can either be helpful in this regard or be 
absolutely mysterious. Using "_" as a placeholder is convenient, and having 
the piped value default to the first argument when no placeholder is given 
is nice. But you can imagine an implementation where you constantly put in 
".placeHolder." as clearer.



Re: [R] [External] Using the pipe, |>, syntax with "names<-"

2024-07-21 Thread Jeff Newmiller via R-help
I think that the simplicity of setNames is hard to beat:

z |> setNames( c( "a", "foo" ) )

and if you are determined not to load dplyr then

column_rename <- function( DF, map ) {
  on <- names( DF )
  on[ match( map, on ) ] <- names( map )
  names( DF ) <- on
  DF
}

is more robust to column reorganization than replace():

z |> column_rename( c( foo = "b" ) )
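
To illustrate the robustness point, here is a small sketch comparing the name-based helper with a position-based rename after the columns have been reordered. The variable names `z_reordered`, `by_name` and `by_position` are made up for this example.

```r
column_rename <- function(DF, map) {
  on <- names(DF)
  on[match(map, on)] <- names(map)  # look up old names, substitute new ones
  names(DF) <- on
  DF
}

z <- data.frame(a = 1:3, b = letters[1:3])
z_reordered <- z[c("b", "a")]  # same data, columns swapped

# Name-based: still renames the column that used to be called "b"
by_name <- z_reordered |> column_rename(c(foo = "b"))

# Position-based: renames whatever happens to sit in position 2
by_position <- z_reordered |>
  setNames(replace(names(z_reordered), 2, "foo"))

names(by_name)      # "foo" "a"
names(by_position)  # "b" "foo"
```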



-- 
Sent from my phone. Please excuse my brevity.



Re: [R] [External] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Deepayan Sarkar
The main challenge in Bert's original problem is that `[` and `[<-` cannot
be called in a pipeline. The obvious solution is to define named versions,
e.g.:

elt <- `[`
`elt<-` <- `[<-`

Then,

> z <- data.frame(a = 1:3, b = letters[1:3])
> z |> names() |> elt(2)
[1] "b"
> z |> names() |> elt(2) <- "foo"
> z
  a foo
1 1   a
2 2   b
3 3   c
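
The reason this works is that the native pipe is rewritten by the parser before anything is evaluated, so the piped form on the left of `<-` is just an ordinary nested replacement call. A small sketch to make that visible (the names `piped` and `plain` are only for illustration):

```r
elt <- `[`
`elt<-` <- `[<-`

# quote() shows what the parser produced: the pipe is already gone
piped <- quote(z |> names() |> elt(2) <- "foo")
plain <- quote(elt(names(z), 2) <- "foo")
print(piped)
print(plain)

# So the piped assignment behaves like any other complex assignment on z
z <- data.frame(a = 1:3, b = letters[1:3])
z |> names() |> elt(2) <- "foo"
names(z)  # "a" "foo"
```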

You could actually also do (using a similar function already defined in
methods)

z |> names() |> el(2) <- "bar"

Iris's _ trick is of course a nice alternative; and this example in ?pipeOp
already covers it:

# using the placeholder as the head of an extraction chain:
mtcars |> subset(cyl == 4) |> lm(formula = mpg ~ disp) |> _$coef[[2]]

While the replacement question is a nice exercise, I am not sure about the
value of emphasizing that you can use pipes to do complex assignments.
Doesn't that defeat the whole purpose of piping? For one thing, it will
necessarily terminate the pipe. Also, it will not work if the starting
value is not a variable. E.g.,

> data.frame(a = 1:3, b = letters[1:3]) |> names() |> _[2] <- "bar"
Error in names(data.frame(a = 1:3, b = letters[1:3]))[2] <- "bar" :
  target of assignment expands to non-language object

Duncan's rename() approach, which will just change the column name and
return the modified object, seems more useful as part of a pipeline.
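
As a rough sketch of that last point: a plain function that returns the renamed object can sit in the middle of a pipeline and also works when the pipeline starts from an expression rather than a variable. The helper `rename_col()` below is hypothetical, written just for this illustration; it is not Duncan's code.

```r
# Hypothetical helper: rename column i of x and return the modified object
rename_col <- function(x, i, new) {
  names(x)[i] <- new
  x
}

# Starts from a call, not a variable, and the pipe continues afterwards
res <- data.frame(a = 1:3, b = letters[1:3]) |>
  rename_col(2, "bar") |>
  subset(a > 1)

names(res)  # "a" "bar"
nrow(res)   # 2
```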

Best,
-Deepayan

On Sun, 21 Jul 2024 at 04:46, Bert Gunter  wrote:

> I second Rich's excellent suggestion.
>
> As with all elegant solutions, Iris's clicked on the wee light bulb in
> my brain, and I realized that a slightly more verbose, but perhaps
> more enlightening, alternative may be:
>
> z |>  attr("names") |> _[2] <- "foo"
>
> However, I would add this as an example *only with* Iris's solution.
> Hers should be shown whether or not the above is.
>
> Cheers,
> Bert


Re: [R] Extract

2024-07-20 Thread Bert Gunter
Val:
I wanted to add here a base R solution to your problem that I realize
you can happily ignore. However, in the course of puzzling over how to
do it using the R native pipe syntax ("|>") , I learned some new stuff
that I thought others might find useful, and it seemed sensible to
keep the code with this thread for comparison.

I want to acknowledge that in the course of my labor, I posted a
query to R-Help to which Iris Simmons posted a very clever answer that
I would never have figured out myself and that is used below at the
end to change a subset of the names of the modified data frame via a
pipe.

Here's the whole solution starting from your (excellent!) example dat:

   dat <- dat$string |>
      strsplit(" ") |>
      sapply(FUN = \(x) c(x, rep(NA, 5 - length(x)))) |>
      t() |> cbind(dat, ..2 = _)

   ## And Iris's trick for changing a subset of attributes, i.e. the
   ## "names", in a pipe
   dat |> names() |> _[4:8] <- paste0("s", 1:5)

## and here's the result:
> dat
  Year Sex         string s1   s2   s3   s4   s5
1 2002   F       15 xc Ab 15   xc   Ab <NA> <NA>
2 2003   F             14 14 <NA> <NA> <NA> <NA>
3 2004   M 18 xb 25 35 21 18   xb   25   35   21
4 2005   M          13 25 13   25 <NA> <NA> <NA>
5 2006   M 14 ac 256 AV 35 14   ac  256   AV   35
6 2007   F             11 11 <NA> <NA> <NA> <NA>

As I noted previously, all columns beyond Sex are character
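
For anyone who wants to run this end to end, here is a self-contained variant. It is a sketch rather than the exact code above: the padding is done in a separate step and the column names are set on the matrix before cbind(), so no rename is needed afterwards. The names `parts`, `padded` and `out` are mine, and the example data is retyped without the stray space in the header.

```r
dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE)

# Split each string on spaces, then pad every piece vector to length 5 with NA
parts <- strsplit(dat$string, " ")
padded <- t(sapply(parts, \(x) c(x, rep(NA_character_, 5 - length(x)))))
colnames(padded) <- paste0("s", 1:5)

out <- cbind(dat, padded)
out

# The new columns are character; convert individual ones as needed, e.g.
as.integer(out$s1)
```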

Cheers,
Bert


On Fri, Jul 19, 2024 at 12:26 PM Val  wrote:
>
> Thank you Jeff and Bert for your help!
> The components of the string could be mixed (i.e., numeric, character
> or date). Once it is split, it would be easy for me to format it
> accordingly.
>
> On Fri, Jul 19, 2024 at 2:10 PM Bert Gunter  wrote:
> >
> > I did not look closely at the solutions that you were offered, but
> > note that you did not specify in your post whether the numbers in your
> > string were to be character or numeric variables after they are broken
> > out into their own columns. I believe that they are character in the
> > solutions, but you should check this. If you want them as numeric,
> > e.g., for further processing, you will need to convert them. Or
> > vice-versa.
> >
> > Bert
> >
> >
> > On Fri, Jul 19, 2024 at 9:52 AM Val  wrote:
> > >
> > > Hi All,
> > >
> > > I want to extract new variables from a string and add them to the data frame.
> > > The sample data is a CSV file.
> > >
> > > dat<-read.csv(text="Year, Sex,string
> > > 2002,F,15 xc Ab
> > > 2003,F,14
> > > 2004,M,18 xb 25 35 21
> > > 2005,M,13 25
> > > 2006,M,14 ac 256 AV 35
> > > 2007,F,11",header=TRUE)
> > >
> > > The string column has a maximum of five variables. Some rows have all
> > > five and others do not. If a variable is missing, fill it with NA.
> > > The desired result is shown below:
> > >
> > >
> > > Year,Sex,string, S1, S2, S3, S4, S5
> > > 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> > > 2003,F,14, 14,NA,NA,NA,NA
> > > 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> > > 2005,M,13 25,13, 25,NA,NA,NA
> > > 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> > > 2007,F,11, 11,NA,NA,NA,NA
> > >
> > > Any help?
> > > Thank you in advance.


Re: [R] [External] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Bert Gunter
I second Rich's excellent suggestion.

As with all elegant solutions, Iris's clicked on the wee light bulb in
my brain, and I realized that a slightly more verbose, but perhaps
more enlightening, alternative may be:

z |>  attr("names") |> _[2] <- "foo"

However, I would add this as an example *only with* Iris's solution.
Hers should be shown whether or not the above is.
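
A small sketch showing that the two spellings end up in the same place, assuming R >= 4.3 for the `_[2]` placeholder form. The copies `z1` and `z2` are only there so both forms can be run side by side.

```r
z1 <- z2 <- data.frame(a = 1:3, b = letters[1:3])

z1 |> names() |> _[2] <- "foo"        # parsed as names(z1)[2] <- "foo"
z2 |> attr("names") |> _[2] <- "foo"  # parsed as attr(z2, "names")[2] <- "foo"

identical(z1, z2)
names(z1)
```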

Cheers,
Bert

On Sat, Jul 20, 2024 at 3:35 PM Richard M. Heiberger  wrote:
>
> I think Iris's solution should be added to the help file: ?|>
> there are no examples there now that show assignment or replacement using the 
> "_"
>
> > On Jul 20, 2024, at 18:21, Duncan Murdoch  wrote:
> >
> > On 2024-07-20 6:02 p.m., Iris Simmons wrote:
> >> z <- data.frame(a = 1:3, b = letters[1:3])
> >> z |> names() |> _[2] <- "foo"
> >> z
> >
> > That's a great suggestion!
> >
> > Duncan Murdoch
> >
>



Re: [R] [External] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Richard M. Heiberger
I think Iris's solution should be added to the help file: ?|>
there are no examples there now that show assignment or replacement using the 
"_"

> On Jul 20, 2024, at 18:21, Duncan Murdoch  wrote:
>
> On 2024-07-20 6:02 p.m., Iris Simmons wrote:
>> z <- data.frame(a = 1:3, b = letters[1:3])
>> z |> names() |> _[2] <- "foo"
>> z
>
> That's a great suggestion!
>
> Duncan Murdoch
>



Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Duncan Murdoch

On 2024-07-20 6:02 p.m., Iris Simmons wrote:

z <- data.frame(a = 1:3, b = letters[1:3])
z |> names() |> _[2] <- "foo"
z


That's a great suggestion!

Duncan Murdoch



Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Bert Gunter
Iris's reply is what I was looking for.  Many thanks -- I can now sleep tonight!

Both Rui's and Duncan's responses merely hid what I wanted to avoid. I
hope that I did not occupy much of your times on my useless question
and rather pathetic attempts at an answer.

Cheers,
Bert



On Sat, Jul 20, 2024 at 3:02 PM Iris Simmons  wrote:
>
> It should be written more like this:
>
> ```R
> z <- data.frame(a = 1:3, b = letters[1:3])
> z |> names() |> _[2] <- "foo"
> z
> ```
>
> Regards,
> Iris
>
> On Sat, Jul 20, 2024 at 4:47 PM Bert Gunter  wrote:
> >
> > With further fooling around, I realized that explicitly assigning my
> > last "solution" 'works'; i.e.
> >
> > names(z)[2] <- "foo"
> >
> > can be piped as:
> >
> >  z <- z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> > > z
> >   a foo
> > 1 1   a
> > 2 2   b
> > 3 3   c
> >
> > This is even awfuller than before. So my query still stands.
> >
> > -- Bert
> >
> > On Sat, Jul 20, 2024 at 1:14 PM Bert Gunter  wrote:
> > >
> > > Nope, I still got it wrong: None of my approaches work.  :(
> > >
> > > So my query remains: how to do it via piping with |> ?
> > >
> > > Bert
> > >
> > >
> > > On Sat, Jul 20, 2024 at 1:06 PM Bert Gunter  
> > > wrote:
> > > >
> > > > This post is likely pretty useless;  it is motivated by a recent post
> > > > from "Val" that was elegantly answered using Tidyverse constructs, but
> > > > I wondered how to do it using base R only. Along the way, I ran into
> > > > the following question to which I think my answer (below) is pretty
> > > > awful. I would be interested in more elegant base R approaches. So...
> > > >
> > > > z <- data.frame(a = 1:3, b = letters[1:3])
> > > > > z
> > > >   a h
> > > > 1 1 a
> > > > 2 2 b
> > > > 3 3 c
> > > >
> > > > Suppose I want to change the name of the second column of z from 'b'
> > > > to 'foo' . This is very easy using nested function syntax by:
> > > >
> > > > names(z)[2] <- "foo"
> > > > > z
> > > >   a foo
> > > > 1 1   a
> > > > 2 2   b
> > > > 3 3   c
> > > >
> > > > Now suppose I wanted to do this using |> syntax, along the lines of:
> > > >
> > > > z |> names()[2] <- "foo"  ## throws an error
> > > >
> > > > Slightly fancier is:
> > > >
> > > > z |> (\(x)names(x)[2] <- "b")()
> > > > ## does nothing, but does not throw an error.
> > > >
> > > > However, the following, which resulted from a more careful read of
> > > > ?names works (after changing the name of the second column back to "b"
> > > > of course):
> > > >
> > > > z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> > > > >z
> > > >   a foo
> > > > 1 1   a
> > > > 2 2   b
> > > > 3 3   c
> > > >
> > > > This qualifies to me as "pretty awful." I'm sure there are better ways
> > > > to do this using pipe syntax, so I would appreciate any better
> > > > approaches.
> > > >
> > > > Best,
> > > > Bert
> >



Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Iris Simmons
It should be written more like this:

```R
z <- data.frame(a = 1:3, b = letters[1:3])
z |> names() |> _[2] <- "foo"
z
```

Regards,
Iris
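
For readers wondering why this works, here is a minimal sketch, assuming R 4.3.0 or later (the release in which the `_` placeholder may head an extraction such as `_[2]`). Because `|>` is rewritten by the parser, the piped statement should be the very same call as the nested one. The names `piped`, `nested`, and `same` are made up for this illustration.

```r
# Assumes R >= 4.3.0; earlier versions reject `_[2]` on the right of a pipe.
piped  <- quote(z |> names() |> _[2] <- "foo")
nested <- quote(names(z)[2] <- "foo")

# The pipe is pure syntax: compare the two parsed calls.
print(piped)
same <- identical(piped, nested)
print(same)

# Running the piped form really does modify z, because what R evaluates
# is an ordinary replacement call on z.
z <- data.frame(a = 1:3, b = letters[1:3])
z |> names() |> _[2] <- "foo"
print(names(z))
```

Since the whole pipeline sits on the left of `<-`, no `z <- z |> ...` reassignment is needed here, unlike the function-based alternatives in this thread.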

On Sat, Jul 20, 2024 at 4:47 PM Bert Gunter  wrote:
>
> With further fooling around, I realized that explicitly assigning my
> last "solution" 'works'; i.e.
>
> names(z)[2] <- "foo"
>
> can be piped as:
>
>  z <- z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> > z
>   a foo
> 1 1   a
> 2 2   b
> 3 3   c
>
> This is even awfuller than before. So my query still stands.
>
> -- Bert
>
> On Sat, Jul 20, 2024 at 1:14 PM Bert Gunter  wrote:
> >
> > Nope, I still got it wrong: None of my approaches work.  :(
> >
> > So my query remains: how to do it via piping with |> ?
> >
> > Bert
> >
> >
> > On Sat, Jul 20, 2024 at 1:06 PM Bert Gunter  wrote:
> > >
> > > This post is likely pretty useless;  it is motivated by a recent post
> > > from "Val" that was elegantly answered using Tidyverse constructs, but
> > > I wondered how to do it using base R only. Along the way, I ran into
> > > the following question to which I think my answer (below) is pretty
> > > awful. I would be interested in more elegant base R approaches. So...
> > >
> > > z <- data.frame(a = 1:3, b = letters[1:3])
> > > > z
> > >   a b
> > > 1 1 a
> > > 2 2 b
> > > 3 3 c
> > >
> > > Suppose I want to change the name of the second column of z from 'b'
> > > to 'foo' . This is very easy using nested function syntax by:
> > >
> > > names(z)[2] <- "foo"
> > > > z
> > >   a foo
> > > 1 1   a
> > > 2 2   b
> > > 3 3   c
> > >
> > > Now suppose I wanted to do this using |> syntax, along the lines of:
> > >
> > > z |> names()[2] <- "foo"  ## throws an error
> > >
> > > Slightly fancier is:
> > >
> > > z |> (\(x)names(x)[2] <- "b")()
> > > ## does nothing, but does not throw an error.
> > >
> > > However, the following, which resulted from a more careful read of
> > > ?names works (after changing the name of the second column back to "b"
> > > of course):
> > >
> > > z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> > > >z
> > >   a foo
> > > 1 1   a
> > > 2 2   b
> > > 3 3   c
> > >
> > > This qualifies to me as "pretty awful." I'm sure there are better ways
> > > to do this using pipe syntax, so I would appreciate any better
> > > approaches.
> > >
> > > Best,
> > > Bert
>



Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread avi.e.gross
Bert,

You need to consider LHS vs RHS functionality.

Before I start, I would have done your example setup like this:

trio <- 1:3
z <- data.frame(a = trio, b = letters[trio])

Just kidding!

Syntactic sugar means you are calling this function:

> `names<-`
function (x, value)  .Primitive("names<-")

Not particularly documented is a gimmick I just tried of supplying a second 
argument to the more routine function version:

> `names<-`(z, c("one","two"))
  one two
1   1   a
2   2   b
3   3   c

The above does not change z, but returns a new DF with new names.

In a pipeline, try this:

z <-
  z |>
  `names<-`( c("one","two"))

> z
  a b
1 1 a
2 2 b
3 3 c
> z <-
+   z |>
+   `names<-`( c("one","two"))
> z
  one two
1   1   a
2   2   b
3   3   c
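
A compact restatement of the point above, as a sketch rather than a recommendation: the function form returns a renamed copy and leaves `z` alone unless the result is assigned. `setNames()` (from the stats package, attached by default) is an ordinary-function spelling of the same idea. The names `renamed` and `renamed2` are hypothetical.

```r
z <- data.frame(a = 1:3, b = letters[1:3])

# Calling the replacement function directly returns a new data frame.
renamed <- z |> `names<-`(c("one", "two"))
print(names(z))        # z itself keeps its original names
print(names(renamed))

# setNames() does the same job without backticks.
renamed2 <- z |> setNames(c("one", "two"))
print(identical(renamed, renamed2))
```

Either form replaces all names at once; changing only the second name still needs the current names as input, which is what the other replies in this thread address.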





-Original Message-
From: R-help  On Behalf Of Bert Gunter
Sent: Saturday, July 20, 2024 4:47 PM
To: R-help 
Subject: Re: [R] Using the pipe, |>, syntax with "names<-"

With further fooling around, I realized that explicitly assigning my
last "solution" 'works'; i.e.

names(z)[2] <- "foo"

can be piped as:

 z <- z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> z
  a foo
1 1   a
2 2   b
3 3   c

This is even awfuller than before. So my query still stands.

-- Bert

On Sat, Jul 20, 2024 at 1:14 PM Bert Gunter  wrote:
>
> Nope, I still got it wrong: None of my approaches work.  :(
>
> So my query remains: how to do it via piping with |> ?
>
> Bert
>
>
> On Sat, Jul 20, 2024 at 1:06 PM Bert Gunter  wrote:
> >
> > This post is likely pretty useless;  it is motivated by a recent post
> > from "Val" that was elegantly answered using Tidyverse constructs, but
> > I wondered how to do it using base R only. Along the way, I ran into
> > the following question to which I think my answer (below) is pretty
> > awful. I would be interested in more elegant base R approaches. So...
> >
> > z <- data.frame(a = 1:3, b = letters[1:3])
> > > z
> >   a b
> > 1 1 a
> > 2 2 b
> > 3 3 c
> >
> > Suppose I want to change the name of the second column of z from 'b'
> > to 'foo' . This is very easy using nested function syntax by:
> >
> > names(z)[2] <- "foo"
> > > z
> >   a foo
> > 1 1   a
> > 2 2   b
> > 3 3   c
> >
> > Now suppose I wanted to do this using |> syntax, along the lines of:
> >
> > z |> names()[2] <- "foo"  ## throws an error
> >
> > Slightly fancier is:
> >
> > z |> (\(x)names(x)[2] <- "b")()
> > ## does nothing, but does not throw an error.
> >
> > However, the following, which resulted from a more careful read of
> > ?names works (after changing the name of the second column back to "b"
> > of course):
> >
> > z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> > >z
> >   a foo
> > 1 1   a
> > 2 2   b
> > 3 3   c
> >
> > This qualifies to me as "pretty awful." I'm sure there are better ways
> > to do this using pipe syntax, so I would appreciate any better
> > approaches.
> >
> > Best,
> > Bert



Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Duncan Murdoch
I suspect that you would want to define a function which was aware of 
the limitations of piping to handle this.  For example:


rename <- function(x, col, newname) {
  names(x)[col] <- newname
  x
}

Then

z |> rename(2, "foo")

would be fine.

Duncan Murdoch
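
A short usage sketch of the helper above, assuming nothing beyond base R. Note that the pipeline returns a modified copy, so the result has to be assigned back if `z` itself should change. If dplyr is attached, a helper called `rename` would mask `dplyr::rename`, so a different name may be preferable in that case.

```r
# The helper from the message above, repeated so the example is self-contained.
rename <- function(x, col, newname) {
  names(x)[col] <- newname
  x
}

z <- data.frame(a = 1:3, b = letters[1:3])

# The pipeline yields a renamed copy ...
print(z |> rename(2, "foo") |> names())

# ... while z is unchanged until the result is assigned.
print(names(z))
z <- z |> rename(2, "foo")
print(names(z))
```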

On 2024-07-20 4:46 p.m., Bert Gunter wrote:

With further fooling around, I realized that explicitly assigning my
last "solution" 'works'; i.e.

names(z)[2] <- "foo"

can be piped as:

  z <- z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()

z

   a foo
1 1   a
2 2   b
3 3   c

This is even awfuller than before. So my query still stands.

-- Bert

On Sat, Jul 20, 2024 at 1:14 PM Bert Gunter  wrote:


Nope, I still got it wrong: None of my approaches work.  :(

So my query remains: how to do it via piping with |> ?

Bert


On Sat, Jul 20, 2024 at 1:06 PM Bert Gunter  wrote:


This post is likely pretty useless;  it is motivated by a recent post
from "Val" that was elegantly answered using Tidyverse constructs, but
I wondered how to do it using base R only. Along the way, I ran into
the following question to which I think my answer (below) is pretty
awful. I would be interested in more elegant base R approaches. So...

z <- data.frame(a = 1:3, b = letters[1:3])

z

  a b
1 1 a
2 2 b
3 3 c

Suppose I want to change the name of the second column of z from 'b'
to 'foo' . This is very easy using nested function syntax by:

names(z)[2] <- "foo"

z

   a foo
1 1   a
2 2   b
3 3   c

Now suppose I wanted to do this using |> syntax, along the lines of:

z |> names()[2] <- "foo"  ## throws an error

Slightly fancier is:

z |> (\(x)names(x)[2] <- "b")()
## does nothing, but does not throw an error.

However, the following, which resulted from a more careful read of
?names works (after changing the name of the second column back to "b"
of course):

z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()

z

   a foo
1 1   a
2 2   b
3 3   c

This qualifies to me as "pretty awful." I'm sure there are better ways
to do this using pipe syntax, so I would appreciate any better
approaches.

Best,
Bert




Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Rui Barradas

At 21:46 on 20/07/2024, Bert Gunter wrote:

With further fooling around, I realized that explicitly assigning my
last "solution" 'works'; i.e.

names(z)[2] <- "foo"

can be piped as:

  z <- z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()

z

   a foo
1 1   a
2 2   b
3 3   c

This is even awfuller than before. So my query still stands.

-- Bert

On Sat, Jul 20, 2024 at 1:14 PM Bert Gunter  wrote:


Nope, I still got it wrong: None of my approaches work.  :(

So my query remains: how to do it via piping with |> ?

Bert


On Sat, Jul 20, 2024 at 1:06 PM Bert Gunter  wrote:


This post is likely pretty useless;  it is motivated by a recent post
from "Val" that was elegantly answered using Tidyverse constructs, but
I wondered how to do it using base R only. Along the way, I ran into
the following question to which I think my answer (below) is pretty
awful. I would be interested in more elegant base R approaches. So...

z <- data.frame(a = 1:3, b = letters[1:3])

z

  a b
1 1 a
2 2 b
3 3 c

Suppose I want to change the name of the second column of z from 'b'
to 'foo' . This is very easy using nested function syntax by:

names(z)[2] <- "foo"

z

   a foo
1 1   a
2 2   b
3 3   c

Now suppose I wanted to do this using |> syntax, along the lines of:

z |> names()[2] <- "foo"  ## throws an error

Slightly fancier is:

z |> (\(x)names(x)[2] <- "b")()
## does nothing, but does not throw an error.

However, the following, which resulted from a more careful read of
?names works (after changing the name of the second column back to "b"
of course):

z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()

z

   a foo
1 1   a
2 2   b
3 3   c

This qualifies to me as "pretty awful." I'm sure there are better ways
to do this using pipe syntax, so I would appreciate any better
approaches.

Best,
Bert



Hello,

This is not exactly the same but in one of your attempts all you have to 
do is to return x.

The following works and does something.


z |> (\(x){names(x)[2] <- "foo";x})()
#   a foo
# 1 1   a
# 2 2   b
# 3 3   c


Hope this helps,

Rui Barradas
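
To make the difference concrete, a small sketch (the names `res1` and `res2` are hypothetical): an assignment expression evaluates to the assigned value, so a lambda whose last expression is `names(x)[2] <- "foo"` returns the string, not the data frame.

```r
z <- data.frame(a = 1:3, b = letters[1:3])

# Last expression is an assignment: the function returns the value "foo".
res1 <- z |> (\(x) names(x)[2] <- "foo")()
print(res1)

# Returning x explicitly gives back the renamed data frame.
res2 <- z |> (\(x) { names(x)[2] <- "foo"; x })()
print(names(res2))

# In both cases only the local copy x was changed; z is as it was.
print(names(z))
```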





Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Bert Gunter
With further fooling around, I realized that explicitly assigning my
last "solution" 'works'; i.e.

names(z)[2] <- "foo"

can be piped as:

 z <- z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> z
  a foo
1 1   a
2 2   b
3 3   c

This is even awfuller than before. So my query still stands.

-- Bert

On Sat, Jul 20, 2024 at 1:14 PM Bert Gunter  wrote:
>
> Nope, I still got it wrong: None of my approaches work.  :(
>
> So my query remains: how to do it via piping with |> ?
>
> Bert
>
>
> On Sat, Jul 20, 2024 at 1:06 PM Bert Gunter  wrote:
> >
> > This post is likely pretty useless;  it is motivated by a recent post
> > from "Val" that was elegantly answered using Tidyverse constructs, but
> > I wondered how to do it using base R only. Along the way, I ran into
> > the following question to which I think my answer (below) is pretty
> > awful. I would be interested in more elegant base R approaches. So...
> >
> > z <- data.frame(a = 1:3, b = letters[1:3])
> > > z
> >   a b
> > 1 1 a
> > 2 2 b
> > 3 3 c
> >
> > Suppose I want to change the name of the second column of z from 'b'
> > to 'foo' . This is very easy using nested function syntax by:
> >
> > names(z)[2] <- "foo"
> > > z
> >   a foo
> > 1 1   a
> > 2 2   b
> > 3 3   c
> >
> > Now suppose I wanted to do this using |> syntax, along the lines of:
> >
> > z |> names()[2] <- "foo"  ## throws an error
> >
> > Slightly fancier is:
> >
> > z |> (\(x)names(x)[2] <- "b")()
> > ## does nothing, but does not throw an error.
> >
> > However, the following, which resulted from a more careful read of
> > ?names works (after changing the name of the second column back to "b"
> > of course):
> >
> > z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> > >z
> >   a foo
> > 1 1   a
> > 2 2   b
> > 3 3   c
> >
> > This qualifies to me as "pretty awful." I'm sure there are better ways
> > to do this using pipe syntax, so I would appreciate any better
> > approaches.
> >
> > Best,
> > Bert



Re: [R] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Bert Gunter
Nope, I still got it wrong: None of my approaches work.  :(

So my query remains: how to do it via piping with |> ?

Bert


On Sat, Jul 20, 2024 at 1:06 PM Bert Gunter  wrote:
>
> This post is likely pretty useless;  it is motivated by a recent post
> from "Val" that was elegantly answered using Tidyverse constructs, but
> I wondered how to do it using base R only. Along the way, I ran into
> the following question to which I think my answer (below) is pretty
> awful. I would be interested in more elegant base R approaches. So...
>
> z <- data.frame(a = 1:3, b = letters[1:3])
> > z
>   a b
> 1 1 a
> 2 2 b
> 3 3 c
>
> Suppose I want to change the name of the second column of z from 'b'
> to 'foo' . This is very easy using nested function syntax by:
>
> names(z)[2] <- "foo"
> > z
>   a foo
> 1 1   a
> 2 2   b
> 3 3   c
>
> Now suppose I wanted to do this using |> syntax, along the lines of:
>
> z |> names()[2] <- "foo"  ## throws an error
>
> Slightly fancier is:
>
> z |> (\(x)names(x)[2] <- "b")()
> ## does nothing, but does not throw an error.
>
> However, the following, which resulted from a more careful read of
> ?names works (after changing the name of the second column back to "b"
> of course):
>
> z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
> >z
>   a foo
> 1 1   a
> 2 2   b
> 3 3   c
>
> This qualifies to me as "pretty awful." I'm sure there are better ways
> to do this using pipe syntax, so I would appreciate any better
> approaches.
>
> Best,
> Bert



[R] Using the pipe, |>, syntax with "names<-"

2024-07-20 Thread Bert Gunter
This post is likely pretty useless;  it is motivated by a recent post
from "Val" that was elegantly answered using Tidyverse constructs, but
I wondered how to do it using base R only. Along the way, I ran into
the following question to which I think my answer (below) is pretty
awful. I would be interested in more elegant base R approaches. So...

z <- data.frame(a = 1:3, b = letters[1:3])
> z
  a b
1 1 a
2 2 b
3 3 c

Suppose I want to change the name of the second column of z from 'b'
to 'foo' . This is very easy using nested function syntax by:

names(z)[2] <- "foo"
> z
  a foo
1 1   a
2 2   b
3 3   c

Now suppose I wanted to do this using |> syntax, along the lines of:

z |> names()[2] <- "foo"  ## throws an error

Slightly fancier is:

z |> (\(x)names(x)[2] <- "b")()
## does nothing, but does not throw an error.

However, the following, which resulted from a more careful read of
?names works (after changing the name of the second column back to "b"
of course):

z |>(\(x) "names<-"(x,value = "[<-"(names(x),2,'foo')))()
>z
  a foo
1 1   a
2 2   b
3 3   c

This qualifies to me as "pretty awful." I'm sure there are better ways
to do this using pipe syntax, so I would appreciate any better
approaches.

Best,
Bert
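
As background for the long form above, a sketch of how R treats a nested replacement. The R Language Definition describes `names(z)[2] <- "foo"` as shorthand for calling `[<-` on the current names and then `names<-` on the object, with the result assigned back to `z`. The variable names `z1` and `z2` below are made up for the comparison.

```r
z1 <- data.frame(a = 1:3, b = letters[1:3])
z2 <- z1

# The usual nested replacement syntax.
names(z1)[2] <- "foo"

# The same thing spelled out with the replacement functions.
# Note the explicit assignment back to z2: without it, the call only
# returns a renamed copy and z2 would keep its old names.
z2 <- `names<-`(z2, value = `[<-`(names(z2), 2, "foo"))

print(identical(z1, z2))
```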



Re: [R] Extract

2024-07-19 Thread Val
Thank you Jeff and Bert for your help!
The components of the string could be mixed (i.e., numeric, character,
or date). Once it is split, it will be easy for me to format it
accordingly.

On Fri, Jul 19, 2024 at 2:10 PM Bert Gunter  wrote:
>
> I did not look closely at the solutions that you were offered, but
> note that you did not specify in your post whether the numbers in your
> string were to be character or numeric variables after they are broken
> out into their own columns. I believe that they are character in the
> solutions, but you should check this. If you want them as numeric,
> e.g., for further processing, you will need to convert them. Or
> vice-versa.
>
> Bert
>
>
> On Fri, Jul 19, 2024 at 9:52 AM Val  wrote:
> >
> > Hi All,
> >
> > I want to extract new variables from a string and add it to the dataframe.
> > Sample data is csv file.
> >
> > dat<-read.csv(text="Year, Sex,string
> > 2002,F,15 xc Ab
> > 2003,F,14
> > 2004,M,18 xb 25 35 21
> > 2005,M,13 25
> > 2006,M,14 ac 256 AV 35
> > 2007,F,11",header=TRUE)
> >
> > The string column has  a maximum of five variables. Some rows have all
> > and others may not have all the five variables. If missing then  fill
> > it with NA,
> > Desired result is shown below,
> >
> >
> > Year,Sex,string, S1, S2, S3 S4,S5
> > 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> > 2003,F,14, 14,NA,NA,NA,NA
> > 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> > 2005,M,13 25,13, 25,NA,NA,NA
> > 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> > 2007,F,11, 11,NA,NA,NA,NA
> >
> > Any help?
> > Thank you in advance.
> >



Re: [R] Extract

2024-07-19 Thread Bert Gunter
I did not look closely at the solutions that you were offered, but
note that you did not specify in your post whether the numbers in your
string were to be character or numeric variables after they are broken
out into their own columns. I believe that they are character in the
solutions, but you should check this. If you want them as numeric,
e.g., for further processing, you will need to convert them. Or
vice-versa.

Bert
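
To illustrate the character-versus-numeric point, here is a base R sketch, not taken from any of the solutions posted: split the column, pad each row to five pieces, and then let `type.convert()` decide each column's type. The names `parts`, `padded`, `s_cols`, and `out` are hypothetical, and the header is written without the stray space before `Sex` as a simplification.

```r
dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE)

# Split on single spaces; each element is a character vector of 1 to 5 pieces.
parts <- strsplit(dat$string, " ", fixed = TRUE)

# Pad every vector to length 5 with NA and stack the rows into a matrix.
padded <- t(sapply(parts, `length<-`, 5))
s_cols <- paste0("S", 1:5)
colnames(padded) <- s_cols

# All new columns start out as character.
out <- cbind(dat, as.data.frame(padded))

# Convert column by column; columns holding any letters stay character.
out[s_cols] <- lapply(out[s_cols], type.convert, as.is = TRUE)
str(out)
```

With this data, `S1` and `S5` contain only digits and so become integer, while `S2` to `S4` stay character because at least one row holds letters. A column that should be treated as a date would still need an explicit `as.Date()` call with the right format.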


On Fri, Jul 19, 2024 at 9:52 AM Val  wrote:
>
> Hi All,
>
> I want to extract new variables from a string and add it to the dataframe.
> Sample data is csv file.
>
> dat<-read.csv(text="Year, Sex,string
> 2002,F,15 xc Ab
> 2003,F,14
> 2004,M,18 xb 25 35 21
> 2005,M,13 25
> 2006,M,14 ac 256 AV 35
> 2007,F,11",header=TRUE)
>
> The string column has  a maximum of five variables. Some rows have all
> and others may not have all the five variables. If missing then  fill
> it with NA,
> Desired result is shown below,
>
>
> Year,Sex,string, S1, S2, S3 S4,S5
> 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> 2003,F,14, 14,NA,NA,NA,NA
> 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> 2005,M,13 25,13, 25,NA,NA,NA
> 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> 2007,F,11, 11,NA,NA,NA,NA
>
> Any help?
> Thank you in advance.
>



Re: [R] Extract

2024-07-19 Thread Jeff Newmiller via R-help
Here is another way... for data analysis, the idiomatic result is usually more 
useful, though for presentation in a final result the wide result might be 
desired.

library(dplyr)
library(tidyr)

dat<-read.csv(text=
"Year, Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11"
, header=TRUE )

idiomatic <- (
dat
%>% mutate( string = strsplit( string, " " ) )
%>% unnest( cols = string )
%>% group_by( Year, Sex )
%>% mutate( s_name = paste0( "S", seq_along( string ) ) )
%>% ungroup()
)
idiomatic # each row has unique Year, Sex, and s_name

wide <- (
idiomatic
%>% spread( s_name, string )
)
wide


On July 19, 2024 11:23:48 AM PDT, Val  wrote:
>Thank you and sorry for the confusion.
>The desired result should have 8 comma-separated variables in
>each line.  The string variable is considered as one variable.
>The output of your script is fine for me.  Thank you!
>
>On Fri, Jul 19, 2024 at 1:00 PM Ebert,Timothy Aaron  wrote:
>>
>> The desired result is odd.
>> 1) It looks like the string is duplicated in the desired result. The first 
>> line of data has "15, xc, Ab",  and the desired result has "15, xc, Ab, 15, 
>> xc, Ab"
>> 2) The example has S1 through S5, but the desired result has data for eight 
>> variables in the first line (not five).
>> 3) The desired result has a different number of variables for each line.
>> 4) Are you assuming that all missing data is at the end of the string? If 
>> there are 5 variables (S1  S5), do you know that "15, xc, Ab" is S1 = 
>> 15, S2 = 'xc', and S3 = 'Ab' rather than S2=15, S4='xc' and S5='Ab' ?
>>
>> This isn't exactly what you asked for, but maybe I was confused somewhere. 
>> This approach puts string data into variables in order. In this approach one 
>> mixes string and numeric data. The string is not duplicated.
>>
>> library(tidyr)
>>
>> dat <- read.csv(text="Year,Sex,string
>> 2002,F,15 xc Ab
>> 2003,F,14
>> 2004,M,18 xb 25 35 21
>> 2005,M,13 25
>> 2006,M,14 ac 256 AV 35
>> 2007,F,11", header=TRUE, stringsAsFactors=FALSE)
>>
>> # split the 'string' column based on spaces
>> dat_separated <- dat |>
>>   separate(string, into = paste0("S", 1:5), sep = " ",
>>fill = "right", extra = "merge")
>>            fill = "right", extra = "merge")
>> Tim
>>
>>
>> -Original Message-
>> From: R-help  On Behalf Of Val
>> Sent: Friday, July 19, 2024 12:52 PM
>> To: r-help@R-project.org (r-help@r-project.org) 
>> Subject: [R] Extract
>>
>> Hi All,
>>
>> I want to extract new variables from a string and add it to the dataframe.
>> Sample data is csv file.
>>
>> dat<-read.csv(text="Year, Sex,string
>> 2002,F,15 xc Ab
>> 2003,F,14
>> 2004,M,18 xb 25 35 21
>> 2005,M,13 25
>> 2006,M,14 ac 256 AV 35
>> 2007,F,11",header=TRUE)
>>
>> The string column has  a maximum of five variables. Some rows have all and 
>> others may not have all the five variables. If missing then  fill it with 
>> NA, Desired result is shown below,
>>
>>
>> Year,Sex,string, S1, S2, S3 S4,S5
>> 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
>> 2003,F,14, 14,NA,NA,NA,NA
>> 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
>> 2005,M,13 25,13, 25,NA,NA,NA
>> 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
>> 2007,F,11, 11,NA,NA,NA,NA
>>
>> Any help?
>> Thank you in advance.
>>

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] Extract

2024-07-19 Thread Val
Thank you and sorry for the confusion.
The desired result should have 8 comma-separated variables in
each line.  The string variable is considered as one variable.
The output of your script is fine for me.  Thank you!

On Fri, Jul 19, 2024 at 1:00 PM Ebert,Timothy Aaron  wrote:
>
> The desired result is odd.
> 1) It looks like the string is duplicated in the desired result. The first 
> line of data has "15, xc, Ab",  and the desired result has "15, xc, Ab, 15, 
> xc, Ab"
> 2) The example has S1 through S5, but the desired result has data for eight 
> variables in the first line (not five).
> 3) The desired result has a different number of variables for each line.
> 4) Are you assuming that all missing data is at the end of the string? If 
> there are 5 variables (S1 to S5), do you know that "15, xc, Ab" is S1 = 15, 
> S2 = 'xc', and S3 = 'Ab' rather than S2=15, S4='xc' and S5='Ab' ?
>
> This isn't exactly what you asked for, but maybe I was confused somewhere. 
> This approach puts string data into variables in order. In this approach one 
> mixes string and numeric data. The string is not duplicated.
>
> library(tidyr)
>
> dat <- read.csv(text="Year,Sex,string
> 2002,F,15 xc Ab
> 2003,F,14
> 2004,M,18 xb 25 35 21
> 2005,M,13 25
> 2006,M,14 ac 256 AV 35
> 2007,F,11", header=TRUE, stringsAsFactors=FALSE)
>
> # split the 'string' column based on spaces
> dat_separated <- dat |>
>   separate(string, into = paste0("S", 1:5), sep = " ",
>fill = "right", extra = "merge")
>
> Tim
>
>
> -Original Message-
> From: R-help  On Behalf Of Val
> Sent: Friday, July 19, 2024 12:52 PM
> To: r-help@R-project.org (r-help@r-project.org) 
> Subject: [R] Extract
>
> [External Email]
>
> Hi All,
>
> I want to extract new variables from a string and add it to the dataframe.
> Sample data is csv file.
>
> dat<-read.csv(text="Year, Sex,string
> 2002,F,15 xc Ab
> 2003,F,14
> 2004,M,18 xb 25 35 21
> 2005,M,13 25
> 2006,M,14 ac 256 AV 35
> 2007,F,11",header=TRUE)
>
> The string column has  a maximum of five variables. Some rows have all and 
> others may not have all the five variables. If missing then  fill it with NA, 
> Desired result is shown below,
>
>
> Year,Sex,string, S1, S2, S3 S4,S5
> 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> 2003,F,14, 14,NA,NA,NA,NA
> 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> 2005,M,13 25,13, 25,NA,NA,NA
> 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> 2007,F,11, 11,NA,NA,NA,NA
>
> Any help?
> Thank you in advance.
>



Re: [R] Extract

2024-07-19 Thread Ebert,Timothy Aaron
The desired result is odd.
1) It looks like the string is duplicated in the desired result. The first line 
of data has "15, xc, Ab",  and the desired result has "15, xc, Ab, 15, xc, Ab"
2) The example has S1 through S5, but the desired result has data for eight 
variables in the first line (not five).
3) The desired result has a different number of variables for each line.
4) Are you assuming that all missing data is at the end of the string? If there 
are 5 variables (S1 to S5), do you know that "15, xc, Ab" is S1 = 15, S2 = 
'xc', and S3 = 'Ab' rather than S2=15, S4='xc' and S5='Ab' ?

This isn't exactly what you asked for, but maybe I was confused somewhere. This 
approach puts string data into variables in order. In this approach one mixes 
string and numeric data. The string is not duplicated.

library(tidyr)

dat <- read.csv(text="Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header=TRUE, stringsAsFactors=FALSE)

# split the 'string' column based on spaces
dat_separated <- dat |>
  separate(string, into = paste0("S", 1:5), sep = " ",
   fill = "right", extra = "merge")
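
The original request also keeps the string column next to S1 to S5. A minimal sketch of that variant, assuming tidyr is installed; remove = FALSE is the only change from the call above, and dat_with_string is just an illustrative name:

```r
library(tidyr)

dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE, stringsAsFactors = FALSE)

# remove = FALSE keeps the original 'string' column alongside S1..S5;
# fill = "right" pads short rows with NA on the right
dat_with_string <- separate(dat, string, into = paste0("S", 1:5), sep = " ",
                            remove = FALSE, fill = "right", extra = "merge")
```

All of S1 to S5 come back as character columns, so numeric pieces such as 256 would still need as.numeric() if they are to be used in calculations.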

Tim


-Original Message-
From: R-help  On Behalf Of Val
Sent: Friday, July 19, 2024 12:52 PM
To: r-help@R-project.org (r-help@r-project.org) 
Subject: [R] Extract

Hi All,

I want to extract new variables from a string and add it to the dataframe.
Sample data is csv file.

dat<-read.csv(text="Year, Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11",header=TRUE)

The string column has  a maximum of five variables. Some rows have all and 
others may not have all the five variables. If missing then  fill it with NA, 
Desired result is shown below,


Year,Sex,string, S1, S2, S3 S4,S5
2002,F,15 xc Ab, 15,xc,Ab, NA, NA
2003,F,14, 14,NA,NA,NA,NA
2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
2005,M,13 25,13, 25,NA,NA,NA
2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
2007,F,11, 11,NA,NA,NA,NA

Any help?
Thank you in advance.



Re: [R] Extract

2024-07-19 Thread Robert Knight
I would split dat$string into its own vector, break it apart at the spaces
into an array, and then place dat$Year and dat$Sex in positions 1 and 2 of
that newly created array.
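
A minimal base R sketch of that idea, assuming at most five space-separated pieces per row and a header without the stray space before Sex; the names parts, padded and result are only illustrative:

```r
dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE, stringsAsFactors = FALSE)

# Break each string apart at the spaces: a list with one character vector per row
parts <- strsplit(dat$string, " ", fixed = TRUE)

# Indexing past the end of a vector yields NA, so p[1:5] pads short rows
padded <- t(vapply(parts, function(p) p[1:5], character(5)))
colnames(padded) <- paste0("S", 1:5)

# Year, Sex and the original string stay in front of the new columns
result <- cbind(dat, padded)
```

This keeps Year, Sex and string as columns 1 to 3 rather than building a single array, because a data frame can hold the numeric Year next to the character pieces.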





On Fri, Jul 19, 2024, 12:52 PM Val  wrote:

> Hi All,
>
> I want to extract new variables from a string and add it to the dataframe.
> Sample data is csv file.
>
> dat<-read.csv(text="Year, Sex,string
> 2002,F,15 xc Ab
> 2003,F,14
> 2004,M,18 xb 25 35 21
> 2005,M,13 25
> 2006,M,14 ac 256 AV 35
> 2007,F,11",header=TRUE)
>
> The string column has  a maximum of five variables. Some rows have all
> and others may not have all the five variables. If missing then  fill
> it with NA,
> Desired result is shown below,
>
>
> Year,Sex,string, S1, S2, S3 S4,S5
> 2002,F,15 xc Ab, 15,xc,Ab, NA, NA
> 2003,F,14, 14,NA,NA,NA,NA
> 2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
> 2005,M,13 25,13, 25,NA,NA,NA
> 2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
> 2007,F,11, 11,NA,NA,NA,NA
>
> Any help?
> Thank you in advance.
>
>



[R] Extract

2024-07-19 Thread Val
Hi All,

I want to extract new variables from a string and add it to the dataframe.
Sample data is csv file.

dat<-read.csv(text="Year, Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11",header=TRUE)

The string column has  a maximum of five variables. Some rows have all
and others may not have all the five variables. If missing then  fill
it with NA,
Desired result is shown below,


Year,Sex,string, S1, S2, S3 S4,S5
2002,F,15 xc Ab, 15,xc,Ab, NA, NA
2003,F,14, 14,NA,NA,NA,NA
2004,M,18 xb 25 35 21,18, xb, 25, 35, 21
2005,M,13 25,13, 25,NA,NA,NA
2006,M,14 ac 256 AV 35, 14, ac, 256, AV, 35
2007,F,11, 11,NA,NA,NA,NA

Any help?
Thank you in advance.



Re: [R] ggplot two-factor legend

2024-07-18 Thread SIBYLLE STÖCKLI via R-help
Thanks a lot Rui and Jeff

Yes including labels=c() in  scale_linetype_manual() was the hint.

Sibylle

-Original Message-
From: Rui Barradas  
Sent: Thursday, July 18, 2024 6:50 PM
To: sibylle.stoec...@gmx.ch; r-help@r-project.org
Subject: Re: [R] ggplot two-factor legend

On 18/07/2024 at 17:43, Rui Barradas wrote:
> On 18/07/2024 at 16:27, SIBYLLE STÖCKLI via R-help wrote:
>> Hi
>>
>> I am using ggplot to visualise y for a two-factorial group (Bio: 0 
>> and
>> 1) x
>> = 6 years. I was able to adapt the colour of the lines (green and 
>> red) and the linetype (solid and dashed).
>> Challenge: my code produces now two legends. One with the colors for 
>> the group and one with the linetype for the group. Does somebody have 
>> a hint how to adapt the code to produce one legend? Group 0 = red and 
>> dashed, Group 1 = green and solid?
>>
>>
>> MS1<- MS %>% filter(QI_A!="NA") %>% droplevels()
>> dev.new(width=4, height=2.75)
>> par(mar = c(0,6,0,0))
>> p1<-ggplot(data = MS1, aes(x= Jahr, y= QI_A,group=Bio,color=Bio,
>> linetype=Bio)) +
>>  geom_smooth(aes(fill=Bio) , method = "lm" , formula = y ~ x +
>> I(x^2),linewidth=1) +
>> theme(panel.background = element_blank())+
>> theme(axis.line = element_line(colour = "black"))+
>>theme(axis.text=element_text(size=18))+
>>theme(axis.title=element_text(size=20))+
>> ylab("Anteil BFF an LN [%]") +xlab("Jahr")+
>> scale_color_manual(values=c("red","dark green"), labels=c("ÖLN", 
>> "BIO"))+
>> scale_fill_manual(values=c("red","dark green"), labels= c("ÖLN", 
>> "BIO"))+
>> theme(legend.title = element_blank())+
>>theme(legend.text=element_text(size=20))+
>>scale_linetype_manual(values=c("dashed", "solid"))
>> p1<-p1 + expand_limits(y=c(0, 30))
>>
>> kind regards
>> Sibylle
>>
> Hello,
> 
> To have one legend only, the labels must be the same. Try using
> 
> labels=c("ÖLN", "BIO")
> 
> in
> 
> scale_linetype_manual(values=c("dashed", "solid"), labels=c("ÖLN", 
> "BIO"))
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
> 
Hello,

Here is a more complete answer with the built-in data set mtcars.
Note that the group aesthetic is not used. This is because linetype is 
categorical (after mutate) and there's no need to group again by the same 
variable (am).

If you remove labels from scale_linetype_manual there are two legends, but with 
the same labels in both scales the legends merge.


library(ggplot2)
library(dplyr)

mtcars %>%
   # linetype must be categorical
   mutate(am = factor(am)) %>%
   ggplot(aes(hp, disp, color = am, linetype = am)) +
   geom_line() +
   scale_color_manual(
 values = c("red","dark green"),
 labels = c("ÖLN", "BIO")
   ) +
   scale_linetype_manual(
 values = c("dashed", "solid"),
 labels = c("ÖLN", "BIO")
   ) +
   theme_bw()


Hope this helps,

Rui Barradas





Re: [R] ggplot two-factor legend

2024-07-18 Thread Rui Barradas

On 18/07/2024 at 17:43, Rui Barradas wrote:

On 18/07/2024 at 16:27, SIBYLLE STÖCKLI via R-help wrote:

Hi

I am using ggplot to visualise y for a two-factorial group (Bio: 0 and 
1) x
= 6 years. I was able to adapt the colour of the lines (green and red) 
and

the linetype (solid and dashed).
Challenge: my code produces now two legends. One with the colors for the
group and one with the linetype for the group. Does somebody have a 
hint how
to adapt the code to produce one legend? Group 0 = red and dashed, 
Group 1 =

green and solid?


MS1<- MS %>% filter(QI_A!="NA") %>% droplevels()
dev.new(width=4, height=2.75)
par(mar = c(0,6,0,0))
p1<-ggplot(data = MS1, aes(x= Jahr, y= QI_A,group=Bio,color=Bio,
linetype=Bio)) +
 geom_smooth(aes(fill=Bio) , method = "lm" , formula = y ~ x +
I(x^2),linewidth=1) +
theme(panel.background = element_blank())+
theme(axis.line = element_line(colour = "black"))+
   theme(axis.text=element_text(size=18))+
   theme(axis.title=element_text(size=20))+
ylab("Anteil BFF an LN [%]") +xlab("Jahr")+
scale_color_manual(values=c("red","dark green"), labels=c("ÖLN",
"BIO"))+
scale_fill_manual(values=c("red","dark green"), labels= c("ÖLN",
"BIO"))+
theme(legend.title = element_blank())+
   theme(legend.text=element_text(size=20))+
   scale_linetype_manual(values=c("dashed", "solid"))
p1<-p1 + expand_limits(y=c(0, 30))

kind regards
Sibylle


Hello,

To have one legend only, the labels must be the same. Try using

labels=c("ÖLN", "BIO")

in

scale_linetype_manual(values=c("dashed", "solid"), labels=c("ÖLN", "BIO"))


Hope this helps,

Rui Barradas



Hello,

Here is a more complete answer with the built-in data set mtcars.
Note that the group aesthetic is not used. This is because linetype is 
categorical (after mutate) and there's no need to group again by the 
same variable (am).

If you remove labels from scale_linetype_manual there are two legends, but 
with the same labels in both scales the legends merge.



library(ggplot2)
library(dplyr)

mtcars %>%
  # linetype must be categorical
  mutate(am = factor(am)) %>%
  ggplot(aes(hp, disp, color = am, linetype = am)) +
  geom_line() +
  scale_color_manual(
values = c("red","dark green"),
labels = c("ÖLN", "BIO")
  ) +
  scale_linetype_manual(
values = c("dashed", "solid"),
labels = c("ÖLN", "BIO")
  ) +
  theme_bw()
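
A possible variant, sketched under the assumption that the legend labels can live in the factor itself: recoding am once means neither scale needs a labels argument, and the two legends still merge because both scales then share the same title and labels. The name cars2 and the named values vectors are illustrative choices, not part of the code above.

```r
library(ggplot2)

# Recode am once so the factor levels carry the legend labels
cars2 <- transform(mtcars,
                   am = factor(am, levels = c(0, 1), labels = c("ÖLN", "BIO")))

p <- ggplot(cars2, aes(hp, disp, color = am, linetype = am)) +
  geom_line() +
  # named vectors tie each colour and linetype to a level explicitly
  scale_color_manual(values = c("ÖLN" = "red", "BIO" = "darkgreen")) +
  scale_linetype_manual(values = c("ÖLN" = "dashed", "BIO" = "solid")) +
  theme_bw()
p
```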


Hope this helps,

Rui Barradas





Re: [R] ggplot two-factor legend

2024-07-18 Thread Rui Barradas

On 18/07/2024 at 16:27, SIBYLLE STÖCKLI via R-help wrote:

Hi

I am using ggplot to visualise y for a two-factorial group (Bio: 0 and 1) x
= 6 years. I was able to adapt the colour of the lines (green and red) and
the linetype (solid and dashed).
Challenge: my code produces now two legends. One with the colors for the
group and one with the linetype for the group. Does somebody have a hint how
to adapt the code to produce one legend? Group 0 = red and dashed, Group 1 =
green and solid?


MS1<- MS %>% filter(QI_A!="NA") %>% droplevels()
dev.new(width=4, height=2.75)
par(mar = c(0,6,0,0))
p1<-ggplot(data = MS1, aes(x= Jahr, y= QI_A,group=Bio,color=Bio,
linetype=Bio)) +
geom_smooth(aes(fill=Bio) , method = "lm" , formula = y ~ x +
I(x^2),linewidth=1) +
theme(panel.background = element_blank())+
theme(axis.line = element_line(colour = "black"))+
   theme(axis.text=element_text(size=18))+
   theme(axis.title=element_text(size=20))+
ylab("Anteil BFF an LN [%]") +xlab("Jahr")+
scale_color_manual(values=c("red","dark green"), labels=c("ÖLN",
"BIO"))+
scale_fill_manual(values=c("red","dark green"), labels= c("ÖLN",
"BIO"))+
theme(legend.title = element_blank())+
   theme(legend.text=element_text(size=20))+
   scale_linetype_manual(values=c("dashed", "solid"))
p1<-p1 + expand_limits(y=c(0, 30))

kind regards
Sibylle


Hello,

To have one legend only, the labels must be the same. Try using

labels=c("ÖLN", "BIO")

in

scale_linetype_manual(values=c("dashed", "solid"), labels=c("ÖLN", "BIO"))


Hope this helps,

Rui Barradas




Re: [R] ggplot two-factor legend

2024-07-18 Thread SIBYLLE STÖCKLI via R-help
Thanks Jeff

I removed the group parameter in the p1<-ggplot () line. It doesn't change 
anything.
I suppose I got two legends because in the ggplot () line I have color=Bio & 
linetype=Bio. However, when removing linetype = Bio I just get red and green. 
For black and white printing I would like to additionally differentiate the 
two lines (groups) by linetype.

Sibylle

-Original Message-
From: Jeff Newmiller  
Sent: Thursday, July 18, 2024 6:13 PM
To: sibylle.stoec...@gmx.ch; SIBYLLE STÖCKLI via R-help ; 
r-help@r-project.org
Subject: Re: [R] ggplot two-factor legend

If I follow your question, you want redundant aesthetics. Ggplot normally 
notices correlated aesthetic mapping variables and merges the legends, so the 
most likely answer is that your data are not fully correlated in all rows. I 
have also seen this where data are drawn from different dataframes for 
different layers since it is hard to merge factors, but I don't see that here.

You are using the group parameter... try removing that? The group parameter 
overrides the automatic group determination. There might be a syntax for 
specifying correlated grouping, but I don't know it... I normally just verify 
that my data meets the requirements to be automatically identified as 
correlated if that is my goal, since that is a prerequisite anyway.

On July 18, 2024 8:27:05 AM PDT, "SIBYLLE STÖCKLI via R-help" 
 wrote:
>Hi
>
>I am using ggplot to visualise y for a two-factorial group (Bio: 0 and 
>1) x = 6 years. I was able to adapt the colour of the lines (green and 
>red) and the linetype (solid and dashed).
>Challenge: my code produces now two legends. One with the colors for 
>the group and one with the linetype for the group. Does somebody have a 
>hint how to adapt the code to produce one legend? Group 0 = red and 
>dashed, Group 1 = green and solid?
>
>
>MS1<- MS %>% filter(QI_A!="NA") %>% droplevels()
>dev.new(width=4, height=2.75)
>par(mar = c(0,6,0,0))
>p1<-ggplot(data = MS1, aes(x= Jahr, y= QI_A,group=Bio,color=Bio,
>linetype=Bio)) + 
>   geom_smooth(aes(fill=Bio) , method = "lm" , formula = y ~ x +
>I(x^2),linewidth=1) +
>   theme(panel.background = element_blank())+
>   theme(axis.line = element_line(colour = "black"))+
>  theme(axis.text=element_text(size=18))+
>  theme(axis.title=element_text(size=20))+
>   ylab("Anteil BFF an LN [%]") +xlab("Jahr")+
>   scale_color_manual(values=c("red","dark green"), labels=c("ÖLN", 
>"BIO"))+
>   scale_fill_manual(values=c("red","dark green"), labels= c("ÖLN", 
>"BIO"))+
>   theme(legend.title = element_blank())+
>  theme(legend.text=element_text(size=20))+
>  scale_linetype_manual(values=c("dashed", "solid"))
>p1<-p1 + expand_limits(y=c(0, 30))
>
>kind regards
>Sibylle
>

--
Sent from my phone. Please excuse my brevity.



Re: [R] ggplot two-factor legend

2024-07-18 Thread Jeff Newmiller via R-help
If I follow your question, you want redundant aesthetics. Ggplot normally 
notices correlated aesthetic mapping variables and merges the legends, so the 
most likely answer is that your data are not fully correlated in all rows. I 
have also seen this where data are drawn from different dataframes for 
different layers since it is hard to merge factors, but I don't see that here.

You are using the group parameter... try removing that? The group parameter 
overrides the automatic group determination. There might be a syntax for 
specifying correlated grouping, but I don't know it... I normally just verify 
that my data meets the requirements to be automatically identified as 
correlated if that is my goal, since that is a prerequisite anyway.

On July 18, 2024 8:27:05 AM PDT, "SIBYLLE STÖCKLI via R-help" 
 wrote:
>Hi
>
>I am using ggplot to visualise y for a two-factorial group (Bio: 0 and 1) x
>= 6 years. I was able to adapt the colour of the lines (green and red) and
>the linetype (solid and dashed).
>Challenge: my code produces now two legends. One with the colors for the
>group and one with the linetype for the group. Does somebody have a hint how
>to adapt the code to produce one legend? Group 0 = red and dashed, Group 1 =
>green and solid?
>
>
>MS1<- MS %>% filter(QI_A!="NA") %>% droplevels()
>dev.new(width=4, height=2.75)
>par(mar = c(0,6,0,0))
>p1<-ggplot(data = MS1, aes(x= Jahr, y= QI_A,group=Bio,color=Bio,
>linetype=Bio)) + 
>   geom_smooth(aes(fill=Bio) , method = "lm" , formula = y ~ x +
>I(x^2),linewidth=1) +
>   theme(panel.background = element_blank())+
>   theme(axis.line = element_line(colour = "black"))+
>  theme(axis.text=element_text(size=18))+
>  theme(axis.title=element_text(size=20))+
>   ylab("Anteil BFF an LN [%]") +xlab("Jahr")+
>   scale_color_manual(values=c("red","dark green"), labels=c("ÖLN",
>"BIO"))+
>   scale_fill_manual(values=c("red","dark green"), labels= c("ÖLN",
>"BIO"))+
>   theme(legend.title = element_blank())+
>  theme(legend.text=element_text(size=20))+
>  scale_linetype_manual(values=c("dashed", "solid"))
>p1<-p1 + expand_limits(y=c(0, 30))
>
>kind regards
>Sibylle
>

-- 
Sent from my phone. Please excuse my brevity.



[R] ggplot two-factor legend

2024-07-18 Thread SIBYLLE STÖCKLI via R-help
Hi

I am using ggplot to visualise y for a two-factorial group (Bio: 0 and 1) x
= 6 years. I was able to adapt the colour of the lines (green and red) and
the linetype (solid and dashed).
Challenge: my code produces now two legends. One with the colors for the
group and one with the linetype for the group. Does somebody have a hint how
to adapt the code to produce one legend? Group 0 = red and dashed, Group 1 =
green and solid?


MS1<- MS %>% filter(QI_A!="NA") %>% droplevels()
dev.new(width=4, height=2.75)
par(mar = c(0,6,0,0))
p1<-ggplot(data = MS1, aes(x= Jahr, y= QI_A,group=Bio,color=Bio,
linetype=Bio)) + 
geom_smooth(aes(fill=Bio) , method = "lm" , formula = y ~ x +
I(x^2),linewidth=1) +
theme(panel.background = element_blank())+
theme(axis.line = element_line(colour = "black"))+
  theme(axis.text=element_text(size=18))+
  theme(axis.title=element_text(size=20))+
ylab("Anteil BFF an LN [%]") +xlab("Jahr")+
scale_color_manual(values=c("red","dark green"), labels=c("ÖLN",
"BIO"))+
scale_fill_manual(values=c("red","dark green"), labels= c("ÖLN",
"BIO"))+
theme(legend.title = element_blank())+
  theme(legend.text=element_text(size=20))+
  scale_linetype_manual(values=c("dashed", "solid"))
p1<-p1 + expand_limits(y=c(0, 30))

kind regards
Sibylle



Re: [R] grDevices segfault when building R4.4.0 on RHEL 9.1.

2024-07-17 Thread Ivan Krylov via R-help
On Wed, 17 Jul 2024 02:35:22 +
Miguel Esteva  wrote:

> I replaced the "--with-lapack" flag with "--with-lapack='-lflexiblas
> -L/tools/flexiblas/3.4.2/lib64'" and everything built ok.

Glad to see you managed to avoid the crash!

> From a quick check in my emails, seems the RHEL9 system lapack
> packages are broken. Will test a bit more.

Simon Andrews has also shown me how to reproduce the crash on
AlmaLinux: https://stat.ethz.ch/pipermail/r-help/2024-May/479321.html

Looks like an ABI incompatibility between gfortran-11 and blas-devel +
lapack-devel.

-- 
Best regards,
Ivan



Re: [R] grDevices segfault when building R4.4.0 on RHEL 9.1.

2024-07-17 Thread Miguel Esteva via R-help
Hi Ivan,

Apologies, I was away for quite a while.

To reproduce the setup:

I have been using the default GCC in RHEL 9.1.

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-host-pie 
--enable-host-bind-now --enable-languages=c,c++,fortran,lto --prefix=/usr 
--mandir=/usr/share/man --infodir=/usr/share/info 
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared 
--enable-threads=posix --enable-checking=release --with-system-zlib 
--enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object 
--enable-linker-build-id --with-gcc-major-version-only --enable-plugin 
--enable-initfini-array --without-isl --enable-multilib 
--with-linker-hash-style=gnu --enable-offload-targets=nvptx-none 
--without-cuda-driver --enable-gnu-indirect-function --enable-cet 
--with-tune=generic --with-arch_64=x86-64-v2 --with-arch_32=x86-64 
--build=x86_64-redhat-linux --with-build-config=bootstrap-lto 
--enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.1 20230605 (Red Hat 11.4.1-2) (GCC)

I have been building R 4.4.0 and 4.4.1 with Flexiblas and with the built in R 
BLAS/LAPACK.

R BLAS:
./configure --prefix=/tools/R/$RVER --enable-R-shlib --enable-memory-profiling \
  --with-pcre2=/tools/pcre2/10.42

Flexiblas:

PKG_CONFIG_PATH=/tools/flexiblas/3.4.2/lib64/pkgconfig ./configure \
  --prefix=/tools/R/flexiblas/4.4.1 --enable-R-shlib --enable-memory-profiling \
  --with-pcre2=/tools/pcre2/10.42 \
  --with-blas="-lflexiblas -L/tools/flexiblas/3.4.2/lib64" --with-lapack

I realised the build fails when "--with-lapack" is given without a value, even 
though the configure output shows this:

  Source directory:            .
  Installation directory:      /tools/R/flexiblas

  C compiler:                  gcc  -g -O2
  Fortran fixed-form compiler: gfortran  -g -O2

  Default C++ compiler:        g++ -std=gnu++17  -g -O2
  C++11 compiler:              g++ -std=gnu++11  -g -O2
  C++14 compiler:              g++ -std=gnu++14  -g -O2
  C++17 compiler:              g++ -std=gnu++17  -g -O2
  C++20 compiler:              g++ -std=gnu++20  -g -O2
  C++23 compiler:              g++ -std=gnu++23  -g -O2
  Fortran free-form compiler:  gfortran  -g -O2
  Obj-C compiler:

  Interfaces supported:        X11, tcltk
  External libraries:          pcre2, readline, BLAS(FlexiBlas), LAPACK(in blas), curl, libdeflate
  Additional capabilities:     PNG, JPEG, TIFF, NLS, cairo, ICU
  Options enabled:             shared R library, R profiling, memory profiling, libdeflate for lazyload

  Capabilities skipped:
  Options not enabled:         shared BLAS

  Recommended packages:        yes

I replaced the "--with-lapack" flag with "--with-lapack='-lflexiblas 
-L/tools/flexiblas/3.4.2/lib64'" and everything built ok.
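
Once a build completes, a quick way to see which BLAS and LAPACK the new R is really using is to ask R itself. The following is only a sketch: the matrix m is an arbitrary example, and solve() is used because the crash traceback pointed at solve.default().

```r
# Which BLAS and LAPACK shared libraries is this R build actually using?
blas_path   <- extSoftVersion()[["BLAS"]]
lapack_path <- La_library()
lapack_ver  <- La_version()
cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "version", lapack_ver, "\n")

# Smoke test: solve() on a small invertible matrix goes through LAPACK
m <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), nrow = 3)
inv <- solve(m)
print(inv %*% m)  # should be close to the identity matrix
```

On some platforms the reported paths can be empty strings, so an empty result does not by itself indicate a broken build.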

From a quick check of my emails, it seems the RHEL 9 system lapack packages are 
broken. Will test a bit more.

If you need a singularity container in the future, I can provide one with the 
R-dependencies installed. We setup dependencies similar to: 
https://github.com/rstudio/r-builds/blob/main/builder/Dockerfile.rhel-9

or

subscription-manager repos --enable codeready-builder-for-rhel-9-$(arch)-rpms
dnf install -y yum-utils
dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
yum-builddep -y R

Just bringing this to your attention because, by the looks of it, the issue is 
not with R: I was unable to reproduce it on Rocky Linux 9.1.

Kind regards and thanks!


Miguel Esteva
Senior ITS Research Systems Engineer


The Walter and Eliza Hall Institute of Medical Research
1G Royal Parade
Parkville VIC 3052
Australia

Phone (03) 9345 2909

Email estev...@wehi.edu.au

Web http://www.wehi.edu.au


From: Ivan Krylov 
Sent: Friday, 3 May 2024 9:40 PM
To: Miguel Esteva via R-help 
Cc: Miguel Esteva 
Subject: Re: [R] grDevices segfault when building R4.4.0 on RHEL 9.1.

Dear Miguel Esteva,

I couldn't get a Red Hat "ubi9" container to install enough
dependencies to build R. Is there a way to reproduce your setup on a
virtual machine somewhere?

On Fri, 3 May 2024 00:42:43 +
Miguel Esteva via R-help  wrote:

>  *** caught segfault ***
>
> address 0x1801fa8f70, cause 'memory not mapped'
>
>
> Traceback:
>
>  1: solve.default(rgb)

This seems to crash inside the BLAS. Which BLAS are you using? Any
custom ./configure arguments? Which compilers are you running?

To find out more information about the crash, try to follow it with a
debugger. Change directory to src/library/grDevices and run:

_R_COMPILE_PKGS_=1 R_COMPILER_SUPPRESS_ALL=1 \
 R_DEFAULT_PACKAGES=NULL LC_ALL=C \
../../../bin/R -d gdb --vanilla --no-echo -e \
 'tools:::makeLazyLoading("grDevices")'

(This assumes building straight from 

Re: [R] Interpreting p values of gls in nlme

2024-07-16 Thread Ebert,Timothy Aaron
In an lm() model, a significant intercept means that the fitted line passes above 
or below the origin (x=0, y=0): the predicted value at x=0 differs from zero. A 
significant predictor means that the slope is not zero. More generally, a 
significant predictor means that the predictor has some influence on the 
response. With nlme() the relationship may not be linear, but gls() fits a 
linear model, so the same reading applies. Your result indicates that you cannot 
tell whether the relationship passes through the origin, but the predictor has a 
significant influence on the response.
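
As a minimal sketch of where the two p values appear, assuming the built-in cars data as a stand-in for the poster's data (no phylogenetic correlation structure is specified here, and the names fit and tt are illustrative):

```r
library(nlme)

# gls() without a correlation structure; cars is only a stand-in data set
fit <- gls(dist ~ speed, data = cars)

# One row per coefficient: the "(Intercept)" row tests intercept = 0,
# the "speed" row tests slope = 0
tt <- summary(fit)$tTable
print(tt)
```

The row for the predictor is usually the one of interest; the intercept row only says whether the fitted value at x = 0 differs from zero.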

Tim

-Original Message-
From: R-help  On Behalf Of Roland Sookias
Sent: Tuesday, July 16, 2024 12:08 PM
To: r-help@r-project.org
Subject: [R] Interpreting p values of gls in nlme

Dear all

I have undertaken some phylogenetic and non-phylogenetic regressions with
gls() in nlme with single predictor variables. A p value is associated with the 
intercept (upper p value) and another with the predictor variable (lower). 
Which p value is important? What does it mean if the intercept p value is 
insignificant but the predictor is still significant?

Thanks a lot, and sorry for my ignorance,

Roland



Re: [R] Interpreting p values of gls in nlme

2024-07-16 Thread Bert Gunter
Yikes!

This list is for help on R *programming*, not statistics per se,
although these do sometimes intersect. However, your query strikes me
as a request for a kind of statistical tutorial, which is OT here.
Just so you are aware...

R has a special interest group (SIG) for phylogenetics at
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo . I think this
would be a better place for you to post, as relevant expertise should
be found there. However, I do not know how active that list is, so
maybe not. Good luck.

Cheers,
Bert

On Tue, Jul 16, 2024 at 3:10 PM Roland Sookias  wrote:
>
> Dear all
>
> I have undertaken some phylogenetic and non-phylogenetic regressions with
> gls() in nlme with single predictor variables. A p value is associated with
> the intercept (upper p value) and another with the predictor variable
> (lower). Which p value is important? What does it mean if the intercept p
> value is insignificant but the predictor is still significant?
>
> Thanks a lot, and sorry for my ignorance,
>
> Roland


[R] Interpreting p values of gls in nlme

2024-07-16 Thread Roland Sookias
Dear all

I have undertaken some phylogenetic and non-phylogenetic regressions with
gls() in nlme with single predictor variables. A p value is associated with
the intercept (upper p value) and another with the predictor variable
(lower). Which p value is important? What does it mean if the intercept p
value is insignificant but the predictor is still significant?

Thanks a lot, and sorry for my ignorance,

Roland



Re: [ESS] accent marks on windows

2024-07-16 Thread Toby Hocking via ESS-help
A partial work-around is to change the Windows language to English, after
which R does not try to display accent marks (but Emacs still hangs if you
type é in the *R* buffer).
This seems to be an issue only on Windows, as I cannot reproduce it
on Linux (Ubuntu).
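
A lighter variant of this work-around, offered only as an assumption for this particular setup, is to leave the Windows language alone and ask R itself for English messages through the LANGUAGE environment variable. The sketch also prints what R believes about the session encoding, which may help when reporting the problem. It does not address the hang when typing é.

```r
# Ask R for English messages in the current session
# (the same setting can go in ~/.Renviron as LANGUAGE=en so it applies at startup)
Sys.setenv(LANGUAGE = "en")

# Inspect what R believes about the language and character encoding in use
lang  <- Sys.getenv("LANGUAGE")
ctype <- Sys.getlocale("LC_CTYPE")
info  <- l10n_info()  # list with elements such as MBCS, `UTF-8`, `Latin-1`

print(list(LANGUAGE = lang, LC_CTYPE = ctype, utf8 = info[["UTF-8"]]))
```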

On Tue, Jul 16, 2024 at 11:54 AM Toby Hocking  wrote:
>
> hi, I am installing ESS on a new work computer, which is in Quebec
> (French speaking province of Canada).
> I am having issues with display and entry of accent marks, for example
> e with acute accent (é).
> For example, when R starts up I see garbled characters instead of é:
>
> Copyright (C) 2024 The R Foundation for Statistical Computing
> Platform: x86_64-w64-mingw32/x64
>
> R est un logiciel libre livré sans AUCUNE GARANTIE.
> Vous pouvez le redistribuer sous certaines conditions.
> Tapez 'license()' ou 'licence()' pour plus de détails.
>
> R est un projet collaboratif avec de nombreux contributeurs.
> Tapez 'contributors()' pour plus d'information et
> 'citation()' pour la façon de le citer dans les publications.
>
> Tapez 'demo()' pour des démonstrations, 'help()' pour l'aide
> en ligne ou 'help.start()' pour obtenir l'aide au format HTML.
> Tapez 'q()' pour quitter R.
>
> Do I need to tell emacs to use a particular character encoding for the
> ESS buffer? How?
> I expected this should "just work" (display the right character é)
>
> More problematic, when I try to enter é in the *R* buffer, it
> freezes/hangs the whole emacs window. C-g does not help/cancel, only
> way to get out is to tell windows to kill emacs.
>
> Any ideas?

__
ESS-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/ess-help


[ESS] accent marks on windows

2024-07-16 Thread Toby Hocking via ESS-help
Hi, I am installing ESS on a new work computer, which is in Quebec
(French-speaking province of Canada).
I am having issues with display and entry of accent marks, for example
e with acute accent (é).
For example, when R starts up I see garbled characters instead of é:

Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64

R est un logiciel libre livré sans AUCUNE GARANTIE.
Vous pouvez le redistribuer sous certaines conditions.
Tapez 'license()' ou 'licence()' pour plus de détails.

R est un projet collaboratif avec de nombreux contributeurs.
Tapez 'contributors()' pour plus d'information et
'citation()' pour la façon de le citer dans les publications.

Tapez 'demo()' pour des démonstrations, 'help()' pour l'aide
en ligne ou 'help.start()' pour obtenir l'aide au format HTML.
Tapez 'q()' pour quitter R.

Do I need to tell emacs to use a particular character encoding for the
ESS buffer? How?
I expected this should "just work" (display the right character é)
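
One common way such garbling arises, and it is only an assumption that it applies here, is that R writes UTF-8 bytes while the receiving buffer decodes them as a single-byte encoding such as Latin-1. The sketch below reproduces that mismatch inside R, so the two-byte encoding of é and its misread form can be compared with what appears in the *R* buffer.

```r
x <- enc2utf8("\u00e9")   # é as a UTF-8 string
bytes <- charToRaw(x)     # the two bytes that encode é in UTF-8

# Decode those same bytes as if they were Latin-1, then re-encode as UTF-8;
# each original byte now becomes a separate (wrong) character
misread <- iconv(rawToChar(bytes), from = "latin1", to = "UTF-8")

print(bytes)
print(nchar(misread, type = "chars"))
```

If the buffer shows two odd characters wherever a single é is expected, the process coding system that Emacs uses for the R process would be the first thing to check.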

More problematic, when I try to enter é in the *R* buffer, it
freezes/hangs the whole emacs window. C-g does not help/cancel, only
way to get out is to tell windows to kill emacs.

Any ideas?

Below is the environment from my eshell, if that helps.
Thanks in advance!
Toby

c:/Program Files/Emacs/emacs-29.4/bin $ env
ALLUSERSPROFILE=C:\ProgramData
APPDATA=C:\Users\hoct2726\AppData\Roaming
COLUMNS=80
COMPUTERNAME=DINF-THOCK-01A
COMSPEC=C:\windows\system32\cmd.exe
CommonProgramFiles(x86)=C:\Program Files (x86)\Common Files
CommonProgramFiles=C:\Program Files\Common Files
CommonProgramW6432=C:\Program Files\Common Files
DEFLOGDIR=C:\ProgramData\McAfee\Endpoint Security\Logs
DriverData=C:\Windows\System32\Drivers\DriverData
EFC_13380=1
HOME=C:\Users\hoct2726
HOMEDRIVE=C:
HOMEPATH=\Users\hoct2726
INSIDE_EMACS=29.4,eshell
LINES=58
LOCALAPPDATA=C:\Users\hoct2726\AppData\Local
LOG4J_FORMAT_MSG_NO_LOOKUPS=true
LOGONSERVER=\\SAD
NUMBER_OF_PROCESSORS=6
OLDPWD=
OS=Windows_NT
OneDrive=C:\Users\hoct2726\OneDrive
PATH=c:/windows/system32;C:/windows;C:/windows/System32/Wbem;C:/windows/System32/WindowsPowerShell/v1.0/;C:/windows/System32/OpenSSH/;C:/Program Files/Git/cmd;C:/Program Files/R/R-4.4.1/bin/x64;C:/Users/hoct2726/AppData/Local/Microsoft/WindowsApps;.
PATHEXT=.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
PROCESSOR_ARCHITECTURE=AMD64
PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
PROCESSOR_LEVEL=6
PROCESSOR_REVISION=9e0a
PSModulePath=C:\Program Files\WindowsPowerShell\Modules;C:\windows\system32\WindowsPowerShell\v1.0\Modules
PUBLIC=C:\Users\Public
PWD=c:/Program Files/Emacs/emacs-29.4/bin
ProgramData=C:\ProgramData
ProgramFiles(x86)=C:\Program Files (x86)
ProgramFiles=C:\Program Files
ProgramW6432=C:\Program Files
RTOOLS44_HOME=C:\rtools44
SESSIONNAME=Console
SystemDrive=C:
SystemRoot=C:\windows
TEMP=C:\Users\hoct2726\AppData\Local\Temp
TERM=dumb
TMP=C:\Users\hoct2726\AppData\Local\Temp
USERDNSDOMAIN=USHERBROOKE.CA
USERDOMAIN=USHERBROOKE
USERDOMAIN_ROAMINGPROFILE=USHERBROOKE
USERNAME=hoct2726
USERPROFILE=C:\Users\hoct2726
WSUSUpgVer=11
WSUStarget=FSCI;DInf
ZES_ENABLE_SYSMAN=1
windir=C:\windows



Re: [R] Automatic Knot selection in Piecewise linear splines

2024-07-16 Thread Anupam Tyagi
Thanks, Martin. This is very helpful.




-- 
Anupam.



Re: [R] Automatic Knot selection in Piecewise linear splines

2024-07-16 Thread Martin Maechler
> Anupam Tyagi 
> on Tue, 9 Jul 2024 16:16:43 +0530 writes:

> How can I do automatic knot selection while fitting piecewise linear
> splines to two variables x and y? Which package to use to do it simply? I
> also want to visualize the splines (and the scatter plot) with a graph.

> Anupam

NB: linear splines, i.e. piecewise linear continuous functions.
Given the knots, use approx() or approxfun(); however,
automatic knot selection is not available in the base R packages.
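
For the "given the knots" case, a minimal base R sketch follows. The knot positions and the values at the knots are made-up numbers chosen only for illustration.

```r
# Hypothetical knots and the function values at those knots
knots  <- c(0, 2, 5, 10)
values <- c(1, 3, 0, 4)

# approxfun() returns a function that interpolates linearly between the knots
pl <- approxfun(knots, values)

mid1 <- pl(1)    # halfway between the knots (0, 1) and (2, 3)
mid2 <- pl(7.5)  # halfway between the knots (5, 0) and (10, 4)
print(c(mid1, mid2))
```

Calling curve(pl(x), 0, 10) followed by points(knots, values) would draw the resulting piecewise linear function with its knots.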

I'm sure there are several R packages doing this.
The best such package in my opinion is "earth" which does a
re-implementation (and extensive  *generalization*) of the
famous  MARS algorithm of Friedman.
==> https://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines

Note that the strength of MARS (and of earth) is that it works
for multivariate x (MARS := Multivariate Adaptive Regression
Splines), but it also works for the simple 1D case.

In the following example, we always get 11 final hinge terms (at 10 distinct knots),
but I'm sure one can tweak the many tuning parameters of earth()
to get more:

## Can we do knot-selection for simple (x,y) splines?  === Yes, via earth() {using MARS}!

x <- (0:800)/8

f <- function(x) 7 * sin(pi/8*x) * abs((x-50)/20)^1.25 - (x-40)*(12-x)/64
curve(f(x), 0, 100, n = 1000, col=2, lwd=2)

set.seed(11)
y <- f(x) + 10*rnorm(x)

m.sspl <- smooth.spline(x,y) # base line "standard smoother"

require(earth)
fm1 <- earth(x, y) # default settings
summary(fm1, style = "pmax") #-- got  10 knots (x = 44 "used twice") below
## Call: earth(x=x, y=y)

## y =
##   175.9612
##   -   10.6744 * pmax(0,  x -  4.625)
##   +  9.928496 * pmax(0,  x - 10.875)
##   -  5.940857 * pmax(0,  x -  20.25)
##   +  3.438948 * pmax(0,  x - 27.125)
##   -  3.828159 * pmax(0, 44 -  x)
##   +  4.207046 * pmax(0,  x - 44)
##   +  2.573822 * pmax(0,  x -   76.5)
##   -  10.99073 * pmax(0,  x - 87.125)
##   +  10.97592 * pmax(0,  x - 90.875)
##   +  9.331949 * pmax(0,  x - 94)
##   -   8.48575 * pmax(0,  x -   96.5)

## Selected 12 of 12 terms, and 1 of 1 predictors
## Termination condition: Reached nk 21
## Importance: x
## Number of terms at each degree of interaction: 1 11 (additive model)
## GCV 108.6592    RSS 82109.44    GRSq 0.861423    RSq 0.86894


fm2 <- earth(x, y, fast.k = 0) # (more extensive forward pass)
summary(fm2)
all.equal(fm1, fm2)# they are identical (apart from 'call'):
fm3 <- earth(x, y, fast.k = 0, pmethod = "none", trace = 3) # extensive forward pass; *no* pruning
## still no change: fm3 "==" fm1
## (compared on the observed x, since the finer grid 'xx' is only created further below)
all.equal(predict(fm1, x), predict(fm3, x))

## BTW: The chosen knots and coefficients are
mat <- with(fm1, cbind(dirs, cuts=c(cuts), coef = c(coefficients)))

## Plots : fine grid for visualization: instead of   xx <- seq(x[1], x[length(x)], length.out = 1024)
rnx <- extendrange(x) ## to extrapolate a bit
xx <- do.call(seq.int, c(rnx, list(length.out = 1200)))

cbind(f = f(xx),
  sspl = predict(m.sspl, xx)$y,
  mars = predict(fm1, xx)) -> fits

plot(x,y, xlim=rnx, cex = 1/4, col = adjustcolor(1, 1/2))
cols <- c(adjustcolor(2, 1/3),
  adjustcolor(4, 2/3),
  adjustcolor("orange4", 2/3))
lwds <- c(3, 2, 2)
matlines(xx, fits, col = cols, lwd = lwds, lty=1)
legend("topleft", c("true f(x)", "smooth.spline()", "earth()"),
   col=cols, lwd=lwds, bty = "n")
title(paste("earth() linear spline vs. smooth.spline();  n =", length(x)))
mtext(substitute(f(x) == FDEF, list(FDEF = body(f))))
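
The pmax() terms printed by summary(fm1, style = "pmax") are ordinary hinge functions, so once knots are known the same kind of linear spline can also be fitted with lm() in base R. The sketch below uses hand-picked, hypothetical knots and a made-up test function f_true rather than the knots chosen by earth(); the helper hinge() is likewise introduced only for this example.

```r
set.seed(11)
x <- (0:800)/8

# Made-up piecewise linear truth with kinks at 30 and 70
f_true <- function(x) 2 + 0.5 * x - 1.2 * pmax(0, x - 30) + 1.5 * pmax(0, x - 70)
y <- f_true(x) + rnorm(length(x))

knots <- c(30, 70)  # assumed known here; earth() would select knots automatically

# One hinge column pmax(0, x - k) per knot
hinge <- function(x, knots) {
  H <- sapply(knots, function(k) pmax(0, x - k))
  colnames(H) <- paste0("h", knots)
  H
}

fit <- lm(y ~ x + hinge(x, knots))

# Intercept, slope before the first knot, then the change in slope at each knot
cf <- coef(fit)
print(round(cf, 3))
```

Each hinge coefficient is the change in slope at its knot, which is how the signed terms in the earth() summary above can be read as well.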



Re: [R] reticulate + virtual environments

2024-07-15 Thread Ivan Krylov via R-help
On Mon, 15 Jul 2024 07:56:17 +0200
Sigbert Klinke wrote:

>  > py_config()  

>  > use_virtualenv("mmstat4.hu.data", required = TRUE)  

Does it help _not_ to call py_config() before use_virtualenv()?

help(py_config) says that it forces the initialization of Python. When
you later try to ask for a different virtual environment, no conflict
is detected because
normalizePath('/home/sk/.virtualenvs/r-reticulate/bin/python') is
identical to
normalizePath('/home/sk/.virtualenvs/mmstat4.hu.data/bin/python'): they
must be both symlinks to /usr/bin/python3, so reticulate is likely
thinking that it's the same Python. Thus Python is not initialised
again, but you also don't see an error.
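
The two points above can be sketched as follows. The first part uses only base R on a Unix-alike to show that two symlinks to one file normalise to the same path; the temporary file names are stand-ins, not the real environment paths. The second part shows the suggested call order and is guarded so that it is skipped when reticulate or the environment is unavailable; it is meant for a fresh R session in which Python has not been initialised yet.

```r
# Part 1: two symlinks to the same file resolve to an identical path
tmp <- tempfile("venv-demo-")
dir.create(tmp)
real <- file.path(tmp, "python3")
file.create(real)
link_a <- file.path(tmp, "env_a_python")  # stands in for .../r-reticulate/bin/python
link_b <- file.path(tmp, "env_b_python")  # stands in for .../mmstat4.hu.data/bin/python
file.symlink(real, link_a)
file.symlink(real, link_b)
same_target <- identical(normalizePath(link_a), normalizePath(link_b))
print(same_target)

# Part 2: select the environment *before* anything initialises Python,
# i.e. no py_config() call ahead of use_virtualenv()
if (requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::virtualenv_exists("mmstat4.hu.data")) {
  reticulate::use_virtualenv("mmstat4.hu.data", required = TRUE)
  print(reticulate::py_config())
}
```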

-- 
Best regards,
Ivan



Re: [R] reticulate + virtual environments

2024-07-15 Thread Sigbert Klinke

Hi,

thanks, I posted in the Posit community.

Sigbert

--
https://hu.berlin/sk
https://hu.berlin/mmstat



Re: [R] reticulate + virtual environments

2024-07-15 Thread Bert Gunter
Have you tried https://rstudio.github.io/reticulate/  ?

Generally speaking, complex nonstandard package specific questions
such as yours rarely get a reply here -- there are 20,000+ packages
(and counting) after all! As reticulate was created by and integrated
with RStudio/Posit, I would think their site and help resources might
be a better venue. Of course, if you don't use RStudio, you may have
no joy there either.

Cheers,
Bert




On Sun, Jul 14, 2024 at 10:56 PM Sigbert Klinke
 wrote:
>
> Hi,
>
> I am using reticulate and a virtual environment (not conda) to run
> Python scripts from RStudio. However, when I try to use my own
> (existing) virtual environment, reticulate does not use it. If I run my
> scripts, the installed modules (e.g., py_install("pandas",
> "mmstat4.hu.data")) are not found. I believe this happens because
> reticulate is using r-reticulate instead of mmstat4.hu.data. How can I
> force reticulate to use my virtual environment?
>
> Thanks Sigbert


[R] reticulate + virtual environments

2024-07-14 Thread Sigbert Klinke

Hi,

I am using reticulate and a virtual environment (not conda) to run 
Python scripts from RStudio. However, when I try to use my own 
(existing) virtual environment, reticulate does not use it. If I run my 
scripts, the installed modules (e.g., py_install("pandas", 
"mmstat4.hu.data")) are not found. I believe this happens because 
reticulate is using r-reticulate instead of mmstat4.hu.data. How can I 
force reticulate to use my virtual environment?


Thanks Sigbert

> library("reticulate")
> py_config()
python:         /home/sk/.virtualenvs/r-reticulate/bin/python
libpython:      /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so
pythonhome:     /home/sk/.virtualenvs/r-reticulate:/home/sk/.virtualenvs/r-reticulate
version:        3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
numpy:          /home/sk/.virtualenvs/r-reticulate/lib/python3.10/site-packages/numpy
numpy_version:  2.0.0

> use_virtualenv("mmstat4.hu.data", required = TRUE)
> py_config()
python:         /home/sk/.virtualenvs/r-reticulate/bin/python
libpython:      /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so
pythonhome:     /home/sk/.virtualenvs/r-reticulate:/home/sk/.virtualenvs/r-reticulate
version:        3.10.12 (main, Mar 22 2024, 16:50:05) [GCC 11.4.0]
numpy:          /home/sk/.virtualenvs/r-reticulate/lib/python3.10/site-packages/numpy
numpy_version:  2.0.0

--
https://hu.berlin/sk
https://hu.berlin/mmstat



[R] Ps: Reinterpret data without saving it to a file 1st? Check for integer stopping at 1st decimal?

2024-07-14 Thread DynV Montrealer
The answer:
https://statisticsglobe.com/change-classes-data-frame-columns-automatically-r

On Sun, Jul 14, 2024 at 3:16 AM DynV Montrealer  wrote:

> A small number of columns in the data I need to work with are strings, the
> rest numbers. I'm using read_excel() from the readxl package to get the
> data; right after it, the string columns are of type chr and the rest num.
> I'm tasked with finding out which columns are integers. On someone's advice,
> I tried saving the spreadsheet content into a CSV and then loading that, which
> works like a charm; the chr columns are the same, but a large portion
> of the num columns are now int. Is there a way to skip writing and reading a CSV
> and get the same transformation? Perhaps some way to break the spreadsheet
> data apart (e.g. XLdata <- read_excel(...)), then put it back together without
> any writing to a file (e.g. XLdataReformed <- reform(XLdata))?
>
> In addition, from is.integer() documentation I ran
>
> > is.wholenumber <- function(x, tol = .Machine$double.eps^0.5)
> >   abs(x - round(x)) < tol
>
> and I'm now trying to have it stop at the first non-integer value in a column.
> Someone advised me to use break, and I scripted
>
> > is_integer <- TRUE
> > for (current_row in seq_along(data$column)) {
> >   if (!is.wholenumber(data$column[current_row])) {
> >     is_integer <- FALSE
> >     break
> >   }
> > }
>
> but I'm wondering if there's something better to check if a column is
> entirely made of integers.
>
> Thank you kindly for your help
>
>



Re: [R] Reinterpret data without saving it to a file 1st? Check for integer stopping at 1st decimal?

2024-07-14 Thread Ivan Krylov via R-help
On Sun, 14 Jul 2024 03:16:56 -0400
DynV Montrealer wrote:

> Perhaps some way to break the spreadsheet data (eg XLdata <-
> read_excel(...)), then put it back together without any writing to a
> file (eg XLdataReformed <- reform(XLdata)) ?

read_excel() is documented to return objects of class tibble:
https://cran.r-project.org/package=tibble/vignettes/tibble.html

Long story short, tibbles are named lists of columns, so it should be
possible for you to access and replace the individual parts of them
using the standard list subset syntax XLdata[[columnname]].
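
A minimal sketch of that list-style access follows. A plain data.frame stands in for the tibble returned by read_excel() (the [[ syntax is the same for both), and the column names are made up.

```r
# Stand-in for the result of read_excel(); column names are hypothetical
XLdata <- data.frame(id = c(1, 2, 3), weight = c(1.5, 2, 3.25),
                     stringsAsFactors = FALSE)

columnname <- "id"
before <- class(XLdata[[columnname]])   # how the column is stored on arrival

# Replace the column in place with an integer version of itself
XLdata[[columnname]] <- as.integer(XLdata[[columnname]])
after <- class(XLdata[[columnname]])

print(c(before = before, after = after))
```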

Lists are described in R Intro chapter 6 and many other books on R:
https://cran.r-project.org/doc/manuals/R-intro.html#Lists-and-data-frames
http://web.archive.org/web/20230415001551if_/http://ashipunov.info/shipunov/school/biol_240/en/visual_statistics.pdf
(see section 3.8.2 on page 93 and following)

> In addition, from is.integer() documentation I ran
> 
> > is.wholenumber <- function(x, tol = .Machine$double.eps^0.5)
> >   abs(x - round(x)) < tol
> 
> and I'm now trying to have it stop at the 1st decimal content of a
> column.

If you'd like to write idiomatic R code, consider the fact that
is.wholenumber is vectorised:

is.wholenumber(c(1,2,3,pi))
# [1]  TRUE  TRUE  TRUE FALSE

Given a vector of numbers, it will return a vector of the same length
specifying whether each element can be considered a whole number.
Combine it with all() and you can test the whole column in two function
calls.
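
For instance, with two made-up columns (the names whole_col and mixed_col are only for illustration), the loop with break reduces to the following sketch:

```r
is.wholenumber <- function(x, tol = .Machine$double.eps^0.5)
  abs(x - round(x)) < tol

whole_col <- c(1, 2, 3, 40)
mixed_col <- c(1, 2.5, 3, NA)

# One vectorised test per column, no explicit loop
all_whole <- all(is.wholenumber(whole_col))

# na.rm = TRUE ignores missing cells when deciding
has_fraction <- !all(is.wholenumber(mixed_col), na.rm = TRUE)

print(c(all_whole = all_whole, has_fraction = has_fraction))
```

This checks every element rather than stopping at the first fractional value, which is normally fast enough for spreadsheet-sized columns.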

R also has a type.convert function that may be useful in this case:
https://search.r-project.org/R/refmans/utils/html/type.convert.html
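
A short sketch of type.convert() applied to a whole data frame, again with a made-up data.frame standing in for the read_excel() result. As far as I understand the default method, values are passed through as.character() first, so very large numbers that print in scientific notation may stay double; treat that as an assumption to verify on real data.

```r
# Stand-in for the spreadsheet data; column names are hypothetical
XLdata <- data.frame(id = c(1, 2, 3), weight = c(1.5, 2, 3.25),
                     name = c("a", "b", "c"), stringsAsFactors = FALSE)

# as.is = TRUE keeps character columns as character instead of making factors
XLdataReformed <- type.convert(XLdata, as.is = TRUE)

classes <- sapply(XLdataReformed, class)
print(classes)
```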

-- 
Best regards,
Ivan
