Re: [R] Help

Hi Kimmo,
The code you sent has worked for me. Thank you very much.
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka


On Mon, Feb 5, 2024 at 7:40 AM Kimmo Elo  wrote:

> Hi,
>
> the command line with 'text' should be:
>
> text(-8,-8, expression(R^2 * " = 0.62,  r = 0.79, N = 161"), cex = 2 )
>
> Best,
>
> Kimmo

Re: [R] Help

2024-02-04 Thread Kimmo Elo

Hi,

the command line with 'text' should be:

text(-8,-8, expression(R^2 * " = 0.62,  r = 0.79, N = 161"), cex = 2 )

Best,

Kimmo
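
A minimal sketch of the plotmath forms suggested in this thread, for reference. The data are simulated stand-ins (the CLMXAPTY_sim file is not available here), and pdf(NULL) is used only so the sketch runs without opening a window. The bquote() variant is an addition that nobody in the thread proposed; it substitutes computed values instead of hard-coding 0.62.

```r
# Simulated stand-in data (assumption: the real FDapt/FDcli values are not shown)
set.seed(1)
FDapt <- rnorm(161, mean = -5, sd = 2)
FDcli <- 0.8 * FDapt + rnorm(161)

pdf(NULL)  # null graphics device, so nothing is written or displayed
plot(FDapt, FDcli, pch = 16, col = "red", xlab = "APTY", ylab = "CLMX")

# 1. Juxtapose the superscripted symbol and a plain string with `*`
lab1 <- expression(R^2 * " = 0.62,  r = 0.79, N = 161")

# 2. Use `==` to typeset an equals sign
lab2 <- expression(R^2 == 0.62)

# 3. Substitute computed values: .(x) is replaced by the value of x
fit  <- lm(FDcli ~ FDapt)
r2   <- round(summary(fit)$r.squared, 2)
lab3 <- bquote(R^2 == .(r2) * "," ~ N == .(length(FDapt)))

text(min(FDapt), max(FDcli),     lab1, adj = 0, cex = 1.2)
text(min(FDapt), max(FDcli) - 1, lab2, adj = 0, cex = 1.2)
text(min(FDapt), max(FDcli) - 2, lab3, adj = 0, cex = 1.2)
invisible(dev.off())
```

In all three cases the label is unquoted R code, not a character string; only the literal text pieces inside it are quoted.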

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help

Many thanks.

On Mon, Feb 5, 2024, 1:06 AM Rolf Turner  wrote:

>
> Please see fortunes::fortune(285).
>
> cheers,
>
> Rolf Turner
>
> --
> Honorary Research Fellow
> Department of Statistics
> University of Auckland
> Stats. Dep't. (secretaries) phone:
>  +64-9-373-7599 ext. 89622
> Home phone: +64-9-480-4619
>

Re: [R] Help

2024-02-04 Thread Rolf Turner



Please see fortunes::fortune(285).

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. (secretaries) phone:
 +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619

Re: [R] Help

2024-02-04 Thread Martin Møller Skarbiniks Pedersen

On Sun, 4 Feb 2024 at 17:26, Jibrin Alhassan  wrote:
>
> Here is the script I used to plot the graph indicating the text I wanted to
> insert. The line in the script that I have issues with is: text(-8,-8,
> "R^2=  0.62",  r = 0.79, N = 161", cex = 2
> R^2=  0.62 is not producing R squared = 0.62.
> Thanks.

This works for me:

curve(dnorm, from=-3, to=3, main="Normal Distribution")
text(x=0, y=0.1, cex=1.5, expression(R^2 == 0.62))

If you are used to writing expressions in LaTeX math, you might like the
latex2exp package:
curve(dnorm, from=-3, to=3, main="Normal Distribution")
text(0, 0.1, latex2exp::TeX("$R^2 = 0.62$"))

Regards
Martin

Re: [R] Help

Here is the script I used to plot the graph indicating the text I wanted to
insert. The line in the script that I have issues with is: text(-8,-8,
"R^2=  0.62",  r = 0.79, N = 161", cex = 2
R^2=  0.62 is not producing R squared = 0.62.
Thanks.
Sys.setenv( TZ="GMT" )
dt <- read.table("CLMXAPTY_sim", col.names = c("FDcli", "FDapt"))
FDcli=dt$FDcli
FDapt=dt$FDapt
setEPS()
postscript(file = "cliapt2.eps")
par(mar = c(4.3, 4.3, 1.3, 1.3), oma = c(1, 1, 1 , 1))
plot(FDapt,FDcli, pch = 16,  cex.lab = 1.6, cex.axis = 1.4, cex.main = 0.8,
font.lab = 1.7, font.axis = 1.7,  col = "red",main = "Simultaneous Events
at CLMX and APTY",ylab="CLMX",xlab="APTY")
text(-8,-8, "R^2=  0.62",  r = 0.79, N = 161", cex = 2 )
abline(lm(FDcli ~ FDapt, col="black"))
dev.off()
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka


Re: [R] Help

Hi Elo,
It gave this error message:
CR_plot2.R:14:37: unexpected string constant
13: plot(FDapt,FDcli, pch = 16,  cex.lab = 1.6, cex.axis = 1.4, cex.main =
0.8, font.lab = 1.7, font.axis = 1.7,  col = "red",main = "Simultaneous
Events at CLMX and APTY",ylab="CLMX",xlab="APTY")
14: text(-8,-8, "expression(R^2*"=  0.62"),  r = 0.79, N = 161"
^
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka
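
A short sketch of why line 14 fails to parse, assuming the line is exactly as shown in the error message. Putting expression(...) inside double quotes turns it into a string, and the first inner double quote ends that string early, so the parser meets a second string constant where it expects a comma or a closing parenthesis. The lines are held in single-quoted strings here only so that parse() can check them without stopping the script.

```r
# The failing line from the error message, held as text
bad <- 'text(-8,-8, "expression(R^2*"=  0.62"),  r = 0.79, N = 161"'

# parse() checks syntax only; tryCatch() captures the error instead of stopping
res <- tryCatch(parse(text = bad), error = function(e) e)
inherits(res, "error")
conditionMessage(res)

# Without the outer quotes, expression() is ordinary code and parses cleanly
good <- 'text(-8, -8, expression(R^2 * " = 0.62,  r = 0.79, N = 161"), cex = 2)'
ok <- parse(text = good)
length(ok)
```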


Re: [R] Help

2024-02-04 Thread Kimmo Elo

Hi,

maybe this works:

expression(R^2 * "= 0.62")

HTH,

Kimmo

On 4 Feb 2024 at 16:11, Jibrin Alhassan
<jibrin.alhas...@unn.edu.ng> wrote:

I have done a scatter plot in R. I want to insert the coefficient of
determination R^2 = 0.62 as a text in the plot. I have tried to write R^2
but could not produce R2. I would appreciate it if someone could help me
with the syntax. I have tried:  expression(paste("", R^2,"=", 0.62)), but
it did not produce R squared, rather it gave me error messages. Thanks.
Jibrin Alhassan
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka

Re: [R] Help

Thank you Zhao for the code. When I replotted the graph after inserting the
code in my script, it gave me this error message without plotting the graph:
Warning message:
In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :
 extra argument ‘col’ will be disregarded.
My regards.
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka
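
A minimal sketch of where this warning comes from, using simulated data in place of the real file (an assumption). In the script, col = "black" sits inside the parentheses of lm() rather than abline(), so it is handed to lm.fit(), which warns and ignores it. Moving the closing parenthesis gives the colour to abline().

```r
# Simulated stand-in data
set.seed(42)
FDapt <- rnorm(50)
FDcli <- 0.8 * FDapt + rnorm(50, sd = 0.5)

pdf(NULL)  # null graphics device, so nothing is written or displayed
plot(FDapt, FDcli, pch = 16, col = "red")

# Original form: col is an argument of lm(), which warns and disregards it
w <- tryCatch(lm(FDcli ~ FDapt, col = "black"),
              warning = function(w) conditionMessage(w))
w

# Corrected form: close lm() first, then pass col to abline()
fit <- lm(FDcli ~ FDapt)
abline(fit, col = "black")
invisible(dev.off())
```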

Re: [R] Help

2024-02-04 Thread Jinsong Zhao


?plotmath

expression(R^2==0.62)

On 2024/2/4 18:10, Jibrin Alhassan wrote:

I have done a scatter plot in R. I want to insert the coefficient of
determination R^2 = 0.62 as a text in the plot. I have tried to write R^2
but could not produce R2. I would appreciate it if someone could help me
with the syntax. I have tried:  expression(paste("", R^2,"=", 0.62)), but
it did not produce R squared, rather it gave me error messages. Thanks.
Jibrin Alhassan
*Jibrin Adejoh Alhassan (Ph.D)*
Department of Physics and Astronomy,
University of Nigeria, Nsukka

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2024-01-06 Thread Andy


Hi Tim

This is brilliant - thank you!!

I've had to tweak the basePath line a bit (I am on a Linux machine), but 
having done that, the code works as intended. This is a truly helpful 
contribution that gives me ideas about how to work it through for the 
missing fields, which is one of the major sticking points I kept bumping 
up against.


Thank you so much for this.

All the best
Andy

On 05/01/2024 13:59, Howard, Tim G (DEC) wrote:

Here's a simplified version of how I would do it, using `textreadr` but
otherwise base functions. I haven't done it all, but have a few examples of
finding the correct row then extracting the right data. I made a duplicate of
the file you provided, so this loops through the two identical files, extracts
a few parts, then sticks those parts in a data frame.

#
library(textreadr)

# recommend not using setwd(), but instead just include the
# path as follows
basePath <- file.path("C:","temp")
files <- list.files(path=basePath, pattern = "docx$")

length(files)
# 2

# initialize a list to put the data in
myList <- vector(mode = "list", length = length(files))

# seq_along() also behaves correctly when no files are found
for(i in seq_along(files)){
   fileDat <- read_docx(file.path(basePath, files[[i]]))
   # get the data you want, here one line per item to make it clearer
   # assume consistency among articles
   ttl <- fileDat[[1]]
   src <- fileDat[[2]]
   dt <- fileDat[[3]]
   aut <- fileDat[grepl("Byline:",fileDat)]
   aut <- trimws(sub("Byline:","",aut), whitespace = "[\\h\\v]")
   pg <- fileDat[grepl("Pg.",fileDat)]
   pg <- as.integer(sub(".*Pg. ([[:digit:]]+)","\\1",pg))
   len <- fileDat[grepl("Length:", fileDat)]
   len <- as.integer(sub("Length:.{1}([[:digit:]]+) .*","\\1",len))
   myList[[i]] <- data.frame("title"=ttl,
                             "source"=src,
                             "date"=dt,
                             "author"=aut,
                             "page"=pg,
                             "length"=len)
}

# roll up the list to a data frame. Many ways to do this.
myDF <- do.call("rbind",myList)

#

Hope that helps.
Tim
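
Since the stated goal is a spreadsheet or CSV, here is a minimal sketch of the last step: writing the rolled-up data frame to a CSV file and appending further rows later. The records, column subset, and output location are hypothetical stand-ins for what the loop above would produce.

```r
# Hypothetical per-article records standing in for myList from the loop
myList <- list(
  data.frame(title = "Article A", source = "Paper X", page = 16L, length = 515L),
  data.frame(title = "Article B", source = "Paper Y", page = 3L,  length = 820L)
)
myDF <- do.call("rbind", myList)

out <- file.path(tempdir(), "articles.csv")   # assumed output location
write.csv(myDF, out, row.names = FALSE)

# To append rows later, use write.table(); write.csv() does not allow appending
extra <- data.frame(title = "Article C", source = "Paper Z", page = 7L, length = 300L)
write.table(extra, out, sep = ",", append = TRUE,
            col.names = FALSE, row.names = FALSE, qmethod = "double")

back <- read.csv(out)
nrow(back)
```

The resulting file opens directly in Calc or Excel.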




Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2024-01-04 Thread Andy


Hi folks

Thanks for your help and suggestions - very much appreciated.

I now have some working code, using this file I uploaded for public 
access: 
https://docs.google.com/document/d/1QwuaWZk6tYlWQXJ3WLczxC8Cda6zVERk/edit?usp=sharing&ouid=103065135255080058813&rtpof=true&sd=true 



The small code segment that now works is as follows:

###

# Load libraries
library(textreadr)
library(tcltk)
library(tidyverse)
#library(officer)
#library(stringr) #for splitting and trimming raw data
#library(tidyr) #for converting to wide format

# I'd like to keep this as it enables more control over the selected directories
filepath <- setwd(tk_choose.dir())

# The following correctly lists the names of all 9 files in my test directory
files <- list.files(filepath, ".docx")
files
length(files)

# Ideally, I'd like to skip this step by being able to automatically read in
# the name of each file, but one step at a time:
filename <- "Now they want us to charge our electric cars from litter bins.docx"

# This produces the file content as output when run, and identifies the
# fields that I want to extract.

read_docx(filename) %>%
  str_split(",") %>%
  unlist() %>%
  str_trim()

###
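
Two details in the snippet above may be worth checking; this is a sketch under assumptions, not a diagnosis. First, setwd() returns the directory that was current before the change, so filepath may not hold the directory just chosen. Second, the pattern argument of list.files() is a regular expression, so ".docx" matches any character followed by "docx". A temporary directory stands in for the tk_choose.dir() result here, and the file names are made up.

```r
start <- getwd()
demo_dir <- file.path(tempdir(), "docx_demo")   # stand-in for tk_choose.dir()
dir.create(demo_dir, showWarnings = FALSE)

# setwd() returns the previous working directory, not the new one
returned <- setwd(demo_dir)
identical(returned, start)
setwd(start)                                    # restore the original directory

# Keeping the chosen path itself avoids changing the working directory at all
filepath <- demo_dir
file.create(file.path(filepath, c("a.docx", "b.docx", "mydocx_notes.txt")))

loose  <- list.files(filepath, pattern = ".docx")     # also matches "mydocx_notes.txt"
strict <- list.files(filepath, pattern = "\\.docx$")  # only names ending in ".docx"
loose
strict
```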

What I'd like to try and accomplish next is to extract the data from 
selected fields and append to a spreadsheet (Calc or Excel) under 
specific columns, or if it is easier to write a CSV which I can then use 
later.


The fields I want to extract are illustrated with reference to the above 
file, viz.:


The title: "Now they want us to charge our electric cars from litter bins"
The name of the newspaper: "Mail on Sunday (London)"
The publication date: "September 24, 2023" (in date format, preferably 
separated into month and year (day is not important))

The section: "NEWS"
The page number(s): "16" (as numeric)
The length: "515" (as numeric)
The author: "Anna Mikhailova"
The subject: from the Subject section, but this is to match a value e.g. 
GREENWASHING >= 50% (here this value is 51% so would be included). A 
match moves onto select the highest value under the section "Industry" 
(here it is ELECTRIC MOBILITY (91%)) and appends this text and % value. 
If no match with 'Greenwashing', then appends 'Null' and moves onto the 
next file in the directory.
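
A rough sketch of how some of the fields above might be pulled out with base R regular expressions. The three input lines are hypothetical: only "September 24, 2023", GREENWASHING at 51% and ELECTRIC MOBILITY (91%) come from the description above, while the other items, the "Subject:" and "Industry:" prefixes, and the semicolon-separated layout are assumptions about the Lexis+ output. The helper names are made up for illustration.

```r
# Hypothetical lines modelled on the fields described above
date_line     <- "September 24, 2023"
subject_line  <- "Subject: GREENWASHING (51%); ELECTRIC VEHICLES (90%)"
industry_line <- "Industry: ELECTRIC MOBILITY (91%); AUTOMOTIVE (78%)"

# Month and year without relying on the locale: month.name is always English
parse_month_year <- function(x) {
  m <- regmatches(x, regexec("^([A-Za-z]+) +[0-9]{1,2}, +([0-9]{4})$", trimws(x)))[[1]]
  if (length(m) != 3) return(c(month = NA_integer_, year = NA_integer_))
  c(month = match(m[2], month.name), year = as.integer(m[3]))
}

# Percentage attached to one term, or NA if the term is absent
pct_for <- function(line, term) {
  pat <- paste0(term, " *\\(([0-9]+)%\\)")
  m <- regmatches(line, regexec(pat, line, ignore.case = TRUE))[[1]]
  if (length(m) < 2) NA_integer_ else as.integer(m[2])
}

# Item with the highest percentage on a "Label: A (x%); B (y%)" line
top_item <- function(line) {
  parts <- trimws(strsplit(sub("^[^:]*:", "", line), ";", fixed = TRUE)[[1]])
  pct <- suppressWarnings(
    as.integer(sub(".*\\(([0-9]+)%\\)[[:space:]]*$", "\\1", parts)))
  parts[which.max(pct)]
}

my <- parse_month_year(date_line)
gw <- pct_for(subject_line, "greenwashing")
industry <- if (!is.na(gw) && gw >= 50) top_item(industry_line) else "Null"
```

Each helper returns NA (or "Null" for the industry) when its field is missing, which keeps a loop over many files from stopping on one incomplete article.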


###

The theory I am working with is if I can figure out how to extract these 
fields and append correctly, then the rest should just be wrapping this 
up in a for loop.


However, I am struggling to get my head around the extraction and append 
part. If I can get it to work for one of these fields, I suspect that I 
can repeat the basic syntax to extract and append the remaining fields.


Therefore, if someone can either suggest a syntax or point me to a 
useful tutorial, that would be splendid.


Thank you in anticipation.

Best wishes
Andy



Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2024-01-03 Thread Dr Eberhard Lisse

If you do something like this

for i in $(pandoc --list-output-formats); do
    pandoc -f docx -t "$i" -o "test.$i" \
        "Now they want us to charge our electric cars from litter bins.docx"
done

you get approximately 65 formats, from which you can pick one which you can
write a little parser for. The dokuwiki one, for example, uses long lines,
which makes parsing easier.

el


On 2023-12-30 13:57 , Andy wrote:
> Good idea, El - thanks.
>
> The link is
> https://docs.google.com/document/d/1QwuaWZk6tYlWQXJ3WLczxC8Cda6zVERk/edit?usp=sharing&ouid=103065135255080058813&rtpof=true&sd=true
>
>  This is helpful.
>
> From the article, which is typical of Lexis+ output, I want to
> extract the following fields and append to a Calc/ Excel spreadsheet.
> Given the volume of articles I have to work through, if this can be
> iterative and semi-automatic, that would be a godsend and I might be
> able to do some actual research on the articles before I reach my
> pensionable age. :-)
>
> Title, Newspaper, Date, Section and page number, Length, Byline, Subject
> (only if the threshold of coverage for a specific subject, >=50%, is
> reached (e.g. Greenwashing (51%)) - if not, enter 'nil' and
> move onto the next article in the folder)
>
> This is the ambition. I am clearly a long way short of that though.
>
> Many thanks. Andy

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-30 Thread Ivan Krylov

On Sat, 30 Dec 2023 12:18:52 +
Andy wrote:

> filepath <- setwd(tk_choose.dir())

Since you're using tcltk, you can get a file path in one step using
tk_choose.files(). (Use multi = FALSE to choose only one file.)

> full_filename <- paste(filepath, filename, sep="/")

There's also file.path(), which results in slightly more compact,
self-documenting code.

Nowadays, using '/' as the directory separator can be considered
portable, one notable exception being some Windows cmd.exe built-ins
(where '/' is interpreted as flag specifier). Perl5 documentation
mentions Classic MacOS using ':' as the directory separator (and many
other operating systems supporting or emulating Unix-style '/'
separators), but that hasn't been relevant for a long while.
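
As a rough illustration of the difference between the three ways of joining
a directory and a file name, here is a small sketch. The directory "~/TEST"
is taken from the thread; nothing is read from disk, so the file does not
need to exist.

```r
filepath <- "~/TEST"   # directory name as reported in the thread
filename <- "Now they want us to charge our electric cars from litter bins.docx"

joined_paste0 <- paste0(filepath, filename)             # no separator inserted
joined_paste  <- paste(filepath, filename, sep = "/")   # explicit separator
joined_fp     <- file.path(filepath, filename)          # separator added for you
```

paste0() glues the two strings together directly, which is how a name like
"~/TESTNow they want us ..." can arise; paste(..., sep = "/") and
file.path() give the same result here.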

> Error in x$doc_obj : $ operator is invalid for atomic vectors

Which line of code produces the error? What is the argument of
docx_summary() at this point?

Since you're learning R, I can recommend a couple of free books: Visual
Statistics [1] to study the basics of R and The R Inferno [2] for when
you get stuck.

-- 
Best regards,
Ivan

[1]
http://web.archive.org/web/20230415001551/http://ashipunov.info/shipunov/school/biol_240/en/visual_statistics.pdf

[2]
https://www.burns-stat.com/documents/books/the-r-inferno/


Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-30 Thread Eric Berger

Sorry, I was being too quick.
You have to pay attention to the pipe operator

You were advised to do the following

content <- read_docx(full_filename) |>
  docx_summary()

which should have worked but I think you left out the |> operator.

Alternatively

tmp <- read_docx(full_filename)
content <-  docx_summary(tmp)
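
To see why the missing pipe matters, here is a small base-R sketch. The
function summarise_doc() is a hypothetical stand-in for docx_summary(), used
only so the example runs without the officer package; the native pipe
requires R 4.1 or later.

```r
# Hypothetical stand-in for docx_summary(): it needs its argument `x`.
summarise_doc <- function(x) length(x)

doc <- c("para 1", "para 2", "para 3")   # stand-in for a parsed document

piped    <- doc |> summarise_doc()       # the pipe supplies `doc` as `x`
two_step <- summarise_doc(doc)           # equivalent call without the pipe

# Without the pipe, the second line is a separate call with no argument:
msg <- tryCatch(summarise_doc(), error = function(e) conditionMessage(e))
```

Here msg holds the same kind of "argument "x" is missing, with no default"
message reported earlier in the thread.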



On Sat, Dec 30, 2023 at 2:37 PM Andy  wrote:

> An update: Running this block of code:
>
> # Load libraries
> library(tcltk)
> library(tidyverse)
> library(officer)
>
> filepath <- setwd(tk_choose.dir())
>
> filename <- "Now they want us to charge our electric cars from litter
> bins.docx"
>
> #full_filename <- paste0(filepath, filename)
> full_filename <- paste(filepath, filename, sep="/")
>
> if (!file.exists(full_filename)) {
>   message("File missing")
> } else {
>   content <- read_docx(full_filename) |>
>     docx_summary()
>   # this reads docx for the full filename and
>   # passes it ( |> command) to the next line
>   # which summarises it.
>   # the result is saved in a data frame object
>   # called content which we shall show some
>   # heading into from
>
>   head(content)
> }
>
>
> Results in this error now:Error in x$doc_obj : $ operator is invalid for
> atomic vectors
>
> Thank you.
>
>
>
> On 30/12/2023 12:12, Andy wrote:
> > Hi Eric
> >
> > Thanks for that. That seems to fix one problem (the lack of a
> > separator), but introduces a new one when I complete the function
> > Calum proposed:Error in docx_summary() : argument "x" is missing, with
> > no default
> >
> > The whole code so far looks like this:
> >
> >
> > # Load libraries
> > library(tcltk)
> > library(tidyverse)
> > library(officer)
> >
> > filepath <- setwd(tk_choose.dir())
> >
> > filename <- "Now they want us to charge our electric cars from litter
> > bins.docx"
> > #full_filename <- paste0(filepath, filename) # Calum's original
> suggestion
> >
> > full_filename <- paste(filepath, filename, sep="/") # Eric's proposed fix
> >
> > #lets double check the file does exist! # The rest here is Calum's
> > suggestion
> > if (!file.exists(full_filename)) {
> >   message("File missing")
> > } else {
> >   content <- read_docx(full_filename)
> >   docx_summary()
> >   # this reads docx for the full filename and
> >   # passes it ( |> command) to the next line
> >   # which summarises it.
> >   # the result is saved in a data frame object
> >   # called content which we shall show some
> >   # heading into from
> >
> >   head(content)
> > }
> >
> >
> > Running this, results in the error cited above.
> >
> > Thanks as always :-)
> >
> >
> >
> >
> > On 30/12/2023 11:58, Eric Berger wrote:
> >> full_filename <- paste(filepath, filename,sep="/")
> >
> >
>

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

An update: Running this block of code:

# Load libraries
library(tcltk)
library(tidyverse)
library(officer)

filepath <- setwd(tk_choose.dir())

filename <- "Now they want us to charge our electric cars from litter bins.docx"

#full_filename <- paste0(filepath, filename)
full_filename <- paste(filepath, filename, sep="/")

if (!file.exists(full_filename)) {
   message("File missing")
} else {
   content <- read_docx(full_filename) |>
     docx_summary()
   # this reads docx for the full filename and
   # passes it ( |> command) to the next line
   # which summarises it.
   # the result is saved in a data frame object
   # called content which we shall show some
   # heading into from

   head(content)
}


Results in this error now: Error in x$doc_obj : $ operator is invalid for
atomic vectors

Thank you.



On 30/12/2023 12:12, Andy wrote:
> Hi Eric
>
> Thanks for that. That seems to fix one problem (the lack of a 
> separator), but introduces a new one when I complete the function 
> Calum proposed:Error in docx_summary() : argument "x" is missing, with 
> no default
>
> The whole code so far looks like this:
>
>
> # Load libraries
> library(tcltk)
> library(tidyverse)
> library(officer)
>
> filepath <- setwd(tk_choose.dir())
>
> filename <- "Now they want us to charge our electric cars from litter 
> bins.docx"
> #full_filename <- paste0(filepath, filename) # Calum's original suggestion
>
> full_filename <- paste(filepath, filename, sep="/") # Eric's proposed fix
>
> #lets double check the file does exist! # The rest here is Calum's 
> suggestion
> if (!file.exists(full_filename)) {
>   message("File missing")
> } else {
>   content <- read_docx(full_filename)
>   docx_summary()
>   # this reads docx for the full filename and
>   # passes it ( |> command) to the next line
>   # which summarises it.
>   # the result is saved in a data frame object
>   # called content which we shall show some
>   # heading into from
>
>   head(content)
> }
>
>
> Running this, results in the error cited above.
>
> Thanks as always :-)
>
>
>
>
> On 30/12/2023 11:58, Eric Berger wrote:
>> full_filename <- paste(filepath, filename,sep="/")
>
>


Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-30 Thread Eric Berger

docx_summary(content)

You should read documentation e.g. ?docx_summary and check the examples
section

On Sat, Dec 30, 2023 at 2:12 PM Andy  wrote:

> Hi Eric
>
> Thanks for that. That seems to fix one problem (the lack of a separator),
> but introduces a new one when I complete the function Calum proposed:
> Error in docx_summary() : argument "x" is missing, with no default
>
> The whole code so far looks like this:
>
>
> # Load libraries
> library(tcltk)
> library(tidyverse)
> library(officer)
>
> filepath <- setwd(tk_choose.dir())
>
> filename <- "Now they want us to charge our electric cars from litter
> bins.docx"
> #full_filename <- paste0(filepath, filename) # Calum's original suggestion
>
> full_filename <- paste(filepath, filename, sep="/") # Eric's proposed fix
>
> #lets double check the file does exist! # The rest here is Calum's
> suggestion
> if (!file.exists(full_filename)) {
>   message("File missing")
> } else {
>   content <- read_docx(full_filename)
>   docx_summary()
>   # this reads docx for the full filename and
>   # passes it ( |> command) to the next line
>   # which summarises it.
>   # the result is saved in a data frame object
>   # called content which we shall show some
>   # heading into from
>
>   head(content)
> }
>
>
> Running this, results in the error cited above.
>
> Thanks as always :-)
>
>
>
>
> On 30/12/2023 11:58, Eric Berger wrote:
>
> full_filename <- paste(filepath, filename,sep="/")
>
>
>


Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

Hi Eric

Thanks for that. That seems to fix one problem (the lack of a 
separator), but introduces a new one when I complete the function Calum 
proposed: Error in docx_summary() : argument "x" is missing, with no default

The whole code so far looks like this:


# Load libraries
library(tcltk)
library(tidyverse)
library(officer)

filepath <- setwd(tk_choose.dir())

filename <- "Now they want us to charge our electric cars from litter bins.docx"
#full_filename <- paste0(filepath, filename) # Calum's original suggestion

full_filename <- paste(filepath, filename, sep="/") # Eric's proposed fix

#lets double check the file does exist! # The rest here is Calum's suggestion
if (!file.exists(full_filename)) {
   message("File missing")
} else {
   content <- read_docx(full_filename)
   docx_summary()
   # this reads docx for the full filename and
   # passes it ( |> command) to the next line
   # which summarises it.
   # the result is saved in a data frame object
   # called content which we shall show some
   # heading into from

   head(content)
}


Running this, results in the error cited above.

Thanks as always :-)




On 30/12/2023 11:58, Eric Berger wrote:
> full_filename <- paste(filepath, filename,sep="/")



Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-30 Thread Eric Berger

full_filename <- paste(filepath, filename,sep="/")

On Sat, Dec 30, 2023 at 1:45 PM Andy  wrote:

> Thanks Ivan and Calum
>
> I continue to appreciate your support.
>
> Calum, I entered the code snippet you provided, and it returns 'file
> missing'. Looking at this, while the object 'full_filename' exists, what
> is happening is that the path from getwd() is being appended to the
> title of the article, but without the '/' between the end of the path
> name (here 'TEST' and the name of the article. In other words,
> full_filename is reading "~/TESTNow they want us to charge our electric
> cars from litter bins.docx", so logically, this file doesn't exist. To
> work, the '/' needs to be inserted to differentiate between the end of
> the path name and the start of the article name. I've tried both paste0,
> as you suggested, and paste but neither do the trick.
>
> Is this a result of me using the tkinter folder selection that you
> remarked on? I wanted to keep that so that the selection is interactive,
> but if there are better ways of doing this I am open to suggestions.
>
> Thanks again, both.
>
> Best wishes
> Andrew
>
>
> On 29/12/2023 22:25, CALUM POLWART wrote:
> >
> >
> > help(read_docx) says that the function only imports one docx file. In
> > order to read multiple files, use a for loop or the lapply function.
> >
> >
> > I told you people will suggest better ways to loop!!
> >
> >
> >
> > docx_summary(read_docx("Now they want us to charge our electric cars
> > from litter bins.docx")) should work.
> >
> >
> > Ivan thanks for spotting my fail! Since the OP is new to all this I'm
> > going to suggest a little tweak to this code which we can then build
> > into a for loop:
> >
> > filepath <- getwd() #you will want to change this later. You are doing
> > something with tcl to pick a directory which seems rather fancy! But
> > keep doing it for now or set the directory here ending in a /
> >
> > filename <- "Now they want us to charge our electric cars from litter
> > bins.docx"
> >
> > full_filename <- paste0(filepath, filename)
> >
> > #lets double check the file does exist!
> > if (!file.exists(full_filename)) {
> >   message("File missing")
> > } else {
> >   content <- read_docx(full_filename) |>
> >     docx_summary()
> >   # this reads docx for the full filename and
> >   # passes it ( |> command) to the next line
> >   # which summarises it.
> >   # the result is saved in a data frame object
> >   # called content which we shall show some
> >   # heading into from
> >
> >   head(content)
> > }
> >
> > Let's get this bit working before we try and loop
> >
>

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

Good idea, El - thanks.

The link is
https://docs.google.com/document/d/1QwuaWZk6tYlWQXJ3WLczxC8Cda6zVERk/edit?usp=sharing&ouid=103065135255080058813&rtpof=true&sd=true

This is helpful.

From the article, which is typical of Lexis+ output, I want to extract
the following fields and append to a Calc/ Excel spreadsheet. Given the
volume of articles I have to work through, if this can be iterative and
semi-automatic, that would be a god send and I might be able to do some
actual research on the articles before I reach my pensionable age. :-)

Title
Newspaper
Date
Section and page number
Length
Byline
Subject (only if the threshold of coverage for a specific subject is
>=50% is reached (e.g. Greenwashing (51%)) - if not, enter 'nil' and
move onto the next article in the folder

This is the ambition. I am clearly a long way short of that though.

Many thanks.
Andy
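
The Subject rule described above could be sketched in base R roughly as
follows. The exact layout of the Lexis+ "Subject" line is an assumption
here (keywords followed by a percentage in parentheses, separated by
semicolons), and parse_subject() is a hypothetical helper name.

```r
# Keep only keywords whose coverage meets the threshold; otherwise "nil".
# Assumes entries look like "KEYWORD (NN%)" separated by ";".
parse_subject <- function(subject, threshold = 50) {
  m <- regmatches(subject, gregexpr("[^;()]+\\([0-9]+%\\)", subject))[[1]]
  keyword <- trimws(sub("[[:space:]]*\\([0-9]+%\\)$", "", m))
  pct     <- as.numeric(sub(".*\\(([0-9]+)%\\)$", "\\1", m))
  keep    <- pct >= threshold
  if (!any(keep)) return("nil")
  paste0(keyword[keep], " (", pct[keep], "%)", collapse = "; ")
}

# Hypothetical Subject line for illustration
subject <- "FAST FASHION (72%); GREENWASHING (51%); RETAILERS (30%)"
kept <- parse_subject(subject)
none <- parse_subject("RETAILERS (30%)")
```

With this sample input, kept contains the two keywords at or above 50% and
none is "nil". The real documents may need a different pattern once the
actual Subject text is inspected.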

On 30/12/2023 00:08, Dr Eberhard W Lisse wrote:

Andy,

you can always open a public Dropbox or Google folder and post the link.

On 29/12/2023 22:37, Andy wrote:

Thanks - I'll have a look at these options too.

I'm happy to send over a sample document, but wasn't aware if
attachments are allowed. The documents come Lexis+, so require user
credentials to log in, but I could upload the file somewhere if
that would help? Any ideas for a good location to do so?

[...]


Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread Roy Mendelssohn - NOAA Federal via R-help

Thanks Ivan and Calum

I continue to appreciate your support.

Calum, I entered the code snippet you provided, and it returns 'file 
missing'. Looking at this, while the object 'full_filename' exists, what 
is happening is that the path from getwd() is being appended to the 
title of the article, but without the '/' between the end of the path 
name (here 'TEST') and the name of the article. In other words,
full_filename is reading "~/TESTNow they want us to charge our electric 
cars from litter bins.docx", so logically, this file doesn't exist. To 
work, the '/' needs to be inserted to differentiate between the end of 
the path name and the start of the article name. I've tried both paste0, 
as you suggested, and paste but neither do the trick.

Is this a result of me using the tcltk folder selection that you
remarked on? I wanted to keep that so that the selection is interactive, 
but if there are better ways of doing this I am open to suggestions.

Thanks again, both.

Best wishes
Andrew

On 29/12/2023 22:25, CALUM POLWART wrote:
>
>
> help(read_docx) says that the function only imports one docx file. In
> order to read multiple files, use a for loop or the lapply function.
>
>
> I told you people will suggest better ways to loop!!
>
>
>
> docx_summary(read_docx("Now they want us to charge our electric cars
> from litter bins.docx")) should work.
>
>
> Ivan thanks for spotting my fail! Since the OP is new to all this I'm 
> going to suggest a little tweak to this code which we can then build 
> into a for loop:
>
> filepath <- getwd() #you will want to change this later. You are doing 
> something with tcl to pick a directory which seems rather fancy! But 
> keep doing it for now or set the directory here ending in a /
>
> filename <- "Now they want us to charge our electric cars from litter 
> bins.docx"
>
> full_filename <- paste0(filepath, filename)
>
> #lets double check the file does exist!
> if (!file.exists(full_filename)) {
>   message("File missing")
> } else {
>   content <- read_docx(full_filename) |>
>     docx_summary()
>     # this reads docx for the full filename and
>     # passes it ( |> command) to the next line
>     # which summarises it.
>     # the result is saved in a data frame object
>     # called content which we shall show some
>     # heading into from
>
>    head(content)
> }
>
> Let's get this bit working before we try and loop
>


Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread Dr Eberhard W Lisse

Andy,

you can always open a public Dropbox or Google folder and post the link.

el

On 29/12/2023 22:37, Andy wrote:
> Thanks - I'll have a look at these options too.
>
> I'm happy to send over a sample document, but wasn't aware if
> attachments are allowed. The documents come Lexis+, so require user
>  credentials to log in, but I could upload the file somewhere if
> that would help? Any ideas for a good location to do so?
[...]


Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread CALUM POLWART

> help(read_docx) says that the function only imports one docx file. In
> order to read multiple files, use a for loop or the lapply function.
>

I told you people will suggest better ways to loop!!


>
> docx_summary(read_docx("Now they want us to charge our electric cars
> from litter bins.docx")) should work.
>

Ivan thanks for spotting my fail! Since the OP is new to all this I'm going
to suggest a little tweak to this code which we can then build into a for
loop:

filepath <- getwd() # you will want to change this later. You are doing
# something with tcl to pick a directory which seems rather fancy! But keep
# doing it for now or set the directory here ending in a /

filename <- "Now they want us to charge our electric cars from litter bins.docx"

full_filename <- paste0(filepath, filename)

#lets double check the file does exist!
if (!file.exists(full_filename)) {
  message("File missing")
} else {
  content <- read_docx(full_filename) |>
    docx_summary()
  # this reads docx for the full filename and
  # passes it ( |> command) to the next line
  # which summarises it.
  # the result is saved in a data frame object
  # called content which we shall show some
  # heading into from

  head(content)
}

Let's get this bit working before we try and loop


Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread Ivan Krylov

On Fri, 29 Dec 2023 20:17:41 +
Andy wrote:

> doc_in <- read_docx(files)
> 
> Results in this error:Error in filetype %in% c("docx") && 
> grepl("^([fh]ttp)", file) :'length = 9' in coercion to 'logical(1)'

help(read_docx) says that the function only imports one docx file. In
order to read multiple files, use a for loop or the lapply function.

> content <- officer::docx_summary("Now they want us to charge our 
> electric cars from litter bins.docx") # A title of one of the articles
> 
> The error returned is:Error in x$doc_obj : $ operator is invalid for 
> atomic vectors

A similar problem here. help(docx_summary) says that the function
accepts "rdocx" objects returned by read_docx, not file paths. A string
in R is indeed an atomic vector of type character, length 1.

docx_summary(read_docx("Now they want us to charge our electric cars
from litter bins.docx")) should work.
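
Both points can be sketched as follows. The first part reproduces the
"atomic vector" error in base R; the second defines a helper,
summarise_all() (a hypothetical name), that reads one file at a time with
lapply. The helper assumes the officer package is installed and is only
defined here, not called.

```r
# A file name is a character vector, and `$` does not work on those.
path <- "article.docx"     # hypothetical file name; nothing is read from disk
msg  <- tryCatch(path$doc_obj, error = function(e) conditionMessage(e))

# Read each file separately, then summarise the resulting rdocx object.
# Assumes officer is installed; returns a list of data frames.
summarise_all <- function(paths) {
  lapply(paths, function(p) officer::docx_summary(officer::read_docx(p)))
}
```

A call might then look like summarise_all(list.files(getwd(), pattern =
"\\.docx$", full.names = TRUE)), which passes full paths one by one rather
than the whole vector at once.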

-- 
Best regards,
Ivan


Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread Andy


Thanks - I'll have a look at these options too.

I'm happy to send over a sample document, but wasn't aware if 
attachments are allowed. The documents come Lexis+, so require user 
credentials to log in, but I could upload the file somewhere if that 
would help? Any ideas for a good location to do so?



On 29/12/2023 20:25, Dr Eberhard W Lisse wrote:

I would also look at https://pandoc.org perhaps which can
export a number of formats...

And for spreadsheets https://github.com/jqnatividad/qsv is my
goto weapon.  Can also read and write XLSX and others.

A sample document or two would always be helpful...

el

On 29/12/2023 21:01, CALUM POLWART wrote:

It sounded like he looked at officeR but I would agree

content <- officer::docx_summary("filename.docx")

Would get the text content into an object called content.

That object is a data.frame so you can then manipulate it.
To be more specific, we might need an example of the DF

[...]

On Fri, Dec 29, 2023 at 10:14 AM Andy 
wrote:

[...]

I'd like to be able to accomplish the following:

(1) Append the title, the month, the author, the number of
words, and page number(s) to a spreadsheet

(2) Read each article and extract keywords (in the docs,
these are listed in 'Subject' section as a list of
keywords with a percentage showing the extent to which the
keyword features in the article (e.g., FAST FASHION (72%))
and to append the keyword and the % coverage to the same
row in the spreadsheet.  However, I want to ensure that
the keyword coverage meets the threshold of >= 50%; if
not, then pass onto the next article in the directory.
Rinse and repeat for the entire directory.

[...]


Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread Dr Eberhard W Lisse

I would also look at https://pandoc.org perhaps which can
export a number of formats...

And for spreadsheets https://github.com/jqnatividad/qsv is my
goto weapon.  Can also read and write XLSX and others.

A sample document or two would always be helpful...

el

On 29/12/2023 21:01, CALUM POLWART wrote:
> It sounded like he looked at officeR but I would agree
> 
> content <- officer::docx_summary("filename.docx")
> 
> Would get the text content into an object called content.
> 
> That object is a data.frame so you can then manipulate it.
> To be more specific, we might need an example of the DF
[...]
>> On Fri, Dec 29, 2023 at 10:14 AM Andy 
>> wrote:
[...]
>>> I'd like to be able to accomplish the following:
>>>
>>> (1) Append the title, the month, the author, the number of
>>> words, and page number(s) to a spreadsheet
>>>
>>> (2) Read each article and extract keywords (in the docs,
>>> these are listed in 'Subject' section as a list of
>>> keywords with a percentage showing the extent to which the
>>> keyword features in the article (e.g., FAST FASHION (72%))
>>> and to append the keyword and the % coverage to the same
>>> row in the spreadsheet.  However, I want to ensure that
>>> the keyword coverage meets the threshold of >= 50%; if
>>> not, then pass onto the next article in the directory.
>>> Rinse and repeat for the entire directory.
[...]


Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread Andy

Hi Roy (& others)

Many thanks for the advice - well taken. Thanks also to the others who 
have responded so quickly - I thought I might have to wait days!! :-)

I'm on a Linux (Mint) machine. Below, I document three attempts, two 
using officer and the last now using textreadr

My attempts so far using 'officer':

##

(1) First Attempt:

# Load libraries
library(tcltk)
library(tidyverse)
library(officer)

setwd(tk_choose.dir())

doc_path <- list.files(getwd(), pattern = ".docx", full.names = TRUE)

files <- list.files(getwd(), ".docx")
files
length(files)

## This works to here - it obtains a list of docx files in directory 'TEST'
## (9 files). However, the next line
doc_in <- read_docx(files)

Results in this error: Error in filetype %in% c("docx") &&
grepl("^([fh]ttp)", file) : 'length = 9' in coercion to 'logical(1)'

No idea how to debug that.

Even when trying Calum's suggestion with officer:

content <- officer::docx_summary("Now they want us to charge our electric cars from litter bins.docx") # A title of one of the articles

The error returned is: Error in x$doc_obj : $ operator is invalid for
atomic vectors


##
(2) Second Attempt:

# Load libraries
library(tcltk)
library(tidyverse)
library(officer)

setwd(tk_choose.dir())

doc_path <- list.files(getwd(), pattern = ".docx", full.names = TRUE)

files <- list.files(getwd(), ".docx")
files
length(files)

docx_summary(doc_path, preserve = FALSE)
## At this point, the error is: Error in x$doc_obj : $ operator is
## invalid for atomic vectors

So, not sure how I am passing an atomic vector or if there is something 
I am supposed to set to make this something else?

##
(3) Third attempt - now trying with textreadr (Thanks for the help on 
installing this, Calum):

# Load libraries
library(tcltk)
library(tidyverse)
library(textreadr)

folder <- setwd(tk_choose.dir())

files <- list.files(folder, ".docx")
files
length(files)

doc <- read_docx("Now they want us to charge our electric cars from litter bins.docx") # One of the 9 files in the folder

read_docx(doc, skip = 0, remove.empty = TRUE, trim = TRUE) # To test against one file

## The last line returns the following error: Error in filetype %in%
## c("docx") && grepl("^([fh]ttp)", file) : 'length = 38' in coercion to
## 'logical(1)'

##
And so I am going around in circles and not at all clear on how I can 
make progress.

I am sure that there must be a way, but the suggestions on-line each 
lead to the above errors.

Thanks for any further help.

Best wishes, and thanks
Andy


On 29/12/2023 18:25, Roy Mendelssohn - NOAA Federal wrote:
> Hi Andy:
>
> I don’t have an answer but I do have what I hope is some friendly advice.  
> Generally the more information you can provide,  the more likely you will get 
> help that is useful.  In your case you say that you tried several packages 
> and they didn’t do what you wanted.  Providing that code,  as well as why 
> they didn’t do what you wanted (be specific)  would greatly facilitate things.
>
> Happy new year,
>
> -Roy
>
>
>> On Dec 29, 2023, at 10:14 AM, Andy  wrote:
>>
>> Hello
>>
>> I am trying to work through a problem, but feel like I've gone down a rabbit 
>> hole. I'd very much appreciate any help.
>>
>> The task: I have several directories of multiple (some directories, up to 
>> 2,500+) *.docx files (newspaper articles downloaded from Lexis+) that I want 
>> to iterate through to append to a spreadsheet only those articles that 
>> satisfy a condition (i.e., a specific keyword is present for >= 50% coverage 
>> of the subject matter). Lexis+ has a very specific structure and keywords 
>> are given in the row "Subject".
>>
>> I'd like to be able to accomplish the following:
>>
>> (1) Append the title, the month, the author, the number of words, and page 
>> number(s) to a spreadsheet
>>
>> (2) Read each article and extract keywords (in the docs, these are listed in 
>> 'Subject' section as a list of keywords with a percentage showing the extent 
>> to which the keyword features in the article (e.g., FAST FASHION (72%)) and 
>> to append the keyword and the % coverage to the same row in the spreadsheet. 
>> However, I want to ensure that the keyword coverage meets the threshold of 
>> >= 50%; if not, then pass onto the next article in the directory. Rinse and 
>> repeat for the entire directory.
>>
>> So far, I've tried working through some Stack Overflow-based solutions, but 
>> most seem to use the textreadr package, which is now deprecated; others use 
>> either the officer or the officedown packages. However, these packages don't 
>> appear to do what I want the program to do, at least not in any of the 
>> examples I have found, nor in the vignettes and relevant package manuals 
>> I've looked at.
>>
>> The first point is, is what I am intending to do even possible using R? If 
>> it is, then where do I start with this? If these docx files were converted 
>> to UTF-8 plain text, would that ma

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread CALUM POLWART

It sounded like he looked at officeR but I would agree

content <- officer::docx_summary("filename.docx")

Would get the text content into an object called content.

That object is a data.frame so you can then manipulate it.  To be more
specific, we might need an example of the DF

You can loop this easily with a for statement although there are people who
prefer a non-for approach to iteration in R. For can be slow. But if you
don't need to do this very quickly I'd stick with for if you are used to
programming
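
A for loop of the kind described might look roughly like this. It is only a
sketch: extract_fields() is a hypothetical placeholder for the real
per-article extraction, the file names are made up, and a CSV file stands in
for the spreadsheet since Calc and Excel can both open one.

```r
# Hypothetical stand-in for the real per-article extraction step.
extract_fields <- function(path) {
  data.frame(file = basename(path), n_chars = nchar(basename(path)))
}

# Append one row to a CSV, writing the header only when the file is new.
append_row <- function(row, csv) {
  new_file <- !file.exists(csv)
  write.table(row, csv, sep = ",", row.names = FALSE,
              col.names = new_file, append = !new_file)
}

paths <- c("a.docx", "bb.docx")       # hypothetical file names
out   <- tempfile(fileext = ".csv")   # temporary output file
for (p in paths) append_row(extract_fields(p), out)
result <- read.csv(out)
```

Writing each row as it is produced means a failure part-way through a large
directory does not lose the rows already collected.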

On Fri, 29 Dec 2023, 18:35 jim holtman,  wrote:

> checkout the 'officer' package
>
> Thanks
>
> Jim Holtman
> *Data Munger Guru*
>
>
> *What is the problem that you are trying to solve?Tell me what you want to
> do, not how you want to do it.*
>
>
> On Fri, Dec 29, 2023 at 10:14 AM Andy  wrote:
>
> > Hello
> >
> > I am trying to work through a problem, but feel like I've gone down a
> > rabbit hole. I'd very much appreciate any help.
> >
> > The task: I have several directories of multiple (some directories, up
> > to 2,500+) *.docx files (newspaper articles downloaded from Lexis+) that
> > I want to iterate through to append to a spreadsheet only those articles
> > that satisfy a condition (i.e., a specific keyword is present for >= 50%
> > coverage of the subject matter). Lexis+ has a very specific structure
> > and keywords are given in the row "Subject".
> >
> > I'd like to be able to accomplish the following:
> >
> > (1) Append the title, the month, the author, the number of words, and
> > page number(s) to a spreadsheet
> >
> > (2) Read each article and extract keywords (in the docs, these are
> > listed in the 'Subject' section as a list of keywords, each with a
> > percentage showing the extent to which the keyword features in the
> > article, e.g., FAST FASHION (72%)), and append the keyword and the %
> > coverage to the same row in the spreadsheet. However, I want to ensure
> > that the keyword coverage meets the threshold of >= 50%; if not, then
> > pass on to the next article in the directory. Rinse and repeat for the
> > entire directory.
> >
> > So far, I've tried working through some Stack Overflow-based solutions,
> > but most seem to use the textreadr package, which is now deprecated;
> > others use either the officer or the officedown packages. However, these
> > packages don't appear to do what I want the program to do, at least not
> > in any of the examples I have found, nor in the vignettes and relevant
> > package manuals I've looked at.
> >
> > The first point is, is what I am intending to do even possible using R?
> > If it is, then where do I start with this? If these docx files were
> > converted to UTF-8 plain text, would that make the task easier?
> >
> > I am not a confident coder, and am really only just getting my head
> > around R so appreciate a steep learning curve ahead, but of course, I
> > don't know what I don't know, so any pointers in the right direction
> > would be a big help.
> >
> > Many thanks in anticipation
> >
> > Andy

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread CALUM POLWART

textreadr would be the obvious approach.

When you say it is deprecated, do you mean it's not available on CRAN?
Sometimes maintaining a package on CRAN is just a pain in the ass.

devtools::install_github("trinker/textreadr")


Should let you install it.

In theory docx files are actually just zip files (you can unzip them), and
you may find there is a specific file in the zip that is readable with
one of R's general text file readers.

Alternatively, read_docx from:
https://www.rdocumentation.org/packages/qdapTools

May be worth a look.

What platform are you on? There are certainly options to convert the files
to txt on the command line and work from there.
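
The zip-file route mentioned above can be sketched in base R without any extra packages. This is only a rough sketch: it assumes the main text lives in word/document.xml inside the archive (the usual location in a .docx) and pulls text out with regular expressions rather than a real XML parser, so XML entities such as &amp; are not decoded. The function names are made up for this example.

```r
# Pull the visible text out of WordprocessingML, one string per paragraph.
# Text runs sit in <w:t> elements; paragraphs end with </w:p>.
extract_docx_text <- function(xml) {
  paragraphs <- strsplit(xml, "</w:p>", fixed = TRUE)[[1]]
  runs <- regmatches(paragraphs,
                     gregexpr("(?<=>)[^<]*(?=</w:t>)", paragraphs, perl = TRUE))
  text <- vapply(runs, paste, character(1), collapse = "")
  text[nzchar(text)]                 # drop paragraphs with no text
}

# Read word/document.xml straight out of the .docx (a zip archive).
read_docx_text <- function(path) {
  con <- unz(path, "word/document.xml", encoding = "UTF-8")
  on.exit(close(con))
  xml <- paste(readLines(con, warn = FALSE), collapse = "")
  extract_docx_text(xml)
}

# Small hand-written XML fragment standing in for a real document.
demo_xml <- paste0(
  "<w:body><w:p><w:r><w:t>Subject: </w:t></w:r>",
  "<w:r><w:t xml:space=\"preserve\">FAST FASHION (72%)</w:t></w:r></w:p>",
  "<w:p><w:r><w:tab/><w:t>Length: 512 words</w:t></w:r></w:p></w:body>"
)
demo_text <- extract_docx_text(demo_xml)
```

For anything beyond a quick look, a proper reader such as officer or qdapTools is likely to be more robust than this regex approach.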


On Fri, 29 Dec 2023, 18:25 Roy Mendelssohn - NOAA Federal via R-help, <
r-help@r-project.org> wrote:

> Hi Andy:
>
> I don’t have an answer but I do have what I hope is some friendly advice.
> Generally the more information you can provide,  the more likely you will
> get help that is useful.  In your case you say that you tried several
> packages and they didn’t do what you wanted.  Providing that code,  as well
> as why they didn’t do what you wanted (be specific)  would greatly
> facilitate things.
>
> Happy new year,
>
> -Roy

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread jim holtman

Check out the 'officer' package.

Thanks

Jim Holtman
*Data Munger Guru*


*What is the problem that you are trying to solve? Tell me what you want to
do, not how you want to do it.*



Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

Hi Andy:

I don’t have an answer but I do have what I hope is some friendly advice.  
Generally the more information you can provide,  the more likely you will get 
help that is useful.  In your case you say that you tried several packages and 
they didn’t do what you wanted.  Providing that code,  as well as why they 
didn’t do what you wanted (be specific)  would greatly facilitate things.

Happy new year,

-Roy



Re: [R] Help with plotting and date-times for climate data

2023-09-15 Thread Martin Møller Skarbiniks Pedersen

Change

 geom_point(aes(y = tmax_mean, color = "blue"))
to
 geom_point(aes(y = tmax_mean), color = "blue")
if you want blue points.

aes(color = ) does not set the color of the points.

aes(color = ) takes a column (best if it is a factor) and uses that for
different colors.
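
To make the difference concrete, and to touch on the other two questions in the quoted message, here is a minimal sketch. It assumes a small made-up table shaped like d_sum, and it uses an arbitrary dummy year (2000) to turn the month-day strings into real dates; that is one possible approach, not the only one. As far as I know, geom_smooth() draws nothing with a factor on the x axis because each level becomes its own group with a single observation, so a continuous date axis sidesteps that. The ggplot2 part is skipped if the package is not installed.

```r
# Hypothetical summary table shaped like d_sum in the quoted message.
d_sum <- data.frame(
  md = c("09-22", "09-23", "09-24", "09-25"),
  tmax_mean = c(62.2, 61.3, 63.9, 64.3)
)

# Attach a dummy year so that month-day becomes a real Date.
# The year 2000 is arbitrary; being a leap year, it also accepts "02-29".
d_sum$plot_date <- as.Date(paste0("2000-", d_sum$md))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(d_sum, ggplot2::aes(x = plot_date, y = tmax_mean)) +
    ggplot2::geom_point(color = "blue") +                 # constant: outside aes()
    ggplot2::geom_smooth(color = "blue", se = FALSE) +    # continuous x, one group
    ggplot2::scale_x_date(date_labels = "%m-%d")          # hide the dummy year
}
```

With color = "blue" outside aes(), both layers are drawn in blue and no legend is created.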


/Martin

On Tue, Sep 12, 2023, 22:50 Kevin Zembower via R-help 
wrote:

> Hello,
>
> I'm trying to calculate the mean temperature max from a file of climate
> data, and plot it over a range of days in the year. I've downloaded the
> data, and cleaned it up the way I think it should be. However, when I
> plot it, the geom_smooth line doesn't show up. I think that's because
> my x axis is characters or factors. Here's what I have so far:
> 
> library(tidyverse)
>
> data <- read_csv("Ely_MN_Weather.csv")
>
> start_day = yday(as_date("2023-09-22"))
> end_day = yday(as_date("2023-10-15"))
>
> d <- as_tibble(data) %>%
>     select(DATE, TMAX, TMIN) %>%
>     mutate(DATE = as_date(DATE),
>            yday = yday(DATE),
>            md = sprintf("%02d-%02d", month(DATE), mday(DATE))
>     ) %>%
>     filter(yday >= start_day & yday <= end_day) %>%
>     mutate(md = as.factor(md))
>
> d_sum <- d %>%
>     group_by(md) %>%
>     summarize(tmax_mean = mean(TMAX, na.rm = TRUE))
>
> ## Here's the filtered data:
> dput(d_sum)
>
> > structure(list(md = structure(1:25, levels = c("09-21", "09-22",
> "09-23", "09-24", "09-25", "09-26", "09-27", "09-28", "09-29",
> "09-30", "10-01", "10-02", "10-03", "10-04", "10-05", "10-06",
> "10-07", "10-08", "10-09", "10-10", "10-11", "10-12", "10-13",
> "10-14", "10-15"), class = "factor"), tmax_mean = c(65,
> 62.2,
> 61.3, 63.9, 64.3, 60.1, 62.3, 60.5, 61.9,
> 61.2, 63.7, 59.5, 59.6, 61.6,
> 59.4, 58.8, 55.9, 58.125,
> 58, 55.7, 57, 55.4, 49.8,
> 48.75, 43.7)), class = c("tbl_df", "tbl", "data.frame"
> ), row.names = c(NA, -25L))
> >
> ggplot(data = d_sum, aes(x = md)) +
>     geom_point(aes(y = tmax_mean, color = "blue")) +
>     geom_smooth(aes(y = tmax_mean, color = "blue"))
>
> My questions are:
> 1. Why isn't my geom_smooth plotting? How can I fix it?
> 2. I don't think I'm handling the month and day combination correctly.
> Is there a way to encode month and day (but not year) as a date?
> 3. (Minor point) Why does my graph of tmax_mean come out red when I
> specify "blue"?
>
> Thanks for any advice or guidance you can offer. I really appreciate
> the expertise of this group.
>
> -Kevin

Re: [R] Help with plotting and date-times for climate data

2023-09-13 Thread Richard O'Keefe

can be modeled as a function of
> > > temperature. These are often called growing degree day models (or
> > > some version of that). This is number of thermal units needed for
> > > the organism to develop to the next stage (e.g. instar for an
> > > insect, or fruit/flower formation for a plant). However, better
> > > accuracy is obtained if the model includes both min and max
> > > thresholds.
> > >
> > > All I have done is provide an example where min and max could have
> > > a real world use. I use max(temp) over some interval and then
> > > update an accumulated thermal units variable based on the outcome.
> > > That detail is not evident in the original request.
> > >
> > > Tim
> > >
> > > -Original Message-
> > > From: R-help  On Behalf Of Richard
> > > O'Keefe
> > > Sent: Wednesday, September 13, 2023 9:58 AM
> > > To: Kevin Zembower 
> > > Cc: r-help@r-project.org
> > > Subject: Re: [R] Help with plotting and date-times for climate data
> > >
> > > Off-topic, but what is a "mean temperature max"
> > > and what good would it do you if you did know it?
> > > I've been looking at a lot of weather station data and for no
> > > question I've ever had (except "would the newspapers get excited
> > > about this") was "max" (or min) the answer.  Considering the way
> > > that temperature can change by several degrees in a few minutes, or
> > > a few metres -- I meant horizontally when I wrote that, but as you
> > > know your head and feet don't experience the same temperature,
> > > again by more than one degree -- I am at something of a loss to
> > > ascribe much practical significance to TMAX.  Are you sure this is
> > > the analysis you want to do?  Is this the most informative data you
> > > can get?
> > >

Re: [R] Help with plotting and date-times for climate data

Dear Kevin,

You could try the National Weather Service. I can get "International Falls" and 
other locations, though Ely is not specifically listed. 

https://www.weather.gov/wrh/climate?wfo=dlh

There is a menu.
Select your location,
Select a product (I selected temperature)
Select a year, and period of interest.
Select go.

If you scroll over the figure a popup with numbers appears.

Working with the weather data in R is possible as well.
I would start by filtering the data to remove dates outside the range of
interest. Then extract the day from each date (say, Day), group by that day,
and apply a max function to the grouped data. Then plot the result.
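
The steps above can be sketched in base R roughly as follows. This is a minimal sketch on made-up data: the column names DATE and TMAX match the ones used elsewhere in this thread, but the values and the helper name summarise_by_day are invented. The summary function is a parameter, so mean (what Kevin asked for) or max can be plugged in. The month-day comparison is a plain string comparison, which only works for a window that does not wrap around the end of the year.

```r
# Hypothetical daily records standing in for the downloaded weather file.
weather <- data.frame(
  DATE = as.Date(c("2021-09-22", "2022-09-22", "2021-09-23",
                   "2022-09-23", "2021-11-01")),
  TMAX = c(60, 64, 58, 62, 40)
)

# Filter to a month-day window, group by month-day, and summarise TMAX.
summarise_by_day <- function(df, first_md, last_md, fun = mean) {
  md <- format(df$DATE, "%m-%d")            # day of the year without the year
  keep <- md >= first_md & md <= last_md    # drop dates outside the window
  out <- aggregate(df$TMAX[keep], by = list(md = md[keep]),
                   FUN = fun, na.rm = TRUE)
  names(out)[2] <- "tmax"
  out
}

by_day_mean <- summarise_by_day(weather, "09-22", "10-15")
by_day_max  <- summarise_by_day(weather, "09-22", "10-15", fun = max)

# plot(seq_along(by_day_mean$md), by_day_mean$tmax, type = "b") would draw it.
```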

Tim

-Original Message-
From: Kevin Zembower  
Sent: Wednesday, September 13, 2023 3:26 PM
To: Ebert,Timothy Aaron ; Richard O'Keefe 
Cc: r-help@r-project.org
Subject: Re: [R] Help with plotting and date-times for climate data

[External Email]

Hi, Tim,

I actually did see this chart when I was doing some research, but rejected it
because it was difficult to interpolate the graph for the three-week period I
was interested in. I didn't discover until just now that I could click on the
labels on the x-axis to expand the graph.
Unfortunately, downloading the data from this site costs $95/month.

Also, I found the raw data (from the NWS, for free) and decided to exercise my 
R skills to see if I could produce the exact graph I wanted.

Thanks for taking the time to research this.

-Kevin


Re: [R] Help with plotting and date-times for climate data

Hi, Tim,

I actually did see this chart when I was doing some research, but
rejected it because it was difficult to interpolate the graph for the
three-week period I was interested in. I didn't discover until just now
that I could click on the labels on the x-axis to expand the graph.
Unfortunately, downloading the data from this site costs $95/month.

Also, I found the raw data (from the NWS, for free) and decided to
exercise my R skills to see if I could produce the exact graph I
wanted.

Thanks for taking the time to research this.

-Kevin

On Wed, 2023-09-13 at 18:21 +, Ebert,Timothy Aaron wrote:
> Hi Kevin,
> 
> https://weatherspark.com/y/11610/Average-Weather-in-Ely-Minnesota-United-States-Year-Round
> Just scroll down. I think what you are looking for is the first
> graph, but there are about a dozen other graphs on various
> meteorological metrics. 
>    
>    Another option would be to use larger cities (Duluth,
> International Falls, Thunder Bay) and take a metal average. There is
> a lake effect for two of these more than the other. 
>    
>    All good?
> Tim
> 
Re: [R] Help with plotting and date-times for climate data

Hi Kevin,

https://weatherspark.com/y/11610/Average-Weather-in-Ely-Minnesota-United-States-Year-Round
Just scroll down. I think what you are looking for is the first graph, but
there are about a dozen other graphs on various meteorological metrics.

Another option would be to use larger cities (Duluth, International
Falls, Thunder Bay) and take a mental average. The lake effect is stronger
for two of these than for the third.

All good?
Tim

-Original Message-
From: Kevin Zembower  
Sent: Wednesday, September 13, 2023 2:05 PM
To: Ebert,Timothy Aaron ; Richard O'Keefe 
Cc: r-help@r-project.org
Subject: Re: [R] Help with plotting and date-times for climate data

Well, I looked for this, on both the NWS and WeatherUnderground, but couldn't 
find what I was looking for. Didn't check Weather.com, but if you can find a 
chart of the average high and low temperatures in Ely, MN between about the 
middle of September to the middle of October, I'll buy you a beer.

-Kevin

On Wed, 2023-09-13 at 17:39 +, Ebert,Timothy Aaron wrote:
> I admire the dedication to R and data science, but the Weather Channel 
> might be a simpler approach. Weather.com. I can search for (city name) 
> and either weather (current values) or climate. It depends on how far 
> away the trip will be.
>
> -Original Message-
> From: Kevin Zembower 
> Sent: Wednesday, September 13, 2023 1:22 PM
> To: Richard O'Keefe ; Ebert,Timothy Aaron 
> 
> Cc: r-help@r-project.org
> Subject: Re: [R] Help with plotting and date-times for climate data
>
> Tim, Richard, y'all are reading too much into this. I believe that 
> TMAX is the high temperature of the day, and TMIN is the low. I'm 
> trying to compute the average or median high and low temperatures for 
> the data I have (2011 to present). I'm going on a trip to this area, 
> and want to know how to pack.
>
> Thanks for your interest.
>
> -Kevin
>
> On Thu, 2023-09-14 at 03:07 +1200, Richard O'Keefe wrote:
> > I am well aware of the physiological implications of temperature, 
> > and that is *why* I view recorded TMIN and TMAX at a single point 
> > with an extremely jaundiced eye.  TMAX at shoulder height has very 
> > little relevance to an insect living in grass, for example.  And if 
> > TMAX is sustained for one second, that has very different 
> > consequences from if TMAX is sustained for five minutes.  I can see 
> > the usefulness of "proportion of day above Thi/below Tlo", but that 
> > is quite different.
> >
> > OK, so my interest in weather data was mainly based around water
> > management: precipitation, evaporation, herd and crop water needs, 
> > that kind of thing.  And the first thing you learn from that 
> > experience is that ANY kind of single-point summary is seriously 
> > misleading.
> >
> > Let's end this digression.
> >
> >
> > On Thu, 14 Sept 2023 at 02:18, Ebert,Timothy Aaron 
> > wrote:
> > > I had the same question.
> > > However, I can partly answer the off-topic question. Min and max 
> > > can be important as lower and upper development thresholds. Below 
> > > the min no growth or development occur because reaction rates are 
> > > too slow to enable such. Above max, temperatures are too hot.
> > > Protein function is impaired, and systems stop functioning. There 
> > > is a considerable range between where systems shut down (but
> > > recover) and tissue death.
> > > In a simple form the growth and physiological stage of plants, 
> > > insects, and many others, can be modeled as a function of 
> > > temperature. These are often called growing degree day models (or 
> > > some version of that). This is number of thermal units needed for 
> > > the organism to develop to the next stage (e.g. instar for an 
> > > insect, or fruit/flower formation for a plant). However, better 
> > > accuracy is obtained if the model includes both min and max 
> > > thresholds.
> > >
> > > All I have done is provide an example where min and max could have 
> > > a real world use. I use max(temp) over some interval and then 
> > > update an accumulated thermal units variable based on the outcome.
> > > That detail is not evident in the original request.
> > >
> > > Tim
> > >
> > > > -----Original Message-----
> > > From: R-help  On Behalf Of Richard
> > > O'Keefe
> > > Sent: Wednesday, September 13, 2023 9:58 AM
> > > To: Kevin Zembower 
> > > Cc: r-help@r-project.org
> > >

Re: [R] Help with plotting and date-times for climate data

Well, I looked for this, on both the NWS and WeatherUnderground, but
couldn't find what I was looking for. Didn't check Weather.com, but if
you can find a chart of the average high and low temperatures in Ely,
MN between about the middle of September to the middle of October, I'll
buy you a beer.

-Kevin

On Wed, 2023-09-13 at 17:39 +0000, Ebert,Timothy Aaron wrote:
> I admire the dedication to R and data science, but the Weather
> Channel might be a simpler approach. Weather.com. I can search for
> (city name) and either weather (current values) or climate. It
> depends on how far away the trip will be.
> 
> -----Original Message-----
> From: Kevin Zembower  
> Sent: Wednesday, September 13, 2023 1:22 PM
> To: Richard O'Keefe ; Ebert,Timothy Aaron
> 
> Cc: r-help@r-project.org
> Subject: Re: [R] Help with plotting and date-times for climate data
> 
> [External Email]
> 
> Tim, Richard, y'all are reading too much into this. I believe that
> TMAX is the high temperature of the day, and TMIN is the low. I'm
> trying to compute the average or median high and low temperatures for
> the data I have (2011 to present). I'm going on a trip to this area,
> and want to know how to pack.
> 
> Thanks for your interest.
> 
> -Kevin
> 
> On Thu, 2023-09-14 at 03:07 +1200, Richard O'Keefe wrote:
> > > I am well aware of the physiological implications of temperature,
> > > and that is *why* I view recorded TMIN and TMAX at a single point
> > > with an extremely jaundiced eye.  TMAX at shoulder height has very
> > > little relevance to an insect living in grass, for example.  And if
> > > TMAX is sustained for one second, that has very different
> > > consequences from if TMAX is sustained for five minutes.  I can see
> > > the usefulness of "proportion of day above Thi/below Tlo", but that
> > > is quite different.
> > 
> > OK, so my interest in weather data was mainly based around water 
> > management: precipitation, evaporation, herd and crop water needs, 
> > that kind of thing.  And the first thing you learn from that 
> > experience is that ANY kind of single-point summary is seriously 
> > misleading.
> > 
> > Let's end this digression.
> > 
> > 
> > On Thu, 14 Sept 2023 at 02:18, Ebert,Timothy Aaron 
> > wrote:
> > > I had the same question.
> > > > However, I can partly answer the off-topic question. Min and max
> > > > can be important as lower and upper development thresholds. Below
> > > > the min no growth or development occur because reaction rates are
> > > > too slow to enable such. Above max, temperatures are too hot.
> > > > Protein function is impaired, and systems stop functioning. There
> > > > is a considerable range between where systems shut down (but
> > > > recover) and tissue death.
> > > In a simple form the growth and physiological stage of plants, 
> > > insects, and many others, can be modeled as a function of 
> > > temperature. These are often called growing degree day models (or
> > > some version of that). This is number of thermal units needed for
> > > the organism to develop to the next stage (e.g. instar for an 
> > > insect, or fruit/flower formation for a plant). However, better 
> > > accuracy is obtained if the model includes both min and max 
> > > thresholds.
> > > 
> > > > All I have done is provide an example where min and max could have
> > > > a real world use. I use max(temp) over some interval and then
> > > > update an accumulated thermal units variable based on the outcome.
> > > That detail is not evident in the original request.
> > > 
> > > Tim
> > > 
> > > > -----Original Message-----
> > > From: R-help  On Behalf Of Richard 
> > > O'Keefe
> > > Sent: Wednesday, September 13, 2023 9:58 AM
> > > To: Kevin Zembower 
> > > Cc: r-help@r-project.org
> > > Subject: Re: [R] Help with plotting and date-times for climate
> > > data
> > > 
> > > [External Email]
> > > 
> > > Off-topic, but what is a "mean temperature max"
> > > > and what good would it do you to know it if you did?
> > > I've been looking at a lot of weather station data and for no 
> > > question I've ever had (except "would the newspapers get excited 
> > > about this") was "max" (or min) the answer.  Considering the way 
> > > that temperature can change by several degrees in a few

Re: [R] Help with plotting and date-times for climate data

I admire the dedication to R and data science, but the Weather Channel might be 
a simpler approach. Weather.com. I can search for (city name) and either 
weather (current values) or climate. It depends on how far away the trip will 
be.

-----Original Message-----
From: Kevin Zembower  
Sent: Wednesday, September 13, 2023 1:22 PM
To: Richard O'Keefe ; Ebert,Timothy Aaron 
Cc: r-help@r-project.org
Subject: Re: [R] Help with plotting and date-times for climate data

[External Email]

Tim, Richard, y'all are reading too much into this. I believe that TMAX is the 
high temperature of the day, and TMIN is the low. I'm trying to compute the 
average or median high and low temperatures for the data I have (2011 to 
present). I'm going on a trip to this area, and want to know how to pack.

Thanks for your interest.

-Kevin

On Thu, 2023-09-14 at 03:07 +1200, Richard O'Keefe wrote:
> I am well aware of the physiological implications of temperature, and 
> that is *why* I view recorded TMIN and TMAX at a single point with an 
> extremely jaundiced eye.  TMAX at shoulder height has very little 
> relevance to an insect living in grass, for example.  And if TMAX is 
> sustained for one second, that has very different consequences from if 
> TMAX is sustained for five minutes.  I can see the usefulness of 
> "proportion of day above Thi/below Tlo", but that is quite different.
>
> OK, so my interest in weather data was mainly based around water 
> management: precipitation, evaporation, herd and crop water needs, 
> that kind of thing.  And the first thing you learn from that 
> experience is that ANY kind of single-point summary is seriously 
> misleading.
>
> Let's end this digression.
>
>
> On Thu, 14 Sept 2023 at 02:18, Ebert,Timothy Aaron 
> wrote:
> > I had the same question.
> > However, I can partly answer the off-topic question. Min and max can 
> > be important as lower and upper development thresholds. Below the 
> > min no growth or development occur because reaction rates are too 
> > slow to enable such. Above max, temperatures are too hot.
> > Protein function is impaired, and systems stop functioning. There is 
> > a considerable range between where systems shut down (but
> > recover) and tissue death.
> > In a simple form the growth and physiological stage of plants, 
> > insects, and many others, can be modeled as a function of 
> > temperature. These are often called growing degree day models (or 
> > some version of that). This is number of thermal units needed for 
> > the organism to develop to the next stage (e.g. instar for an 
> > insect, or fruit/flower formation for a plant). However, better 
> > accuracy is obtained if the model includes both min and max 
> > thresholds.
> >
> > All I have done is provide an example where min and max could have a 
> > real world use. I use max(temp) over some interval and then update 
> > an accumulated thermal units variable based on the outcome.
> > That detail is not evident in the original request.
> >
> > Tim
> >
> > > -----Original Message-----
> > From: R-help  On Behalf Of Richard 
> > O'Keefe
> > Sent: Wednesday, September 13, 2023 9:58 AM
> > To: Kevin Zembower 
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Help with plotting and date-times for climate data
> >
> > [External Email]
> >
> > Off-topic, but what is a "mean temperature max"
> > > and what good would it do you to know it if you did?
> > I've been looking at a lot of weather station data and for no 
> > question I've ever had (except "would the newspapers get excited 
> > about this") was "max" (or min) the answer.  Considering the way 
> > that temperature can change by several degrees in a few minutes, or 
> > a few metres -- I meant horizontally when I wrote that, but as you 
> > know your head and feet don't experience the same temperature, again 
> > by more than one degree -- I am at something of a loss to ascribe 
> > much practical significance to TMAX.  Are you sure this is the 
> > analysis you want to do?  Is this the most informative data you can 
> > get?
> >
> > On Wed, 13 Sept 2023 at 08:51, Kevin Zembower via R-help < 
> > r-help@r-project.org> wrote:
> >
> > > Hello,
> > >
> > > I'm trying to calculate the mean temperature max from a file of 
> > > > climate data, and plot it over a range of days in the year. I've 
> > > downloaded the data, and cleaned it up the way I think it should 
> > > be.
> > > However, when I plot it, the geom_smooth line doe

Re: [R] Help with plotting and date-times for climate data

Rui, thanks so much for your clear explanation, solution to my problem,
and additional help with making the graph come out exactly as I was
hoping. I learned a lot from your solution. Thanks, again, for your
help.

-Kevin

On Tue, 2023-09-12 at 23:06 +0100, Rui Barradas wrote:
> At 21:50 on 12/09/2023, Kevin Zembower via R-help wrote:
> > Hello,
> > 
> > I'm trying to calculate the mean temperature max from a file of
> > climate data, and plot it over a range of days in the year. I've
> > downloaded the data, and cleaned it up the way I think it should be.
> > However, when I plot it, the geom_smooth line doesn't show up. I
> > think that's because my x axis is characters or factors. Here's what
> > I have so far:
> > 
> > library(tidyverse)
> > 
> > data <- read_csv("Ely_MN_Weather.csv")
> > 
> > start_day = yday(as_date("2023-09-22"))
> > end_day = yday(as_date("2023-10-15"))
> >     
> > d <- as_tibble(data) %>%
> >  select(DATE,TMAX,TMIN) %>%
> >  mutate(DATE = as_date(DATE),
> >     yday = yday(DATE),
> >     md = sprintf("%02d-%02d", month(DATE), mday(DATE))
> >     ) %>%
> >  filter(yday >= start_day & yday <= end_day) %>%
> >  mutate(md = as.factor(md))
> > 
> > d_sum <- d %>%
> >  group_by(md) %>%
> >  summarize(tmax_mean = mean(TMAX, na.rm=TRUE))
> > 
> > ## Here's the filtered data:
> > dput(d_sum)
> > 
> > > structure(list(md = structure(1:25, levels = c("09-21", "09-22",
> > "09-23", "09-24", "09-25", "09-26", "09-27", "09-28", "09-29",
> > "09-30", "10-01", "10-02", "10-03", "10-04", "10-05", "10-06",
> > "10-07", "10-08", "10-09", "10-10", "10-11", "10-12", "10-13",
> > "10-14", "10-15"), class = "factor"), tmax_mean = c(65,
> > 62.2,
> > 61.3, 63.9, 64.3, 60.1, 62.3, 60.5, 61.9,
> > 61.2, 63.7, 59.5, 59.6, 61.6,
> > 59.4, 58.8, 55.9, 58.125,
> > 58, 55.7, 57, 55.4, 49.8,
> > 48.75, 43.7)), class = c("tbl_df", "tbl", "data.frame"
> > ), row.names = c(NA, -25L))
> > > 
> > ggplot(data = d_sum, aes(x = md)) +
> >  geom_point(aes(y = tmax_mean, color = "blue")) +
> >  geom_smooth(aes(y = tmax_mean, color = "blue"))
> > 
> > My questions are:
> > 1. Why isn't my geom_smooth plotting? How can I fix it?
> > 2. I don't think I'm handling the month and day combination
> > correctly. Is there a way to encode month and day (but not year) as
> > a date?
> > 3. (Minor point) Why does my graph of tmax_mean come out red when I
> > specify "blue"?
> > 
> > Thanks for any advice or guidance you can offer. I really appreciate
> > the expertise of this group.
> > 
> > -Kevin
> > 
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> Hello,
> 
> The problem is that the dates are factors, not real dates, and 
> geom_smooth does not interpolate along a discrete axis (the x axis).
> 
> Paste a fake year with md, coerce to date and plot.
> I have simplified the aes() calls and added a date scale in order to 
> make the x axis more readable.
> 
> Without the formula and method arguments, geom_smooth prints a 
> message, so they are now made explicit.
> 
> 
> 
> suppressPackageStartupMessages({
>   library(dplyr)
>   library(ggplot2)
> })
> 
> d_sum %>%
>   # prepend a fake year so the month-day labels parse as real dates
>   mutate(md = paste("2023", md, sep = "-"),
>          md = as.Date(md)) %>%
>   ggplot(aes(x = md, y = tmax_mean)) +
>   # a constant color is set outside aes()
>   geom_point(color = "blue") +
>   geom_smooth(
>     formula = y ~ x,
>     method = loess,
>     color = "blue"
>   ) +
>   scale_x_date(date_breaks = "7 days", date_labels = "%m-%d")
> 
> 
> 
> Hope this helps,
> 
> Rui Barradas
> 
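
The fake-year idea in the reply above can also be tried without ggplot2. What follows is only a base-R sketch under stated assumptions: the values are copied from the dput() output earlier in the thread, the year 2023 is arbitrary, and the names date, fit, smoothed and labels are made up for illustration. It shows that once the month-day labels are real dates, a loess fit has a continuous x to work with, and that the original labels can be recovered with format().

```r
# Month-day labels and mean TMAX values, copied from the dput() output in the thread.
md <- c("09-21", "09-22", "09-23", "09-24", "09-25", "09-26", "09-27",
        "09-28", "09-29", "09-30", "10-01", "10-02", "10-03", "10-04",
        "10-05", "10-06", "10-07", "10-08", "10-09", "10-10", "10-11",
        "10-12", "10-13", "10-14", "10-15")
tmax_mean <- c(65, 62.2, 61.3, 63.9, 64.3, 60.1, 62.3, 60.5, 61.9, 61.2,
               63.7, 59.5, 59.6, 61.6, 59.4, 58.8, 55.9, 58.125, 58, 55.7,
               57, 55.4, 49.8, 48.75, 43.7)

# Prepend an arbitrary (fake) year so the labels parse as Date values.
date <- as.Date(paste("2023", md, sep = "-"))

# loess() needs a numeric predictor; a Date converts to days since 1970-01-01.
fit <- loess(tmax_mean ~ as.numeric(date))
smoothed <- predict(fit)

# The year is only scaffolding: format() gives back the month-day labels.
labels <- format(date, "%m-%d")
```

One caveat with this approach: a label of "02-29" would not parse with a non-leap fake year, so a leap year is the safer choice if the date range can include late February.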




Re: [R] Help with plotting and date-times for climate data

Tim, Richard, y'all are reading too much into this. I believe that TMAX
is the high temperature of the day, and TMIN is the low. I'm trying to
compute the average or median high and low temperatures for the data I
have (2011 to present). I'm going on a trip to this area, and want to
know how to pack.

Thanks for your interest.

-Kevin
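
A minimal base-R sketch of the summary described above (average and median daily high and low per calendar day, pooled over years). The column names DATE, TMAX and TMIN come from the code posted in this thread; the data frame below is a hypothetical stand-in for the real file, and the names weather and daily are assumptions.

```r
# Hypothetical stand-in for the weather file: the same two calendar days in three years.
weather <- data.frame(
  DATE = as.Date(c("2021-09-22", "2021-09-23",
                   "2022-09-22", "2022-09-23",
                   "2023-09-22", "2023-09-23")),
  TMAX = c(60, 62, 64, 58, 62, NA),
  TMIN = c(40, 41, 44, 39, 42, 38)
)

# Month-day key so that the same calendar day is pooled across years.
weather$md <- format(weather$DATE, "%m-%d")

# tapply() returns groups in sorted order of md, matching sort(unique(md)).
daily <- data.frame(
  md          = sort(unique(weather$md)),
  tmax_mean   = as.vector(tapply(weather$TMAX, weather$md, mean,   na.rm = TRUE)),
  tmax_median = as.vector(tapply(weather$TMAX, weather$md, median, na.rm = TRUE)),
  tmin_mean   = as.vector(tapply(weather$TMIN, weather$md, mean,   na.rm = TRUE)),
  tmin_median = as.vector(tapply(weather$TMIN, weather$md, median, na.rm = TRUE))
)
```

With only a dozen or so years per calendar day, the median is less sensitive than the mean to a single unusually hot or cold year.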

On Thu, 2023-09-14 at 03:07 +1200, Richard O'Keefe wrote:
> I am well aware of the physiological implications
> of temperature, and that is *why* I view recorded
> TMIN and TMAX at a single point with an extremely
> jaundiced eye.  TMAX at shoulder height has very
> little relevance to an insect living in grass, for
> example.  And if TMAX is sustained for one second,
> that has very different consequences from if TMAX
> is sustained for five minutes.  I can see the usefulness
> of "proportion of day above Thi/below Tlo", but that
> is quite different.
> 
> OK, so my interest in weather data was mainly based
> around water management: precipitation, evaporation,
> herd and crop water needs, that kind of thing.  And
> the first thing you learn from that experience is
> that ANY kind of single-point summary is seriously
> misleading.
> 
> Let's end this digression.
> 
> 
> On Thu, 14 Sept 2023 at 02:18, Ebert,Timothy Aaron 
> wrote:
> > I had the same question.
> > However, I can partly answer the off-topic question. Min and max
> > can be important as lower and upper development thresholds. Below
> > the min no growth or development occur because reaction rates are
> > too slow to enable such. Above max, temperatures are too hot.
> > Protein function is impaired, and systems stop functioning. There
> > is a considerable range between where systems shut down (but
> > recover) and tissue death.
> > In a simple form the growth and physiological stage of plants,
> > insects, and many others, can be modeled as a function of
> > temperature. These are often called growing degree day models (or
> > some version of that). This is number of thermal units needed for
> > the organism to develop to the next stage (e.g. instar for an
> > insect, or fruit/flower formation for a plant). However, better
> > accuracy is obtained if the model includes both min and max
> > thresholds.
> > 
> > All I have done is provide an example where min and max could have
> > a real world use. I use max(temp) over some interval and then
> > update an accumulated thermal units variable based on the outcome.
> > That detail is not evident in the original request.
> > 
> > Tim
> > 
> > > -----Original Message-----
> > From: R-help  On Behalf Of Richard
> > O'Keefe
> > Sent: Wednesday, September 13, 2023 9:58 AM
> > To: Kevin Zembower 
> > Cc: r-help@r-project.org
> > Subject: Re: [R] Help with plotting and date-times for climate data
> > 
> > [External Email]
> > 
> > Off-topic, but what is a "mean temperature max"
> > > and what good would it do you to know it if you did?
> > I've been looking at a lot of weather station data and for no
> > question I've ever had (except "would the newspapers get excited
> > about this") was "max" (or min) the answer.  Considering the way
> > that temperature can change by several degrees in a few minutes, or
> > a few metres -- I meant horizontally when I wrote that, but as you
> > know your head and feet don't experience the same temperature,
> > again by more than one degree -- I am at something of a loss to
> > ascribe much practical significance to TMAX.  Are you sure this is
> > the analysis you want to do?  Is this the most informative data you
> > can get?
> > 
> > On Wed, 13 Sept 2023 at 08:51, Kevin Zembower via R-help <
> > r-help@r-project.org> wrote:
> > 
> > > Hello,
> > > 
> > > I'm trying to calculate the mean temperature max from a file of
> > > > climate data, and plot it over a range of days in the year. I've
> > > downloaded the data, and cleaned it up the way I think it should
> > > be.
> > > > However, when I plot it, the geom_smooth line doesn't show up. I
> > > > think that's because my x axis is characters or factors. Here's
> > > > what I have so far:
> > > 
> > > library(tidyverse)
> > > 
> > > data <- read_csv("Ely_MN_Weather.csv")
> > > 
> > > > start_day = yday(as_date("2023-09-22"))
> > > > end_day = yday(as_date("2023-10-15"))
> > > 
> >

Re: [R] Help with plotting and date-times for climate data

2023-09-13 Thread Richard O'Keefe

I am well aware of the physiological implications
of temperature, and that is *why* I view recorded
TMIN and TMAX at a single point with an extremely
jaundiced eye.  TMAX at shoulder height has very
little relevance to an insect living in grass, for
example.  And if TMAX is sustained for one second,
that has very different consequences from if TMAX
is sustained for five minutes.  I can see the usefulness
of "proportion of day above Thi/below Tlo", but that
is quite different.
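
For readers who want to see what a "proportion of day above Thi/below Tlo" summary looks like, here is a small R sketch. It assumes sub-daily readings are available, which the daily TMAX/TMIN file in this thread does not provide; the hourly values and both thresholds are hypothetical, as are the names hourly, t_hi and t_lo.

```r
# Hypothetical hourly temperatures (degrees F) for a single day; not data from the thread.
hourly <- c(41, 40, 39, 38, 38, 39, 42, 46, 50, 54, 58, 61,
            63, 64, 64, 62, 59, 55, 51, 48, 46, 44, 43, 42)

t_hi <- 60  # assumed upper threshold
t_lo <- 40  # assumed lower threshold

# The mean of a logical vector is the share of readings that satisfy the condition.
prop_above <- mean(hourly > t_hi)
prop_below <- mean(hourly < t_lo)

# For comparison, the single-point summary for the same day.
day_max <- max(hourly)
```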

OK, so my interest in weather data was mainly based
around water management: precipitation, evaporation,
herd and crop water needs, that kind of thing.  And
the first thing you learn from that experience is
that ANY kind of single-point summary is seriously
misleading.

Let's end this digression.


On Thu, 14 Sept 2023 at 02:18, Ebert,Timothy Aaron  wrote:

> I had the same question.
> However, I can partly answer the off-topic question. Min and max can be
> important as lower and upper development thresholds. Below the min no
> growth or development occurs because reaction rates are too slow to enable
> such. Above max, temperatures are too hot. Protein function is impaired,
> and systems stop functioning. There is a considerable range between where
> systems shut down (but recover) and tissue death.
> In a simple form the growth and physiological stage of plants, insects,
> and many others, can be modeled as a function of temperature. These are
> often called growing degree day models (or some version of that). This is
> the number of thermal units needed for the organism to develop to the next
> stage (e.g. instar for an insect, or fruit/flower formation for a plant).
> However, better accuracy is obtained if the model includes both min and max
> thresholds.
>
> All I have done is provide an example where min and max could have a real
> world use. I use max(temp) over some interval and then update an
> accumulated thermal units variable based on the outcome. That detail is not
> evident in the original request.
>
> Tim
>
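
The growing degree day idea quoted above can be sketched in R. This is one common textbook form (daily averaging with a lower and an upper cutoff), not necessarily the calculation used by the poster; the thresholds, the temperatures, and the names gdd_daily, t_base and t_upper are all hypothetical.

```r
# Assumed development thresholds (degrees F); illustrative values only.
t_base  <- 50  # lower threshold: no development below this
t_upper <- 86  # upper threshold: no extra credit above this

# Hypothetical daily highs and lows for three days.
tmax <- c(70, 90, 48)
tmin <- c(50, 60, 40)

# Clamp both readings into [t_base, t_upper], then average and subtract the base.
gdd_daily <- function(tmax, tmin, t_base, t_upper) {
  hi <- pmin(pmax(tmax, t_base), t_upper)
  lo <- pmin(pmax(tmin, t_base), t_upper)
  (hi + lo) / 2 - t_base
}

daily <- gdd_daily(tmax, tmin, t_base, t_upper)

# Accumulated thermal units; a stage is reached once this passes a species-specific total.
accumulated <- cumsum(daily)
```

Clamping before averaging means a day that never rises above the lower threshold contributes zero rather than a negative amount.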
> -----Original Message-----
> From: R-help  On Behalf Of Richard O'Keefe
> Sent: Wednesday, September 13, 2023 9:58 AM
> To: Kevin Zembower 
> Cc: r-help@r-project.org
> Subject: Re: [R] Help with plotting and date-times for climate data
>
> [External Email]
>
> Off-topic, but what is a "mean temperature max"
> and what good would it do you to know it if you did?
> I've been looking at a lot of weather station data and for no question
> I've ever had (except "would the newspapers get excited about this") was
> "max" (or min) the answer.  Considering the way that temperature can change
> by several degrees in a few minutes, or a few metres -- I meant
> horizontally when I wrote that, but as you know your head and feet don't
> experience the same temperature, again by more than one degree -- I am at
> something of a loss to ascribe much practical significance to TMAX.  Are
> you sure this is the analysis you want to do?  Is this the most informative
> data you can get?
>
> On Wed, 13 Sept 2023 at 08:51, Kevin Zembower via R-help <
> r-help@r-project.org> wrote:
>
> > Hello,
> >
> > I'm trying to calculate the mean temperature max from a file of
> > climate data, and plot it over a range of days in the year. I've
> > downloaded the data, and cleaned it up the way I think it should be.
> > However, when I plot it, the geom_smooth line doesn't show up. I think
> > that's because my x axis is characters or factors. Here's what I have so
> > far:
> > 
> > library(tidyverse)
> >
> > data <- read_csv("Ely_MN_Weather.csv")
> >
> > start_day = yday(as_date("2023-09-22"))
> > end_day = yday(as_date("2023-10-15"))
> >
> > d <- as_tibble(data) %>%
> >  select(DATE,TMAX,TMIN) %>%
> >  mutate(DATE = as_date(DATE),
> >     yday = yday(DATE),
> >     md = sprintf("%02d-%02d", month(DATE), mday(DATE))
> >     ) %>%
> >  filter(yday >= start_day & yday <= end_day) %>%
> >  mutate(md = as.factor(md))
> >
> > d_sum <- d %>%
> >  group_by(md) %>%
> >  summarize(tmax_mean = mean(TMAX, na.rm=TRUE))
> >
> > ## Here's the filtered data:
> > dput(d_sum)
> >
> > > structure(list(md = structure(1:25, levels = c("09-21", "09-22",
> > "09-23", "09-24", "09-25", "09-26", "09-27", "09-28", "09-29",
> > "09-30", "10-01", "10-02", "10-03

Re: [R] Help with plotting and date-times for climate data