Re: [Rd] Unexpected argument-matching when some are missing

2018-11-29 Thread Ista Zahn
On Thu, Nov 29, 2018 at 1:10 PM S Ellison  wrote:
>
>
> > > plot(x=1:10, y=)
> > > plot(x=1:10, y=, 10:1)
> > >
> > > In both cases, 'y=' is ignored. In the first, the plot is for y=NULL (so
> > > not 'missing' y)
> > > In the second case, 10:1 is positionally matched to y despite the
> > > intervening 'missing' 'y='
> > >
> > > So it isn't just 'missing'; it's 'not there at all'
> >
> > What exactly is the difference between "missing" and "not there at all"?
>
> A "missing argument" in R means that an argument with no default value was 
> omitted from the call, and that is what I meant by "missing".
> But that is not what is happening here. I was talking about "y=" apparently 
> being treated as not present in the call, rather than the argument y being 
> treated as a missing argument.
>
> In these examples, plot.default has a default value for y (NULL) so y can 
> never be "missing" in the sense of the 'missing argument' error (compare what 
> happens with plot(y=1:10), which reports x as 'missing').
> In the first example, y was (from the plot behaviour) taken as NULL - the 
> default - so was not considered a missing argument. In the second, it was 
> taken as 10:1 - again, non-missing, despite 10:1 being in the normal position 
> for the (character) argument "type".
> But neither call did anything at all with "y=". Instead, the behaviour is 
> consistent with what would have happened if 'y=' were "not present at all" 
> when counting position or named argument list, rather than if 'y' were an 
> absent required argument.
> It _looks_ as if the initial call parsing silently ignored the malformed 
> expression "y=" before any argument matching - positional or by name - takes 
> place.

Yes, I think all of that is correct. But y _is_ missing in this sense:

> debug(plot)
> plot(1:10, y=)
debugging in: plot(1:10, y = )
debug: UseMethod("plot")
Browse[2]> missing(y)
[1] TRUE

though this does not explain the behavior, since missing(y) is TRUE here as well:

> plot( , , "l")
debugging in: plot(, , "l")
debug: UseMethod("plot")
Browse[2]> missing(y)
[1] TRUE

--Ista
>
> But I'm thinking that it'll take an R-core guru to explain what's going on 
> here, so I was going to wait and see.
>
> Steve Ellison
>
>
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unexpected argument-matching when some are missing

2018-11-29 Thread Ista Zahn
On Thu, Nov 29, 2018 at 10:51 AM S Ellison  wrote:
>
> > When trying out some variations with `[.data.frame` I noticed some (to me)
> > odd behaviour,
>
> Not just in 'myfun' ...
>
> plot(x=1:10, y=)
> plot(x=1:10, y=, 10:1)
>
> In both cases, 'y=' is ignored. In the first, the plot is for y=NULL (so not 
> 'missing' y)
> In the second case, 10:1 is positionally matched to y despite the intervening 
> 'missing' 'y='
>
> So it isn't just 'missing'; it's 'not there at all'

What exactly is the difference between "missing" and "not there at all"?

--Ista

>
> Steve E
>
> > -Original Message-
> > From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of Emil
> > Bode
> > Sent: 29 November 2018 10:09
> > To: r-devel@r-project.org
> > Subject: [Rd] Unexpected argument-matching when some are missing
> >
> > When trying out some variations with `[.data.frame` I noticed some (to me)
> > odd behaviour, which I found out has nothing to do with `[.data.frame`, but
> > rather with the way arguments are matched, when mixing named/unnamed
> > and missing/non-missing arguments. Consider the following example:
> >
> >
> >
> > myfun <- function(x,y,z) {
> >
> >   print(match.call())
> >
> >   cat('x=',if(missing(x)) 'missing' else x, '\n')
> >
> >   cat('y=',if(missing(y)) 'missing' else y, '\n')
> >
> >   cat('z=',if(missing(z)) 'missing' else z, '\n')
> >
> > }
> >
> > myfun(x=, y=, "z's value")
> >
> >
> >
> > gives:
> >
> >
> >
> > # myfun(x = "z's value")
> >
> > # x= z's value
> >
> > # y= missing
> >
> > # z= missing
> >
> >
> >
> > This seems very counterintuitive to me; I expect the arguments x and y to be
> > missing, and z to get “z’s value”.
> >
> > When I call myfun(,y=,"z's value"), x is missing, and y gets “z’s value”.
> >
> > Are my expectations wrong or is this a bug? And if my expectations are
> > wrong, where can I find more information on argument-matching?
> >
> > My gut-feeling says to call this a bug, but then I’m surprised no-one else 
> > has
> > encountered it before.
> >
> >
> >
> > And I don’t have multiple installations to work from, so could somebody else
> > confirm this (if it’s not my expectations that are wrong) for R-devel/other 
> > R-
> > versions/other platforms?
> >
> > My setup: R 3.5.1, MacOS 10.13.6, both Rstudio 1.1.453 and R --vanilla from
> > Bash
> >
> >
> >
> > Best regards,
> >
> > Emil Bode
>
>
>


Re: [Rd] Unexpected argument-matching when some are missing

2018-11-29 Thread Ista Zahn
On Thu, Nov 29, 2018 at 5:09 AM Emil Bode  wrote:
>
> When trying out some variations with `[.data.frame` I noticed some (to me) 
> odd behaviour, which I found out has nothing to do with `[.data.frame`, but 
> rather with the way arguments are matched, when mixing named/unnamed and 
> missing/non-missing arguments. Consider the following example:
>
>
>
> myfun <- function(x,y,z) {
>
>   print(match.call())
>
>   cat('x=',if(missing(x)) 'missing' else x, '\n')
>
>   cat('y=',if(missing(y)) 'missing' else y, '\n')
>
>   cat('z=',if(missing(z)) 'missing' else z, '\n')
>
> }
>
> myfun(x=, y=, "z's value")
>
>
>
> gives:
>
>
>
> # myfun(x = "z's value")
>
> # x= z's value
>
> # y= missing
>
> # z= missing
>
>
>
> This seems very counterintuitive to me; I expect the arguments x and y to be 
> missing, and z to get “z’s value”.

Interesting. I would expect it to throw an error, since "x=" is not
syntactically complete. What does "x=" mean anyway? It looks like R
interprets it as "x was not set to anything, i.e., is missing". That
seems reasonable, though I think the example itself is pathological
and would prefer that it produced an error.
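For what it's worth, the parser itself is happy with a bare "x=": it records the empty symbol, the same object used for a formal argument that has no default. A quick sketch (f is just a placeholder name; nothing needs to exist):

```r
# Parse a call containing a bare "x=" without evaluating anything
cl <- quote(f(x = , 1))
as.list(cl)

# The value stored for x is the empty symbol, identical to the
# "no default" marker in a function's formals
identical(cl$x, formals(function(x) NULL)$x)  # TRUE
```

So by the time argument matching runs, "x=" is a named argument whose value is the missing marker, which presumably explains why missing(x) can report TRUE even though matching behaves as if the argument were absent.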

--Ista
>
> When I call myfun(,y=,"z's value"), x is missing, and y gets “z’s value”.
>
> Are my expectations wrong or is this a bug? And if my expectations are wrong, 
> where can I find more information on argument-matching?
>
> My gut-feeling says to call this a bug, but then I’m surprised no-one else 
> has encountered it before.
>
>
>
> And I don’t have multiple installations to work from, so could somebody else 
> confirm this (if it’s not my expectations that are wrong) for R-devel/other 
> R-versions/other platforms?
>
> My setup: R 3.5.1, MacOS 10.13.6, both Rstudio 1.1.453 and R --vanilla from 
> Bash
>
>
>
> Best regards,
>
> Emil Bode
>


Re: [Rd] build package with unicode (farsi) strings

2018-08-30 Thread Ista Zahn
On Thu, Aug 30, 2018 at 3:11 AM Thierry Onkelinx
 wrote:
>
> Dear Farid,
>
> Try using the ASCII notation. letters_fa <- c("\u0627", "\u0641").

... as recommended in the manual:
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Encoding-issues
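If there are many such strings, a small helper along these lines (an illustrative sketch, not an existing function) generates the escapes from the text itself:

```r
# Convert a (possibly non-ASCII) string to "\uxxxx" escape notation so
# package source files can stay pure ASCII.
# Note: code points above 0xFFFF would need \U rather than \u.
to_escapes <- function(s) {
  paste(sprintf("\\u%04x", utf8ToInt(enc2utf8(s))), collapse = "")
}

to_escapes("\u0627\u0641")  # "\\u0627\\u0641"
```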

Best,
Ista

> The full
> code table is available at https://www.utf8-chartable.de
>
> Best regards,
>
>
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkel...@inbo.be
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be
>
> ///
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
> ///
>
> 
>
> 2018-08-28 7:17 GMT+02:00 Faridedin Cheraghi :
>
> > Hi,
> >
> > I have a R script file with Persian letters in it defined as a variable:
> >
> > #' @export
> > letters_fa <- c('الف','ب','پ','ت','ث','ج','چ','ح','خ','ر','ز','د')
> >
> > I have specified the encoding field in my DESCRIPTION file of my package.
> >
> > ...
> > Encoding: UTF-8
> > ...
> >
> > I also included Sys.setlocale(locale="Persian") in my .RProfile, so it is
> > executed when RCMD is called. However, after a BUILD and INSTALL, when I
> > access the variable from the package, the characters are not printed
> > correctly:
> > > futils::letters_fa
> >  [1] "<84><81>" "" ""
> >"" ""
> >  [6] "" "<86>" ""
> >"" ""
> > [11] "" ""
> >
> >
> > thanks
> > Farid


Re: [Rd] oddity in transform

2018-07-24 Thread Ista Zahn
On Tue, Jul 24, 2018 at 11:41 AM, Ista Zahn  wrote:
> I don't think it has much to do with transform in particular:
>
>> BOD <- data.frame(Time = 1:6, demand = runif(6))
>> BOD[["X"]] <- BOD[1:2] * seq(6); BOD
>   Time    demand X.Time  X.demand
> 1    1 0.8649628      1 0.8649628
> 2    2 0.5895380      4 1.1790761
> 3    3 0.6854635      9 2.0563906
> 4    4 0.4255801     16 1.7023206
> 5    5 0.5738793     25 2.8693967
> 6    6 0.9996713     36 5.9980281
>> BOD <- data.frame(Time = 1:6, demand = runif(6))
>> BOD[["X"]] <- BOD[1] * seq(6); BOD
>   Time     demand Time
> 1    1 0.72990231    1
> 2    2 0.61721422    4
> 3    3 0.02389160    9
> 4    4 0.28341746   16
> 5    5 0.06116124   25
> 6    6 0.67966577   36

Ugh, well, I see now that

BOD[["X"]] <- BOD[1:2] * seq(6); BOD

and

transform(BOD, X = BOD[1:2] * seq(6))

don't produce the same thing, despite printing in ways that look
similar. However,

data.frame(BOD, X = BOD[1:2] * seq(6))

and

data.frame(BOD, X = BOD[1] * seq(6))

do produce the same result as transform, so the point about this being
much more pervasive still holds.
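One workaround for Gabor's use case, sketched below (the add_scaled helper and its name are mine, not part of any API), is to let transform() do the computation and then normalize the new names yourself, so one-column and multi-column ix come out the same:

```r
BOD <- data.frame(Time = c(1, 2, 3, 4, 5, 7),
                  demand = c(8.3, 10.3, 19.0, 16.0, 15.6, 19.8))

# Hypothetical helper: compute the scaled columns with transform(), then
# overwrite whatever names data.frame() invented with predictable ones
add_scaled <- function(d, ix) {
  res <- transform(d, X = d[ix] * seq(nrow(d)))
  names(res)[-seq_along(d)] <- paste0("X.", names(d)[ix])
  res
}

names(add_scaled(BOD, 1))    # "Time" "demand" "X.Time"
names(add_scaled(BOD, 1:2))  # "Time" "demand" "X.Time" "X.demand"
```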

--Ista



>
> --Ista
>
>
> On Tue, Jul 24, 2018 at 7:59 AM, Gabor Grothendieck
>  wrote:
>> The idea is that one wants to write the line of code below
>>  in a general way which works the same
>> whether you specify ix as one column or multiple columns but the naming 
>> entirely
>> changes when you do this and BOD[, 1] and transform(BOD, X=..., Y=...) or
>> other hard coding solutions still require writing multiple cases.
>>
>> ix <- 1:2
>> transform(BOD, X = BOD[ix] * seq(6))
>>
>>
>>
>> On Tue, Jul 24, 2018 at 7:14 AM, Emil Bode  wrote:
>>> I think you meant to call BOD[,1]
>>> From ?transform, the ... arguments are supposed to be vectors, and BOD[1] 
>>> is still a data.frame (with one column). So I don't think it's surprising 
>>> transform gets confused by which name to use (X, or Time?), and kind of 
>>> compromises on the name "Time". It's also in a note in ?transform: "If some 
>>> of the values are not vectors of the appropriate length, you deserve 
>>> whatever you get!"
>>> And if you want to do it with multiple extra columns (and are not satisfied 
>>> with these labels), I think the proper way to go would be " transform(BOD, 
>>> X=BOD[,1]*seq(6), Y=BOD[,2]*seq(6))"
>>>
>>> If you want to trace it back further, it's not in transform but in 
>>> data.frame. Column-names are prepended with a higher-level name if the 
>>> object has more than one column.
>>> And it uses the tag-name if simply supplied with a vector:
>>> data.frame(BOD[1:2], X=BOD[1]*seq(6)) takes the name of the only column of 
>>> BOD[1], Time. Only because that column name is already present, it's 
>>> changed to Time.1
>>> data.frame(BOD[1:2], X=BOD[,1]*seq(6)) gives third column-name X (as X is 
>>> now a vector)
>>> data.frame(BOD[1:2], X=BOD[1:2]*seq(6)) or with BOD[,1:2] gives columns 
>>> names X.Time and X.demand, to show these (multiple) columns are coming from 
>>> X
>>>
>>> So I don't think there's much to fix here. In this case, having X.Time in all 
>>> cases would have been better, but in general the column-naming of 
>>> data.frame works, changing it would likely cause a lot of problems.
>>> You can always change the column-names later.
>>>
>>> Best regards,
>>> Emil Bode
>>>
>>> Data-analyst
>>>
>>> +31 6 43 83 89 33
>>> emil.b...@dans.knaw.nl
>>>
>>> DANS: Netherlands Institute for Permanent Access to Digital Research 
>>> Resources
>>> Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | 
>>> i...@dans.knaw.nl <mailto:i...@dans.kn> | dans.knaw.nl 
>>> 
>>> DANS is an institute of the Dutch Academy KNAW <http://knaw.nl/nl> and 
>>> funding organisation NWO <http://www.nwo.nl/>.
>>>
>>> On 23/07/2018, 16:52, "R-devel on behalf of Gabor Grothendieck" 
>>>  wrote:
>>>
>>> Note the inconsistency in the names in these two examples.  X.Time in
>>> the first case and Time.1 in the second case.
>>>
>>>   > transform(BOD, X = BOD[1:2] * seq(6))
>>>     Time demand X.Time X.demand
>>>   1    1    8.3      1      8.3
>>>   2    2   10.3      4     20.6
>>>   3    3   19.0      9     57.0
>>>   4    4   16.0     16     64.0
>>>   5    5   15.6     25     78.0
>>>   6    7   19.8     42    118.8

Re: [Rd] oddity in transform

2018-07-24 Thread Ista Zahn
I don't think it has much to do with transform in particular:

> BOD <- data.frame(Time = 1:6, demand = runif(6))
> BOD[["X"]] <- BOD[1:2] * seq(6); BOD
  Time    demand X.Time  X.demand
1    1 0.8649628      1 0.8649628
2    2 0.5895380      4 1.1790761
3    3 0.6854635      9 2.0563906
4    4 0.4255801     16 1.7023206
5    5 0.5738793     25 2.8693967
6    6 0.9996713     36 5.9980281
> BOD <- data.frame(Time = 1:6, demand = runif(6))
> BOD[["X"]] <- BOD[1] * seq(6); BOD
  Time     demand Time
1    1 0.72990231    1
2    2 0.61721422    4
3    3 0.02389160    9
4    4 0.28341746   16
5    5 0.06116124   25
6    6 0.67966577   36

--Ista


On Tue, Jul 24, 2018 at 7:59 AM, Gabor Grothendieck
 wrote:
> The idea is that one wants to write the line of code below
>  in a general way which works the same
> whether you specify ix as one column or multiple columns but the naming 
> entirely
> changes when you do this and BOD[, 1] and transform(BOD, X=..., Y=...) or
> other hard coding solutions still require writing multiple cases.
>
> ix <- 1:2
> transform(BOD, X = BOD[ix] * seq(6))
>
>
>
> On Tue, Jul 24, 2018 at 7:14 AM, Emil Bode  wrote:
>> I think you meant to call BOD[,1]
>> From ?transform, the ... arguments are supposed to be vectors, and BOD[1] is 
>> still a data.frame (with one column). So I don't think it's surprising 
>> transform gets confused by which name to use (X, or Time?), and kind of 
>> compromises on the name "Time". It's also in a note in ?transform: "If some 
>> of the values are not vectors of the appropriate length, you deserve 
>> whatever you get!"
>> And if you want to do it with multiple extra columns (and are not satisfied 
>> with these labels), I think the proper way to go would be " transform(BOD, 
>> X=BOD[,1]*seq(6), Y=BOD[,2]*seq(6))"
>>
>> If you want to trace it back further, it's not in transform but in 
>> data.frame. Column-names are prepended with a higher-level name if the 
>> object has more than one column.
>> And it uses the tag-name if simply supplied with a vector:
>> data.frame(BOD[1:2], X=BOD[1]*seq(6)) takes the name of the only column of 
>> BOD[1], Time. Only because that column name is already present, it's changed 
>> to Time.1
>> data.frame(BOD[1:2], X=BOD[,1]*seq(6)) gives third column-name X (as X is 
>> now a vector)
>> data.frame(BOD[1:2], X=BOD[1:2]*seq(6)) or with BOD[,1:2] gives columns 
>> names X.Time and X.demand, to show these (multiple) columns are coming from X
>>
>> So I don't think there's much to fix here. In this case, having X.Time in all 
>> cases would have been better, but in general the column-naming of data.frame 
>> works, changing it would likely cause a lot of problems.
>> You can always change the column-names later.
>>
>> Best regards,
>> Emil Bode
>>
>> Data-analyst
>>
>> +31 6 43 83 89 33
>> emil.b...@dans.knaw.nl
>>
>> DANS: Netherlands Institute for Permanent Access to Digital Research 
>> Resources
>> Anna van Saksenlaan 51 | 2593 HW Den Haag | +31 70 349 44 50 | 
>> i...@dans.knaw.nl  | dans.knaw.nl 
>> 
>> DANS is an institute of the Dutch Academy KNAW  and 
>> funding organisation NWO .
>>
>> On 23/07/2018, 16:52, "R-devel on behalf of Gabor Grothendieck" 
>>  wrote:
>>
>> Note the inconsistency in the names in these two examples.  X.Time in
>> the first case and Time.1 in the second case.
>>
>>   > transform(BOD, X = BOD[1:2] * seq(6))
>>     Time demand X.Time X.demand
>>   1    1    8.3      1      8.3
>>   2    2   10.3      4     20.6
>>   3    3   19.0      9     57.0
>>   4    4   16.0     16     64.0
>>   5    5   15.6     25     78.0
>>   6    7   19.8     42    118.8
>>
>>   > transform(BOD, X = BOD[1] * seq(6))
>>     Time demand Time.1
>>   1    1    8.3      1
>>   2    2   10.3      4
>>   3    3   19.0      9
>>   4    4   16.0     16
>>   5    5   15.6     25
>>   6    7   19.8     42
>>
>> --
>> Statistics & Software Consulting
>> GKX Group, GKX Associates Inc.
>> tel: 1-877-GKX-GROUP
>> email: ggrothendieck at gmail.com
>>
>>
>>
>
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>


Re: [Rd] base::mean not consistent about NA/NaN

2018-07-02 Thread Ista Zahn
The current behavior is as documented. See ?NA, which says

"Numerical computations using ‘NA’ will normally result in ‘NA’: a
 possible exception is where ‘NaN’ is also involved, in which case
 either might result"
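In other words, both orderings give a result that is missing in the is.na() sense; only is.nan() tells them apart, and which of the two you get is explicitly not guaranteed. A quick sketch:

```r
x1 <- mean(c(NA, NaN))
x2 <- mean(c(NaN, NA))

# Both are "missing" as far as is.na() is concerned
is.na(x1) && is.na(x2)   # TRUE

# Only is.nan() distinguishes them, and the answer can depend on
# argument order and platform, per the quoted passage from ?NA
c(is.nan(x1), is.nan(x2))
```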

--Ista

On Mon, Jul 2, 2018 at 11:25 AM, Jan Gorecki  wrote:
> Hi,
> base::mean is not consistent in terms of handling NA/NaN.
> Mean should not depend on order of its arguments while currently it is.
>
> mean(c(NA, NaN))
> #[1] NA
> mean(c(NaN, NA))
> #[1] NaN
>
> I created issue so in case of no replies here status of it can be looked up
> at:
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17441
>
> Best,
> Jan
>


Re: [Rd] source(echo = TRUE) with a iso-8859-1 encoded file gives an error

2018-05-01 Thread Ista Zahn
Hi Scott,

This question is appropriate for the r-help mailing list, but probably
off-topic here on r-devel.

Best,
Ista

On Tue, May 1, 2018 at 2:57 PM, Scott Kostyshak  wrote:
> I have very little knowledge about file encodings and would like to
> learn more.
>
> I've read the following pages to learn more:
>
>   http://stat.ethz.ch/R-manual/R-devel/library/base/html/Encoding.html
>   https://stackoverflow.com/questions/4806823/how-to-detect-the-right-encoding-for-read-csv
>   https://developer.r-project.org/Encodings_and_R.html
>
> The last one, in particular, has been very helpful. I would be
> interested in any further references that you suggest.
>
> I attach a file that reproduces the issue I would like to learn more
> about. I do not know if the file encoding will be correctly preserved
> through email, so I also provide the file (temporarily) on Dropbox here:
>
>   https://www.dropbox.com/s/3lbgebk7b5uaia7/encoding_export_issue.R?dl=0
>
> The file gives an error when using "source()" with the
> argument echo = TRUE:
>
>   > source("encoding_export_issue.R", echo = TRUE)
>   Error in nchar(dep, "c") : invalid multibyte string, element 1
>   In addition: Warning message:
>   In grepl("^[[:blank:]]*$", dep[1L]) :
> input string 1 is invalid in this locale
>
> The problem comes from the "á" character in the .R file. The file
> appears to be encoded as "iso-8859-1":
>
>   $ file --mime-encoding encoding_export_issue.R
>   encoding_export_issue.R: iso-8859-1
>
> Note that for me:
>
>   > getOption("encoding")
>   [1] "native.enc"
>
> so "native.enc" is used for the "encoding" argument of source().
>
> The following two calls succeed:
>
>   > source("encoding_export_issue.R", echo = TRUE, encoding = "unknown")
>   > source("encoding_export_issue.R", echo = TRUE, encoding = "iso-8859-1")
>
> Is this file a valid "iso-8859-1" encoded file?  Why does source() fail
> in the case of encoding set to "native.enc"? Is it because of the
> settings to UTF-8 in my locale (see info on my system at the bottom of
> this email).
>
> I'm guessing it would be a bad idea to put
>
>   options(encoding = "unknown")
>
> in my .Rprofile, because it is difficult to always correctly guess the
> encoding of files? Is there a reason why setting it to "unknown" would
> lead to more problems than leaving it set to "native.enc"?
>
> I've reproduced the above behavior on R-devel (r74677) and 3.4.3. Below
> is my session info and locale info for my system with the 3.4.3 version:
>
>> sessionInfo()
> R version 3.4.3 (2017-11-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 16.04.3 LTS
>
> Matrix products: default
> BLAS: /usr/lib/libblas/libblas.so.3.6.0
> LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.4.3
>
>> Sys.getlocale()
> [1] 
> "LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C"
>
> Thanks for your time,
>
> Scott
>
>
> --
> Scott Kostyshak
> Assistant Professor of Economics
> University of Florida
> https://people.clas.ufl.edu/skostyshak/
>
>


Re: [Rd] In base R: argument `list` does not accept lists

2018-04-13 Thread Ista Zahn
On Fri, Apr 13, 2018 at 3:51 PM, Duncan Murdoch
 wrote:
> On 13/04/2018 7:21 AM, Johannes Rauh wrote:
>>
>> The function `base::rm` has an argument that is named `list`.  However, if
>> a list is passed as `list` to `rm` (e.g.: `rm(list = list("x", "y"))`), an
>> error is raised: "invalid first argument".
>>
>> Agreed, the documentation says that `list` should be "a character vector
>> naming objects to be removed."  Still, wouldn't it make sense to allow a
>> list of characters as an argument?
>>
>> The other alternative to make things consistent would be to rename the
>> argument, but that would break compatibility, of course.
>
>
> There are other functions (data(), save(), remove(), package.skeleton(),
> etc.) that use the convention that "list" names a character vector full of
> names, others where variations on that name ("affinity.list", "pkglist")
> have the same purpose, and still others where "list" takes a different kind
> of object entirely (untar(), unzip()).  I couldn't find any examples where
> an argument named "list" takes a list as a value.
>
> There really isn't any substitute for reading the documentation for any
> function you choose to use.

Maybe, though if so it's at least a little ironic that you make an
appeal to consistency in support of the status quo. "Read the docs"
you say, because if you do you'll see that "list" never means list,
and then you won't have to read the docs because you have learned the
convention. Maybe.

On the other hand, the OP wasn't so much reporting a bug as making a
feature request. Really, why shouldn't the "list" argument of rm,
data, save, remove, etc. accept either a list or a vector? I can't
think of anything it would hurt, and it would help people who assume
(reasonably enough IMO) that an argument named "list" will accept a
list as a valid value.

Best,
Ista

>
> Duncan Murdoch
>
>


Re: [Rd] aggregate() naming -- bug or feature

2018-03-23 Thread Ista Zahn
On Fri, Mar 23, 2018 at 6:43 PM, Rui Barradas  wrote:
> Hello,
>
> Not exactly an answer but here it goes.
> If you use the formula interface the names will be retained.

Also if you pass named arguments:

aggregate(iris["Sepal.Length"], by = iris["Species"], FUN = foo)
#      Species Sepal.Length
# 1     setosa        5.006
# 2 versicolor        5.936
# 3  virginica        6.588

> In fact, this is even better than those names assigned by bar.
>
>
> aggregate(Sepal.Length ~ Species, data = iris, FUN = foo)
> #      Species Sepal.Length
> # 1     setosa        5.006
> # 2 versicolor        5.936
> # 3  virginica        6.588
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 3/23/2018 1:29 PM, Randall Pruim wrote:
>>
>> In the examples below, the first loses the name attached by foo(), the
>> second retains names attached by bar().  Is this an intentional difference?
>> I’d prefer that the names be retained in both cases.
>>
>> foo <- function(x) { c(mean = base::mean(x)) }
>> bar <- function(x) { c(mean = base::mean(x), sd = stats::sd(x))}
>> aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = foo)
>> #>      Group.1     x
>> #> 1     setosa 5.006
>> #> 2 versicolor 5.936
>> #> 3  virginica 6.588
>> aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = bar)
>> #>      Group.1 x.mean      x.sd
>> #> 1     setosa  5.006 0.3524897
>> #> 2 versicolor  5.936 0.5161711
>> #> 3  virginica  6.588 0.6358796
>>
>> —rjp
>>
>>


Re: [Rd] truncation/rounding bug with write.csv

2018-03-14 Thread Ista Zahn
I don't see the issue here. It would be helpful if people would report
their sessionInfo() when reporting whether or not they see this issue.
Mine is

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblas_haswellp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
 [9] LC_ADDRESS=C   LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 rmsfact_0.0.3  cowsay_0.5.0   fortunes_1.5-4

On Wed, Mar 14, 2018 at 12:02 PM, Gregory Michaelson  wrote:
> I ran this code in RStudio Server on a linux machine, but I don’t know the 
> version offhand.  I will try to get it tomorrow.  Thanks.
>
> Thanks,
> Greg Michaelson
> www.datarobot.com
> 704-981-1118
>
>
>
>
>> On Mar 14, 2018, at 4:47 PM, Joris Meys  wrote:
>>
>> To my surprise, I can confirm on Windows 10 using R 3.4.3 . As tail is not 
>> recognized by Windows cmd, I replaced with:
>>
>> system('powershell -nologo "& "Get-Content -Path temp.csv -Tail 1')
>>
>> The last line shows only 7 digits after the decimal, whereas the first have 
>> 15 digits after the decimal. I agree with Dirk though, 1.6Gb csv files are 
>> not the best way to work with datasets.
>>
>> Cheers
>> Joris
>>
>>
>>
>> On Wed, Mar 14, 2018 at 1:53 PM, Dirk Eddelbuettel  wrote:
>>
>> What OS are you on?  On Ubuntu 17.10 with R 3.4.3 all seems well (see
>> below for your example, I just added a setwd()).
>>
>> [ That said, I long held a (apparently minority) view that csv is for all
>> intends and purposes a less-than-ideal format.  If you have that much data,
>> you do generally not want to serialize it back and forth as that is slow, and
>> may drop precision.  The rds format is great for R alone; we now have C code
>> to read it from other apps (in the librdata repo by Evan Miller).  Different
>> portable serializations work too (protocol buffer, msgpack, ...), there are
>> databases and on and on... ]
>>
>> Dirk
>>
>>
>> R> df <- data.frame(replicate(100, runif(100, 0,1)))
>> R> setwd("/tmp")
>> R> write.csv(df, "temp.csv")
>> R> system('tail -n1 temp.csv')
>> "100",0.11496100993827,0.740764639340341,0.519190795486793,0.736045523779467,0.537115448853001,0.769496953347698,0.102257401449606,0.437617724528536,0.173321532085538,0.351960731903091,0.397348914295435,0.496789071243256,0.463006566744298,0.573105450021103,0.575196429155767,0.821617329493165,0.112913676071912,0.187580146361142,0.121353451395407,0.576333721866831,0.00763232703320682,0.468676633667201,0.451408475637436,0.0172415724955499,0.946199159137905,0.439950440311804,0.109224532730877,0.657066411571577,0.0524766123853624,0.54859598656185,0.94473168021068,0.500153199071065,0.636756601976231,0.221365773351863,0.620196332456544,0.559639401268214,0.198483835440129,0.397874651942402,0.710652963491157,0.317212327616289,0.239299293374643,0.0606942125596106,0.165786643279716,0.667431530542672,0.436631754040718,0.812185280025005,0.374252707697451,0.421187321422622,0.730321826180443,0.904493971262127,0.399387824581936,0.650714065413922,0.594219180056825,0.147960299625993,0.941945064114407,0.357223904458806,0.275038427906111,0.191008436959237,0.957893384154886,0.211530723143369,0.680650093592703,0.503884038887918,0.754094189498574,0.74776051659137,0.673691919771954,0.236221367260441,0.825558929471299,0.21071959589608,0.246618688805029,0.686810691142455,0.0247942050918937,0.572868114337325,0.494058627169579,0.684360746992752,0.0139967589639127,0.626861660508439,0.417218193877488,0.410173830809072,0.390906651504338,0.477168896235526,0.382211019750684,0.597674581920728,0.198329919017851,0.0684413285925984,0.450342149706557,0.133007253985852,0.755873151356354,0.372862737858668,0.762442974606529,0.582133987685665,0.692048883531243,0.259269661735743,0.147847984684631,0.635266482364386,0.320955650880933,0.00151186063885689,0.446474697208032,0.0673662247136235,0.791947861900553,0.0973296447191387
>> R> system('head -n2 temp.csv')
>> 

Re: [Rd] writeLines argument useBytes = TRUE still making conversions

2018-02-15 Thread Ista Zahn
On Thu, Feb 15, 2018 at 11:19 AM, Kevin Ushey  wrote:
> I suspect your UTF-8 string is being stripped of its encoding before
> write, and so assumed to be in the system native encoding, and then
> re-encoded as UTF-8 when written to the file. You can see something
> similar with:
>
> > tmp <- 'é'
> > tmp <- iconv(tmp, to = 'UTF-8')
> > Encoding(tmp) <- "unknown"
> > charToRaw(iconv(tmp, to = "UTF-8"))
> [1] c3 83 c2 a9
>
> It's worth saying that:
>
> file(..., encoding = "UTF-8")
>
> means "attempt to re-encode strings as UTF-8 when writing to this
> file". However, if you already know your text is UTF-8, then you
> likely want to avoid opening a connection that might attempt to
> re-encode the input. Conversely (assuming I'm understanding the
> documentation correctly)
>
> file(..., encoding = "native.enc")
>
> means "assume that strings are in the native encoding, and hence
> translation is unnecessary". Note that it does not mean "attempt to
> translate strings to the native encoding".

If all that is true I think ?file needs some attention. I've read it
several times now and I just don't see how it can be interpreted as
you've described it.

Best,
Ista
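
To make Kevin's "two translation points" explanation concrete, here is a minimal sketch (an illustration of the behavior described above, not code from the thread): it writes a known-UTF-8 string with translation suppressed both inside writeLines() and at the connection, by opening the connection with encoding = "native.enc".

```r
# tmp is known to be UTF-8. Opening the connection with
# encoding = "native.enc" means "assume native encoding, don't
# translate", and useBytes = TRUE stops writeLines() itself from
# re-encoding, so the bytes c3 a9 should reach the file untouched.
tmp <- iconv('é', to = 'UTF-8')
path <- tempfile()
con <- file(path, open = "w", encoding = "native.enc")
writeLines(tmp, con = con, useBytes = TRUE)
close(con)
readBin(path, what = "raw", n = 2)  # expected: c3 a9 on a UTF-8 system
```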

>
> Also note that writeLines(..., useBytes = FALSE) will explicitly
> translate to the current encoding before sending bytes to the
> requested connection. In other words, there are two locations where
> translation might occur in your example:
>
>1) In the call to writeLines(),
>2) When characters are passed to the connection.
>
> In your case, it sounds like translation should be suppressed at both steps.
>
> I think this is documented correctly in ?writeLines (and also the
> Encoding section of ?file), but the behavior may feel unfamiliar at
> first glance.
>
> Kevin
>
> On Wed, Feb 14, 2018 at 11:36 PM, Davor Josipovic  wrote:
>>
>> I think this behavior is inconsistent with the documentation:
>>
>>   tmp <- 'é'
>>   tmp <- iconv(tmp, to = 'UTF-8')
>>   print(Encoding(tmp))
>>   print(charToRaw(tmp))
>>   tmpfilepath <- tempfile()
>>   writeLines(tmp, con = file(tmpfilepath, encoding = 'UTF-8'), useBytes = TRUE)
>>
>> [1] "UTF-8"
>> [1] c3 a9
>>
>> Raw text as hex: c3 83 c2 a9
>>
>> If I switch to useBytes = FALSE, then the variable is written correctly as  
>> c3 a9.
>>
>> Any thoughts? This behavior is related to this issue: 
>> https://github.com/yihui/knitr/issues/1509
>>
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] release build of ChemmineR failing

2018-02-07 Thread Ista Zahn
This is not the right place to report Bioconductor issues. Even if it
were, you have not provided adequate reproduction steps.

source("https://bioconductor.org/biocLite.R")
biocLite("ChemmineR")
library(ChemmineR)

works fine for me. When you find the correct venue for reporting this
issue, I hope you'll be more specific about what the problem is.

Best,
Ista

On Wed, Feb 7, 2018 at 11:07 AM, Kevin Horan  wrote:
>
> The release version of ChemmineR is failing on windows. It seems to be a
> build script issue though, possibly something on your side. The package
> was building fine a few weeks ago and I have not modified it. Can you
> please have a look? Thanks.
>
> "C:/Users/BIOCBU~1/BBS-3~1.6-B/R/bin/Rscript" -e "library(rmarkdown); 
> library(BiocStyle); rmarkdown::render('ChemmineR.Rmd')"
> 'C:\Users\BIOCBU~1\BBS-3~1.6-B\R\bin\x64\Rscript.exe" -e "library' is not 
> recognized as an internal or external command, operable program or batch file.
>
> http://bioconductor.org/checkResults/release/bioc-LATEST/ChemmineR/tokay1-buildsrc.html
>
>
> Kevin
>
>
> [[alternative HTML version deleted]]
>


Re: [Rd] rsvg on mac

2018-02-07 Thread Ista Zahn
Hi Kevin,

I can't imagine what gave you the idea that r-devel is an appropriate
place to make requests regarding bioconductor build infrastructure. It
is not.

Best,
Ista

On Wed, Feb 7, 2018 at 10:59 AM, Kevin Horan  wrote:
>
> The ChemmineR build is failing on the mac due to a new dependency not being
> available, the package "rsvg". Would it be possible to install that on the
> mac build machine? Thanks.
>
> http://bioconductor.org/checkResults/devel/bioc-LATEST/ChemmineR/merida2-install.html
>
>
> Kevin
>


Re: [Rd] Possible bug in package installation when R_ICU_LOCALE is set

2018-02-07 Thread Ista Zahn
I can reproduce this on Linux, so it is not Windows-specific.

> sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblas_haswellp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 rmsfact_0.0.3  tools_3.4.3    cowsay_0.5.0   fortunes_1.5-4
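
The heart of the issue is the collation order of a package's R source files. A small illustration (file names here are purely illustrative): under the C locale, sorting is by byte value, so all uppercase letters sort before lowercase; an ICU locale may interleave them, changing the order in which a package's R files are evaluated during lazy loading.

```r
# C-locale (byte-wise) collation, which "Writing R Extensions"
# promises for package R files: uppercase sorts before lowercase.
files <- c("b.R", "A.R", "a.R")
sort(files, method = "radix")  # C-locale order: "A.R" "a.R" "b.R"

# A dictionary-style locale collation would instead interleave case,
# e.g. "a.R" "A.R" "b.R", so a function defined in "A.R" may not yet
# exist when "a.R" is evaluated.
```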


On Wed, Feb 7, 2018 at 8:38 AM, Korpela Mikko (MML)
 wrote:
> On a Windows computer (other platforms not tested), installing a
> package from source may fail if the environment variable R_ICU_LOCALE
> is set, depending on the package and the locale.
>
> For example, after setting R_ICU_LOCALE to "fi_FI",
>
>   install.packages("seriation", type = "source")
>
> (package version 1.2-3) fails with the following error:
>
> ** preparing package for lazy loading
> Error in set_criterion_method("dist", "AR_events", criterion_ar_events,  :
>   could not find function "set_criterion_method"
> Error : unable to load R code in package 'seriation'
>
> Package "Epi" (version 2.24) fails similarly:
>
> ** preparing package for lazy loading
> Error in eval(exprs[i], envir) : object 'Relevel.default' not found
> Error : unable to load R code in package 'Epi'
>
> Whether R_ICU_LOCALE is set before R is launched or during the session
> doesn't matter: installation of these two example packages fails
> either way. If R_ICU_LOCALE is unset, calling
>
>   icuSetCollate(locale = "fi_FI")
>
> is harmless. Browsing through the R manuals, I did not find warnings
> against using R_ICU_LOCALE, or any indication why package installation
> should fail with the variable being set. About the collation order of R
> code files, "Writing R Extensions" says:
>
>> The default is to collate according to the 'C' locale.
>
> I interpret this (and the surrounding text) as a "promise" to package
> developers that no matter what the end user does, the developer should
> be able to rely on the collation order being 'C' unless the developer
> defines another order.
>
>> sessionInfo()
> R version 3.4.3 Patched (2018-02-03 r74231)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=Finnish_Finland.1252  LC_CTYPE=Finnish_Finland.1252
> [3] LC_MONETARY=Finnish_Finland.1252 LC_NUMERIC=C
> [5] LC_TIME=Finnish_Finland.1252
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.4.3 tools_3.4.3
>
> --
> Mikko Korpela
> Chief Expert, Valuations
> National Land Survey of Finland
> Opastinsilta 12 C, FI-00520 Helsinki, Finland
> +358 50 462 6082
> www.maanmittauslaitos.fi
>


Re: [Rd] OpenBLAS in everyday R?

2018-01-11 Thread Ista Zahn
On Jan 10, 2018 8:24 PM, "Benjamin Tyner"  wrote:

Thanks Keith. We checked, and indeed libopenblas is not linked against
libomp nor libgomp. We suspect this is because we used conda to install R
and OpenBLAS. So I guess we should be barking up the conda tree instead?


What are you barking about? I don't understand what you are trying to
accomplish.

By the way, I also noticed on my home machine (Ubuntu),
/usr/lib/libopenblas.so.0 is also not linked against those, for what that's
worth.

Regards,
Ben


On 01/10/2018 12:04 AM, Keith O'Hara wrote:

> Check if libopenblas is linked against libomp or libgomp.
>
> I’d be curious to see any errors that arise when an OpenMP version of
> OpenBLAS is linked with R.
>
> Keith
>
>
> On Jan 9, 2018, at 11:01 PM, Benjamin Tyner  wrote:
>>
>> I didn't do the compile; is there a way to check whether that was used?
>> If not, I'll inquire with our sysadmin and report back.
>>
>> In any case, my suggestion was motivated by the fact that some parts of R
>> use OpenMP while others do not, in the hope that the former could have
>> their OpenBLAS omelet without breaking the OpenMP eggs, so to speak.
>>
>>
>> On 01/09/2018 06:41 PM, Keith O'Hara wrote:
>>
>>> Do those issues still arise when OpenBLAS is compiled with USE_OPENMP=1 ?
>>>
>>> Keith
>>>
>>> On Jan 9, 2018, at 6:03 PM, Benjamin Tyner  wrote:

 Please pardon my ignorance, but doesn't OpenBLAS still not always play
 nicely with multi-threaded OpenMP? (for example, don't race conditions
 sometimes crop up)? If so, it might be nice to have the ability to
 temporarily disable multi-threaded OpenMP (effectively:
 omp_set_num_threads(1)) for the duration of operations using OpenBLAS.

 Regards
 Ben

 Julia using OpenBLAS is *very* reassuring.
>
> I agree that having it included as an options(...) feature should be
> OK.
>
> On Sun, Dec 17, 2017, 3:22 PM Juan Telleria wrote:
>
>> Julia Programming Language uses also OpenBlas, and it is actively
>> maintained with bugs being fixed as I have checked it out:
>>
>> http://www.openblas.net/Changelog.txt
>>
>> So I still see it ok to be included as an options(...) feature (by
>> default off, just for safety), over other Blas libraries.
>>
>> R could not use Intel MKL for legal reasons (I think), because as long
>> that R ships with GPL libraries, shipping R by default with Non-GPL is
>> illegal.
>>
>> Cheers,
>> Juan
>>
>> On 17/12/2017 2:50 a.m., "Avraham Adler" wrote:
>>
>>> On Sat, Dec 16, 2017 at 7:41 PM, Kenny Bell wrote:
>>> > It seems like many of the multi-threaded BLASes have some sort of
>>> > fundamental problem preventing use in the way Juan suggests:
>>> >
>>> > - Dirk's vignette states that ATLAS "fixes the number of cores used at
>>> > compile-time and cannot vary this setting at run-time", so any
>>> > user-friendly implementation for R would have to compile ATLAS for
>>> > 1-16 threads to allow the user to switch at run-time. This might
>>> > dramatically affect install times.
>>> >
>>> > - MKL seems like it's been outright rejected in the past based on not
>>> > being "free-enough".
>>> >
>>> > - OpenBLAS causes crashes.
>>> >
>>> > Has anyone tried ExBLAS for use with R?
>>> >
>>> > On Sun, Dec 17, 2017 at 1:03 PM, Peter Langfelder wrote:
>>> >
>>> >> I would be very cautious about OpenBLAS in particular... from time to
>>> >> time I get complaints from users that compiled code calculations in my
>>> >> WGCNA package crash or produce wrong answers with large data, and they
>>> >> all come from OpenBLAS users. I am yet to reproduce any of their
>>> >> crashes when using MKL and ATLAS BLAS implementations.
>>> >>
>>> >> Just my 2 cents...
>>> >>
>>> >> Peter
>>>
>>> I've been building R on Windows 64 bit with OpenBLAS for years and my
>>> builds pass check-devel. For a while in the past it failed one check as
>>> the tolerance was 5e-5 and with my build of OpenBLAS the error was
>>> 5.4e-5 or 5.7e-5, but that was changed around R 3.3, if I recall
>>> correctly. I provide descriptions here [1], but I haven't gone so far
>>> as to post compiled Rblas.dlls just yet. My personal build sets 4
>>> threads when compiling OpenBLAS itself as I'm currently on a quad-core
>>> SandyBridge. In tests I ran a few years ago, both single and multi
>>> threaded BLAS

Re: [Rd] Debate: Shall some of Microsoft R Open Code be ported to mainstream R?

2017-10-30 Thread Ista Zahn
On Mon, Oct 30, 2017 at 12:45 PM, Cohn, Robert S
 wrote:
> I think the thing that is missing is a simple way for end users on Windows to 
> replace the BLAS/LAPACK libraries with MKL: a package that you install that 
> puts the libraries in the right place.
>
> Microsoft provides something for their distro, but we don't have the 
> equivalent if you get R from cran.

I don't have any interest in using a proprietary BLAS when a perfectly
good free alternative exists. If we just care about performance,
swapping out the reference BLAS for openBLAS is pretty simple
apparently:

- Download pre-compiled openblas from
https://sourceforge.net/projects/openblas/files/ (windows binaries
currently available for 2.19 but not 2.20)
- Unzip and copy libopenblas.dll to C:\Program Files\R\R-3.4.x\bin\x64
and rename it "Rblas.dll"
- Download the mingw dll's (e.g., from
https://sourceforge.net/projects/openblas/files/v0.2.15/mingw64_dll.zip/download),
unzip and copy all the dll's to C:\Program Files\R\R-3.4.x\bin\x64

Best,
Ista
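
After a swap like the one described above, one quick way to confirm that R actually picked up the new library is sessionInfo(), which (in R >= 3.4) reports the BLAS and LAPACK paths in use, as in the session transcripts elsewhere in this digest:

```r
# Check which BLAS/LAPACK shared libraries this R session is using.
si <- sessionInfo()
si$BLAS    # e.g. ".../Rblas.dll" or ".../libopenblas.so" after the swap
si$LAPACK  # corresponding LAPACK library path
```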

>
>
> On 29 October 2017 at 22:01, Kenny Bell wrote:
>
> | User here: incorporating Intel's MKL, as MRO does, would be a very welcome
>
> | addition.
>
> |
>
> | I was an MRO user before and it improved my experience with medium data
>
> | immensely.
>
> |
>
> | They did, however, leave behind bugs here and there, especially related to
>
> | development with Rcpp, so I switched back to vanilla R.
>
>
>
> With all due respect: You may miss something. MKL has always worked with 
> 'Base R'.
>
>
>
> As a point of reference and comparison, I set up a benchmarking and
>
> comparison package _well over half a decade ago_ and while it never got fully
>
> finished to the point of a submitted paper the vignette still stands---and
>
> demonstrates that _dropping in MKL is a one-line operation_.
>
>
>
> And always has been.  There may have been some license arbitrage: Intel was
>
> an early investor in Revo, so MKL was pushed hard.  With GotoBLAS and later
>
> OpenBLAS I cared less, but IIRC the license of MKL is a little simpler for
>
> "mere use" now.
>
>
>
> See  https://cloud.r-project.org/web/packages/gcbd/vignettes/gcbd.pdf  for 
> more.
>
>
>
> Hth,  Dirk
>
>
> [[alternative HTML version deleted]]
>


Re: [Rd] Debate: Shall some of Microsoft R Open Code be ported to mainstream R?

2017-10-30 Thread Ista Zahn
On Sun, Oct 29, 2017 at 11:09 PM, Dirk Eddelbuettel  wrote:
>
> On 29 October 2017 at 22:01, Kenny Bell wrote:
> | User here: incorporating Intel's MKL, as MRO does, would be a very welcome
> | addition.
> |
> | I was an MRO user before and it improved my experience with medium data
> | immensely.
> |
> | They did, however, leave behind bugs here and there, especially related to
> | development with Rcpp, so I switched back to vanilla R.
>
> With all due respect: You may miss something. MKL has always worked with 
> 'Base R'.
>
> As a point of reference and comparison, I set up a benchmarking and
> comparison package _well over half a decade ago_ and while it never got fully
> finished to the point of a submitted paper the vignette still stands---and
> demonstrates that _dropping in MKL is a one-line operation_.
>
> And always has been.  There may have been some license arbitrage: Intel was
> an early investor in Revo, so MKL was pushed hard.  With GotoBLAS and later
> OpenBLAS I cared less, but IIRC the license of MKL is a little simpler for
> "mere use" now.
>
> See  https://cloud.r-project.org/web/packages/gcbd/vignettes/gcbd.pdf  for 
> more.

FWIW, I've been using openBlas for years now, based on this and other
benchmarks. It provides performance comparable to MKL while being
really free.

Best,
Ista

>
> Hth,  Dirk
>
> |
> | On Mon, Oct 30, 2017, 9:42 AM Juan Telleria  wrote:
> |
> | > Dear R Developers,
> | >
> | > First of all, I would like to thank you Jeroen Ooms for taking the binary
> | > Window Builds from Duncan. I firmly believe that the R Community will
> | > benefit a lot from his work.
> | >
> | > However, the debate I would like to open is about if some of Microsoft R
> | > Open Code shall be ported from R Open to Mainstream R.
> | >
> | > There are some beneficts in R Open such as multithreaded performance:
> | > https://mran.microsoft.com/documents/rro/multithread/
> | >
> | > Maybe, the R Consortium, and in particular, Microsoft R Team, could
> | > collaborate, if appropriate, in such duty.
> | >
> | > Thank you,
> | > Juan Telleria
> | >
> | > [[alternative HTML version deleted]]
> | >
> |
> |   [[alternative HTML version deleted]]
> |
>
> --
> http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>


Re: [Rd] what do you think about write.table(... qmethod = "excel")?

2017-09-19 Thread Ista Zahn
On Tue, Sep 19, 2017 at 1:04 PM, Paul Johnson  wrote:
> Last week one of our clients reported trouble with a csv file I
> generated with write.table.  He said that columns with quotes for
> character variables were rejected by their data importer, which was
> revised to match the way Microsoft Excel uses quotation marks in
> character variables.  I explained to them that quoted character
> variables are virtuous and wise, of course, but they say Microsoft
> Excel CSV export no longer quotes characters unless they include
> commas in the values.
>
> They showed me a CSV file from Excel that looked like this
>
> x1,x2,x3,x4 5 6
> fred,barney,betty,x
> bambam,"fred,wilma",pebbles,y
>
> Note how the quotes only happen on row 2 column 2. I was surprised it
> did that, but now I have some pressure to write a csv maker that has
> that structure.

I think you should resist that pressure. It really makes no sense to
write a .csv parser that _only_ supports .csv files created by Excel.
If you're going to use Excel as a model, a more sensible approach
would be to write a csv parser that supports all the formats that
Excel itself supports; Excel of course has no problem importing

"x1","x2","x3","x4"
"fred","barney","betty","x"
"bambam","fred,wilma","pebbles","y"

So, seriously, tell them to just fix their csv parser. Since they seem
hung up on Excel, it may help to point out that it does in fact import
csv produced by write.csv without complaint.

Best,
Ista
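
That said, if someone did want to produce Excel-style minimally quoted CSV rather than fix the parser, a rough sketch follows. This is illustrative code, not an existing write.table option; the function names are made up here. It quotes a field only when it contains a comma, a double quote, or a newline, matching the Excel sample above (note even fields with spaces stay unquoted).

```r
# Quote a field only when it contains a comma, a double quote, or a
# newline; embedded quotes are doubled, as in Excel's own CSV output.
excel_quote <- function(x) {
  x <- as.character(x)
  needs <- grepl('[",\n]', x)
  x[needs] <- paste0('"', gsub('"', '""', x[needs]), '"')
  x
}

write_csv_excel <- function(df, file = stdout()) {
  header <- paste(excel_quote(names(df)), collapse = ",")
  rows <- vapply(seq_len(nrow(df)),
                 function(i) paste(excel_quote(unlist(df[i, ])), collapse = ","),
                 character(1))
  writeLines(c(header, rows), file)
}

write_csv_excel(data.frame(x = c("fred", "bambam"),
                           y = c("barney", "fred,wilma"),
                           stringsAsFactors = FALSE))
```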

> It's weird, even when there are spaces in values there
> are no quotation marks.
>
> Has anybody done this and verified that it matches CSV from MS Excel?
> If I succeed will you consider a patch?
>
> pj
> --
> Paul E. Johnson   http://pj.freefaculty.org
> Director, Center for Research Methods and Data Analysis http://crmda.ku.edu
>
> To write to me directly, please address me at pauljohn at ku.edu.
>


Re: [Rd] readLines() segfaults on large file & question on how to work around

2017-09-02 Thread Ista Zahn
As a work-around I suggest readr::read_file.

--Ista
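
Another possible work-around for the single-huge-line case described below is to bypass readLines() entirely and accumulate the file in fixed-size chunks. This is only a sketch under the assumption that the content has no embedded NUL bytes; the function name and chunk size are arbitrary, and it has not been exercised against a multi-gigabyte file.

```r
# Read an entire file as one string without asking readLines() to
# split it into lines, accumulating fixed-size chunks from a binary
# connection.
read_whole_file <- function(path, chunk_size = 64 * 1024^2) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  pieces <- list()
  repeat {
    chunk <- readChar(con, nchars = chunk_size, useBytes = TRUE)
    if (length(chunk) == 0) break  # readChar() returns character(0) at EOF
    pieces[[length(pieces) + 1L]] <- chunk
  }
  paste0(unlist(pieces), collapse = "")
}
```

The resulting string can then be handed to jsonlite::fromJSON() directly.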


On Sep 2, 2017 2:58 PM, "Jennifer Lyon"  wrote:

> Hi:
>
> I have a 2.1GB JSON file. Typically I use readLines() and
> jsonlite:fromJSON() to extract data from a JSON file.
>
> When I try and read in this file using readLines() R segfaults.
>
> I believe the two salient issues with this file are
> 1). Its size
> 2). It is a single line (no line breaks)
>
> I can reproduce this issue as follows
> #Generate a big file with no line breaks
> # In R
> > writeLines(paste0(c(letters, 0:9), collapse=""), "alpha.txt", sep="")
>
> # in unix shell
> cp alpha.txt file.txt
> for i in {1..26}; do cat file.txt file.txt > file2.txt && mv -f file2.txt
> file.txt; done
>
> This generates a 2.3GB file with no line breaks
>
> in R:
> > moo <- readLines("file.txt")
>
>  *** caught segfault ***
> address 0x7cff, cause 'memory not mapped'
>
> Traceback:
>  1: readLines("file.txt")
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
> Selection: 3
>
> I conclude:
>  I am potentially running up against a limit in R, which should give a
> reasonable error, but currently just segfaults.
>
> My question:
> Most of the content of the JSON is an approximately 100K x 6K JSON
> equivalent of a dataframe, and I know R can handle much bigger than this
> size. I am expecting these JSON files to get even larger. My R code lives
> in a bigger system, and the JSON comes in via stdin, so I have absolutely
> no control over the data format. I can imagine trying to incrementally
> parse the JSON so I don't bump up against the limit, but I am eager for
> suggestions of simpler solutions.
>
> Also, I apologize for the timing of this bug report, as I know folks are
> working to get out the next release of R, but like so many things I have no
> control over when bugs leap up.
>
> Thanks.
>
> Jen
>
> > sessionInfo()
> R version 3.4.1 (2017-06-30)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 14.04.5 LTS
>
> Matrix products: default
> BLAS: R-3.4.1/lib/libRblas.so
> LAPACK: R-3.4.1/lib/libRlapack.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8   LC_NAME=C
>  [9] LC_ADDRESS=C   LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics  grDevices utils datasets  methods   base
>
> loaded via a namespace (and not attached):
> [1] compiler_3.4.1
>
> [[alternative HTML version deleted]]
>

[[alternative HTML version deleted]]



Re: [Rd] Control multi-threading in standard matrix product

2017-08-21 Thread Ista Zahn
Hi Ghislain,

The documentation at
https://cran.r-project.org/doc/manuals/r-release/R-admin.html#BLAS
provides a fair bit of information. What specifically would you like
to see added?

Best,
Ista
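
Since the RhpcBLASctl package comes up later in this thread, here is a minimal sketch of how it can pin the BLAS thread count around a product. RhpcBLASctl is a CRAN package, not part of base R; the function names are per its documentation, and the matrix sizes here are only illustrative.

```r
library(RhpcBLASctl)  # CRAN package for controlling BLAS/OpenMP threads

A <- matrix(runif(1000 * 1000), 1000, 1000)
B <- matrix(runif(1000 * 1000), 1000, 1000)

old <- blas_get_num_procs()   # current BLAS thread count
blas_set_num_threads(1)       # e.g. before entering mclapply() workers
C <- A %*% B                  # now a single-threaded matrix product
blas_set_num_threads(old)     # restore the previous setting
```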

On Mon, Aug 21, 2017 at 10:13 AM, Ghislain Durif
 wrote:
> Hi Tomas,
>
> Thanks for your answer.
>
> Indeed, I checked and my R-3.4.1 installed from the Ubuntu repository uses
> 'libopenblasp-r0.2.18.so' while my R-3.3.2, which I compiled on my machine,
> uses 'libRblas.so', which explains the difference in behavior.
>
> I will use RhpcBLASctl to avoid issue when combining matrix product and
> other multi-threading package.
>
> Maybe this point regarding multi-threading with BLAS could be added in the R
> doc.
>
> Thanks again,
> Best,
>
> Ghislain
>
> Ghislain Durif
> --
> Research engineer THOTH TEAM
> INRIA Grenoble Alpes (France)
>
> Le 21/08/2017 à 15:53, Tomas Kalibera a écrit :
>>
>> Hi Ghislain,
>>
>> I think you might be comparing two versions of R with different BLAS
>> implementations, one that is single threaded (is your 3.3.2 used with
>> reference blas?) and one that is multi threaded (3.4.1 with openblas). Could
>> you check with "perf"? E.g. run your benchmark with "perf record" in both
>> cases and you should see the names of the hot BLAS functions and this should
>> reveal the BLAS implementation (look for dgemm).
>>
>> In Ubuntu, if you install R from the package system, whenever you run it
>> it will use the BLAS currently installed via the package system. However if
>> you build R from source on Ubuntu, by default, it will use the reference
>> BLAS which is distributed with R. Section "Linear algebra" of "R
>> Installation and Administration" has details on how to build R with
>> different BLAS/LAPACK implementations.
>>
>> Sadly there is no standard way to specify the number of BLAS worker
>> threads. RhpcBLASctl has specific code for several existing implementations,
>> but R itself does not attempt to control BLAS multi threading in any way. It
>> is expected the user/system administrator will configure their BLAS
>> implementation of choice to use the number of threads they need. A similar
>> problem exists in other internally multi-threaded third-party libraries,
>> used by packages - R cannot control how many threads they run.
>>
>> Best
>> Tomas
>>
>> On 08/21/2017 02:55 PM, Ghislain Durif wrote:
>>>
>>> Dear R Core Team,
>>>
>>> I wish to report what can be viewed as a bug or at least a strange
>>> behavior in R-3.4.1. I ask my question here (as recommended on
>>> https://www.r-project.org/bugs.html) since I am not member of the R's
>>> Bugzilla.
>>>
>>> When running 'R --vanilla' from the command line, the standard matrix
>>> product is by default based on BLAS and multi-threaded on all cores
>>> available on the machine, c.f. following examples:
>>>
>>> n=1
>>> p=1000
>>> q=5000
>>> A = matrix(runif(n*p),nrow=n, ncol=p)
>>> B = matrix(runif(p*q),nrow=p, ncol=q)
>>> C = A %*% B # multi-threaded matrix product
>>>
>>>
>>> However, the default behavior to use all available cores can be an
>>> issue, especially on shared computing resources or when the matrix
>>> product is used in parallelized sections of code (for instance with
>>> 'mclapply' from the 'parallel' package). For instance, the default
>>> matrix product is single-threaded in R-3.3.2 (I ran a test on my
>>> machine), so this new feature will deeply affect the behavior of existing
>>> R packages that use other multi-threading solutions.
>>>
>>> Thanks to this stackoverflow question
>>>
>>> (https://stackoverflow.com/questions/45794290/in-r-how-to-control-multi-threading-in-blas-parallel-matrix-product),
>>> I now know that it is possible to control the number of BLAS threads
>>> thanks to the package 'RhpcBLASctl'. However, being able to control the
>>> number of threads should maybe not require to use an additional package.
>>>
>>> In addition, the doc 'matmult' does not mention this point, it points to
>>> the 'options' doc page and especially the 'matprod' section, in which
>>> the multi-threading is not discussed.
>>>
>>>
>>> Here is the results of the 'sessionInfo()' function on my machine for
>>> R-3.4.1:
>>> R version 3.4.1 (2017-06-30)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> Running under: Ubuntu 16.04.3 LTS
>>>
>>> Matrix products: default
>>> BLAS: /usr/lib/openblas-base/libblas.so.3
>>> LAPACK: /usr/lib/libopenblasp-r0.2.18.so
>>>
>>> locale:
>>>[1] LC_CTYPE=fr_FR.utf8   LC_NUMERIC=C
>>>[3] LC_TIME=fr_FR.utf8LC_COLLATE=fr_FR.utf8
>>>[5] LC_MONETARY=fr_FR.utf8LC_MESSAGES=fr_FR.utf8
>>>[7] LC_PAPER=fr_FR.utf8   LC_NAME=C
>>>[9] LC_ADDRESS=C  LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics  grDevices utils datasets  methods base
>>>
>>> loaded via a namespace (and not attached):
>>> [1] compiler_3.4.1
>>>
>>>
>>>
>>> and for 

Re: [Rd] v3.4.0-2 incompatible with gcc 7.1

2017-06-23 Thread Ista Zahn
Yes, regular install from the official repositories, all packages up
to date, works like a charm. I'm happy to follow up with you off-list
if you like, since this probably isn't interesting to a general R-user
audience.

Best,
Ista

On Fri, Jun 23, 2017 at 9:51 PM, Chris Cole <chris.c.1...@gmail.com> wrote:
> Thanks Ista, that's good to know. Did you install from pacman?
>
> Chris
>
> On Fri, 23 Jun 2017 at 20:35 Ista Zahn <istaz...@gmail.com> wrote:
>>
>> FWIW, I don't have any problems with R on Arch Linux.
>>
>> On Jun 23, 2017 1:32 PM, "Chris Cole" <chris.c.1...@gmail.com> wrote:
>>>
>>> Thank you for correcting my misunderstandings, Professor. Compiling from
>>> source did the trick, and I'll be following up with the arch maintainers
>>> about addressing the issue on their end.
>>>
>>> Best,
>>>
>>> Chris
>>>
>>> On Fri, 23 Jun 2017 at 11:02 Prof Brian Ripley <rip...@stats.ox.ac.uk>
>>> wrote:
>>>
>>> > R is compatible with GCC 7.1 !  New compiler versions are tested, as
>>> > well as those under development for the major compilers.  (A few
>>> > packages still fail with GCC 7.1, but that was reported to their
>>> > maintainers months ago.)
>>> >
>>> > Just follow the instructions in the R-admin manual to install from
>>> > sources.
>>> >
>>> > OTOH, ' v3.4.0-2 ' is not an R version number, so I think you are
>>> > referring to binary distributions on your Linux distro, which are not
>>> > the responsibility of 'Rcore or Rdevel' (whatever they are).
>>> >
>>> > On 23/06/2017 14:40, Chris Cole wrote:
>>> > > I'm on Arch Linux kernel version 4.11.6-1 using gcc version 7.1.1:
>>> > >
>>> > > gcc --version
>>> > > gcc (GCC) 7.1.1 20170516
>>> > >
>>> > > I have installed R through the arch package manager pacman and when I
>>> > > attempt to initiate it, R crashes stating a missing dependency:
>>> > >
>>> > > /usr/lib64/R/bin/exec/R: error while loading shared libraries:
>>> > > libgfortran.so.3: cannot open shared object file: No such file or
>>> > directory
>>> > >
>>> > > I thought that maybe a symlink was improperly placed in the package
>>> > > so I
>>> > > looked in /usr/lib to try to find the offending library.
>>> > >
>>> > > ls -halt /usr/lib/libgfortran.so.*
>>> > > lrwxrwxrwx 1 root root 20 May 16 03:01 /usr/lib/libgfortran.so.4 ->
>>> > > libgfortran.so.4.0.0
>>> > > -rwxr-xr-x 1 root root 7.1M May 16 03:01
>>> > > /usr/lib/libgfortran.so.4.0.0
>>> > >
>>> > > Simply symlinking libgfortran.so.4.0.0 to libgfortran.so.3 did not
>>> > > work,
>>> > > and after some questioning on SO (
>>> > >
>>> >
>>> > https://stackoverflow.com/questions/44658867/r-v3-4-0-2-unable-to-find-libgfortran-so-3-on-arch
>>> > )
>>> > > it seems that gfortran 7 has bumped the .so object to version 4.
>>> > >
>>> > > It seems that a relatively straightforward workaround for the present
>>> > would
>>> > > be to install a legacy version of gcc alongside the current version.
>>> > >
>>> > > I'm wondering if Rcore or Rdevel are moving towards being able to
>>> > > handle
>>> > > the new compiler version any time soon, and if there are any other
>>> > > workarounds than having two versions of the compiler.
>>> > >
>>> > > Thanks.
>>> > >
>>> > > Chris
>>> > >
>>> > >   [[alternative HTML version deleted]]
>>> > >
>>> >
>>> >
>>> > --
>>> > Brian D. Ripley,  rip...@stats.ox.ac.uk
>>> > Emeritus Professor of Applied Statistics, University of Oxford
>>> >
>>>
>>> [[alternative HTML version deleted]]
>>>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] v3.4.0-2 incompatible with gcc 7.1

2017-06-23 Thread Ista Zahn
FWIW, I don't have any problems with R on Arch Linux.

On Jun 23, 2017 1:32 PM, "Chris Cole"  wrote:

> Thank you for correcting my misunderstandings, Professor. Compiling from
> source did the trick, and I'll be following up with the arch maintainers
> about addressing the issue on their end.
>
> Best,
>
> Chris
>
> On Fri, 23 Jun 2017 at 11:02 Prof Brian Ripley 
> wrote:
>
> > R is compatible with GCC 7.1 !  New compiler versions are tested, as
> > well as those under development for the major compilers.  (A few
> > packages still fail with GCC 7.1, but that was reported to their
> > maintainers months ago.)
> >
> > Just follow the instructions in the R-admin manual to install from
> sources.
> >
> > OTOH, ' v3.4.0-2 ' is not an R version number, so I think you are
> > referring to binary distributions on your Linux distro, which are not
> > the responsibility of 'Rcore or Rdevel' (whatever they are).
> >
> > On 23/06/2017 14:40, Chris Cole wrote:
> > > I'm on Arch Linux kernel version 4.11.6-1 using gcc version 7.1.1:
> > >
> > > gcc --version
> > > gcc (GCC) 7.1.1 20170516
> > >
> > > I have installed R through the arch package manager pacman and when I
> > > attempt to initiate it, R crashes stating a missing dependency:
> > >
> > > /usr/lib64/R/bin/exec/R: error while loading shared libraries:
> > > libgfortran.so.3: cannot open shared object file: No such file or
> > directory
> > >
> > > I thought that maybe a symlink was improperly placed in the package so
> I
> > > looked in /usr/lib to try to find the offending library.
> > >
> > > ls -halt /usr/lib/libgfortran.so.*
> > > lrwxrwxrwx 1 root root 20 May 16 03:01 /usr/lib/libgfortran.so.4 ->
> > > libgfortran.so.4.0.0
> > > -rwxr-xr-x 1 root root 7.1M May 16 03:01 /usr/lib/libgfortran.so.4.0.0
> > >
> > > Simply symlinking libgfortran.so.4.0.0 to libgfortran.so.3 did not
> work,
> > > and after some questioning on SO (
> > >
> > https://stackoverflow.com/questions/44658867/r-v3-4-0-2-
> unable-to-find-libgfortran-so-3-on-arch
> > )
> > > it seems that gfortran 7 has bumped the .so object to version 4.
> > >
> > > It seems that a relatively straightforward workaround for the present
> > would
> > > be to install a legacy version of gcc alongside the current version.
> > >
> > > I'm wondering if Rcore or Rdevel are moving towards being able to
> handle
> > > the new compiler version any time soon, and if there are any other
> > > workarounds than having two versions of the compiler.
> > >
> > > Thanks.
> > >
> > > Chris
> > >
> > >   [[alternative HTML version deleted]]
> > >
> > > __
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >
> >
> >
> > --
> > Brian D. Ripley,  rip...@stats.ox.ac.uk
> > Emeritus Professor of Applied Statistics, University of Oxford
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] A few suggestions and perspectives from a PhD student

2017-05-08 Thread Ista Zahn
On Mon, May 8, 2017 at 8:08 AM, Antonin Klima  wrote:
> Thanks for the answers,
>
> I’m aware of the ‘.’ option, just wanted to give a very simple example.
>
> But the lapply ‘…' parameter use has eluded me and thanks for enlightening me.
>
> What do you mean by messing up the call stack. As far as I understand it, 
> piping should translate into same code as deep nesting.

Perhaps, but then magrittr is not really a pipe. Here is a simple example

library(magrittr)
data.frame(x = 1) %>%
subset(y == 1)
traceback()

Error in eval(e, x, parent.frame()) : object 'y' not found
12: eval(e, x, parent.frame())
11: eval(e, x, parent.frame())
10: subset.data.frame(., y == 1)
9: subset(., y == 1)
8: function_list[[k]](value)
7: withVisible(function_list[[k]](value))
6: freduce(value, `_function_list`)
5: `_fseq`(`_lhs`)
4: eval(quote(`_fseq`(`_lhs`)), env, env)
3: eval(quote(`_fseq`(`_lhs`)), env, env)
2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1: data.frame(x = 1) %>% subset(y == 1)

subset(data.frame(x = 1),
  y == 1)
traceback()

Error in eval(e, x, parent.frame()) : object 'y' not found
4: eval(e, x, parent.frame())
3: eval(e, x, parent.frame())
2: subset.data.frame(data.frame(x = 1), y == 1)
1: subset(data.frame(x = 1), y == 1)

It does pollute the call stack, making debugging harder.

> So then I only see a tiny downside for debugging here. No loss of
> time/space efficiency or anything. With a chance of inadvertent error
> in your example, coming from the fact that a variable is being reused
> and no one now checks for me whether it is being passed between the
> lines. And with having to specify the variable every single time. For
> me, that solution is clearly inferior.

There are tradeoffs. As demonstrated above, the pipe is clearly
inferior in that it is doing a lot of complicated stuff under the
hood, and when you try to traceback() through the call stack you have
to sift through all that complicated stuff. That's a pretty big
drawback in my opinion.

>
> Too bad you didn’t find my other comments interesting though.

I did not say that.

>
>>Why do you think being implemented in a contributed package restricts
>>the usefulness of a feature?
>
> I guess it depends on your philosophy. It may not restrict it per se, 
> although it would make a lot of sense to me reusing the bash-style ‘|' and 
> have a shorter, more readable version. One has extra dependence on a package 
> for an item that fits the language so well that it should be its part.  It is 
> without doubt my most used operator at least. Going to some of my folders I 
> found 101 uses in 750 lines, and 132 uses in 3303 lines. I would compare it 
> to having a computer game being really good with a fan-created mod, but 
> lacking otherwise. :)

One of the key strengths of R is that packages are not akin to "fan
created mods". They are a central and necessary part of the R system.

>
> So to me, it makes sense that if there is no doubt that a feature improves 
> the language, and especially if people extensively use it through a package 
> already, it should be part of the “standard”. Question is whether it is 
> indeed very popular, and whether you share my view. But that’s now up to you, 
> I just wanted to point it out I guess.

>
> Best Regards,
> Antonin
>
>> On 05 May 2017, at 22:33, Gabor Grothendieck  wrote:
>>
>> Regarding the anonymous-function-in-a-pipeline point one can already
>> do this which does use brackets but even so it involves fewer
>> characters than the example shown.  Here { . * 2 } is basically a
>> lambda whose argument is dot. Would this be sufficient?
>>
>>  library(magrittr)
>>
>>  1.5 %>% { . * 2 }
>>  ## [1] 3
>>
>> Regarding currying note that with magrittr Ista's code could be written as:
>>
>>  1:5 %>% lapply(foo, y = 3)
>>
>> or at the expense of slightly more verbosity:
>>
>>  1:5 %>% Map(f = . %>% foo(y = 3))
>>
>>
>> On Fri, May 5, 2017 at 1:00 PM, Antonin Klima  wrote:
>>> Dear Sir or Madam,
>>>
>>> I am in 2nd year of my PhD in bioinformatics, after taking my Master’s in 
>>> computer science, and have been using R heavily during my PhD. As such, I 
>>> have put together a list of certain features in R that, in my opinion, 
>>> would be beneficial to add, or could be improved. The first two are already 
>>> implemented in packages, but given that it is implemented as user-defined 
>>> operators, it greatly restricts its usefulness. I hope you will find my 
>>> suggestions interesting. If you find time, I will welcome any feedback as 
>>> to whether you find the suggestions useful, or why you do not think they 
>>> should be implemented. I will also welcome if you enlighten me with any 
>>> features I might be unaware of, that might solve the issues I have pointed 
>>> out below.
>>>
>>> 1) piping
>>> Currently available in package magrittr, piping makes the code better 
>>> readable by having the line start at its natural 

Re: [Rd] A few suggestions and perspectives from a PhD student

2017-05-05 Thread Ista Zahn
On Fri, May 5, 2017 at 1:00 PM, Antonin Klima  wrote:
> Dear Sir or Madam,
>
> I am in 2nd year of my PhD in bioinformatics, after taking my Master’s in 
> computer science, and have been using R heavily during my PhD. As such, I 
> have put together a list of certain features in R that, in my opinion, would 
> be beneficial to add, or could be improved. The first two are already 
> implemented in packages, but given that it is implemented as user-defined 
> operators, it greatly restricts its usefulness.

Why do you think being implemented in a contributed package restricts
the usefulness of a feature?

> I hope you will find my suggestions interesting. If you find time, I
> will welcome any feedback as to whether you find the suggestions
> useful, or why you do not think they should be implemented. I will
> also welcome if you enlighten me with any features I might be unaware
> of, that might solve the issues I have pointed out below.
>
> 1) piping
> Currently available in package magrittr, piping makes the code better 
> readable by having the line start at its natural starting point, and 
> following with functions that are applied - in order. The readability of 
> several nested calls with a number of parameters each is almost zero, it’s 
> almost as if one would need to come up with the solution himself. Pipeline in 
> comparison is very straightforward, especially together with the point (2).

You may be surprised to learn that not everyone thinks pipes are a
good idea. Personally I see some advantages, but there is also a big
downside with is that they mess up the call stack and make tracking
down errors via traceback() more difficult.

There is a simple alternative to pipes already built in to R that
gives you some of the advantages of %>% without messing up the call
stack.  Using Hadley's famous "little bunny foo foo" example:

foo_foo <- little_bunny()

## nesting (it is rough)
bop(
  scoop(
hop(foo_foo, through = forest),
up = field_mice
  ),
  on = head
)

## magrittr
foo_foo %>%
  hop(through = forest) %>%
  scoop(up = field_mouse) %>%
  bop(on = head)

## regular R assignment
foo_foo -> .
  hop(., through = forest) -> .
  scoop(., up = field_mouse) -> .
  bop(., on = head)

This is more limited than magrittr's %>%, but it gives you a lot of
the advantages without the disadvantages.
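For concreteness, here is the same idiom with ordinary base-R functions standing in for the hypothetical bunny helpers (the specific functions chosen here are illustrative, not from the original message):

```r
# Right-assignment chaining: each step stores its result in `.`,
# and the next step reads `.`. Every call is an ordinary top-level
# call, so traceback() stays short if anything fails.
1:10 -> .
Filter(function(x) x %% 2 == 0, .) -> .        # keep the even numbers
vapply(., function(x) x^2, numeric(1)) -> .    # square them
result <- sum(.)                               # 4 + 16 + 36 + 64 + 100
result
# [1] 220
```

Unlike %>%, nothing extra appears on the call stack: an error inside any step points directly at that step.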

>
> The package here works rather good nevertheless, the shortcomings of piping 
> not being native are not quite as severe as in point (2). Nevertheless, an 
> intuitive symbol such as | would be helpful, and it sometimes bothers me that 
> I have to parenthesize anonymous function, which would probably not be 
> required in a native pipe-operator, much like it is not required in f.ex. 
> lapply. That is,
> 1:5 %>% function(x) x+2
> should be totally fine

That seems pretty small-potatoes to me.

>
> 2) currying
> Currently available in package Curry. The idea is that, having a function 
> such as foo = function(x, y) x+y, one would like to write for example 
> lapply(foo(3), 1:5), and have the interpreter figure out ok, foo(3) does not 
> make a value result, but it can still give a function result - a function of 
> y. This would be indeed most useful for various apply functions, rather than 
> writing function(x) foo(3,x).

You can already do

lapply(1:5, foo, y = 3)

(assuming that foo has an argument named "y")
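A minimal sketch of that point, with foo defined as in the two-argument example from the original message:

```r
# Extra arguments to lapply() are passed on to the function being
# applied, which covers the common partial-application use case.
foo <- function(x, y) x + y
res <- lapply(1:5, foo, y = 3)   # same as lapply(1:5, function(x) foo(x, 3))
unlist(res)
# [1] 4 5 6 7 8
```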

I'm stopping here since I don't have anything useful to say about your
subsequent points.

Best,
Ista

>
> I suggest that currying would make the code easier to write, and more 
> readable, especially when using apply functions. One might imagine that there 
> could be some confusion with such a feature, especially from people 
> unfamiliar with functional programming, although R already does take function 
> as first-order arguments, so it could be just fine. But one could address it 
> with special syntax, such as $foo(3) [$foo(x=3)] for partial application.  
> The current currying package has very limited usefulness, as, being limited 
> by the user-defined operator framework, it only rarely can contribute to less 
> code/more readability. Compare yourself:
> $foo(x=3) vs foo %<% 3
> goo = function(a,b,c)
> $goo(b=3) vs goo %><% list(b=3)
>
> Moreover, one would often like currying to have highest priority. For 
> example, when piping:
> data %>% foo %>% foo1 %<% 3
> if one wants to do data %>% foo %>% $foo(x=3)
>
> 3) Code executable only when running the script itself
> Whereas the first two suggestions are somewhat stealing from Haskell and the 
> like, this suggestion would be stealing from Python. I’m building quite a 
> complicated pipeline, using S4 classes. After defining the class and its 
> methods, I also define how to build the class to my likings, based on my 
> input data, using various now-defined methods. So I end up having a list of 
> command line arguments to process, and the way to create the class instance 
> based on them. If I write it to the 

Re: [Rd] R 3.4 has broken C++11 support

2017-04-19 Thread Ista Zahn
Hi Philipp,

Fellow Archlinux user here. I think the problem is with the r-devel
PKGBUILD file, rather than anything wrong in R itself. The PKGBUILD
file does this:

ln -s /etc/R/${i} ${i}

when it should do

ln -s /etc/R-devel/${i} ${i}

You can fix your installed version with

cd /opt/r-devel/lib/R/etc/
sudo rm ./*
sudo ln -s /etc/R-devel/javaconf
sudo ln -s /etc/R-devel/ldpaths
sudo ln -s /etc/R-devel/Makeconf
sudo ln -s /etc/R-devel/Renviron
sudo ln -s /etc/R-devel/repositories

Or (better) fix the PKGBUILD and makepkg/pacman -U

Best,
Ista



On Wed, Apr 19, 2017 at 10:32 AM, Angerer, Philipp via R-devel
 wrote:
> Hi Dirk and Martyn,
>
>> That looks fine. Can you please give a reproducible example of a package
>> that compiles correctly on R 3.3.3 but not with R 3.4.0 or R-devel.
>
> here you go, it’s pretty much the simplest package possible that needs C++11:
>
> https://github.com/flying-sheep/cxx11test
>
>> Maybe you can share with us how you configure the build of R-devel?
>
> Sure, in the mail you quoted, I already linked exactly that:
>
> https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=r-devel#n40
>
>> ./configure --prefix=/opt/r-devel \
>>   --libdir=/opt/r-devel/lib \
>>   --sysconfdir=/etc/R-devel \
>>   --datarootdir=/opt/r-devel/share \
>> rsharedir=/opt/r-devel/share/R/ \
>> rincludedir=/opt/r-devel/include/R/ \
>> rdocdir=/opt/r-devel/share/doc/R/ \
>>   --with-x \
>>   --enable-R-shlib \
>>   --with-lapack \
>>   --with-blas \
>>   F77=gfortran \
>>   LIBnn=lib
>
>
> Thanks and cheers,
> Philipp
>
>
> Helmholtz Zentrum Muenchen
> Deutsches Forschungszentrum fuer Gesundheit und Umwelt (GmbH)
> Ingolstaedter Landstr. 1
> 85764 Neuherberg
> www.helmholtz-muenchen.de
> Aufsichtsratsvorsitzende: MinDir'in Baerbel Brumme-Bothe
> Geschaeftsfuehrer: Prof. Dr. Guenther Wess, Heinrich Bassler, Dr. Alfons 
> Enhsen
> Registergericht: Amtsgericht Muenchen HRB 6466
> USt-IdNr: DE 129521671
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] What happened to Ross Ihaka's proposal for a Common Lisp based R successor?

2016-08-05 Thread Ista Zahn
But you can easily fall back to R from within Julia; see
http://juliastats.github.io/RCall.jl/latest/

On Aug 5, 2016 1:27 PM, "Hadley Wickham"  wrote:

> No.
>
> Hadley
>
> On Fri, Aug 5, 2016 at 11:12 AM, Kenny Bell  wrote:
> > Is it conceivable that Julia could be ported to use R syntax in a way
> that
> > would allow the vastly larger numbers of R programmers to seamlessly
> switch?
> > Or equivalently, could an iteration of R itself do this?
> >
> >
> > On Fri, Aug 5, 2016, 9:00 AM Hadley Wickham  wrote:
> >>
> >> When it was being actively worked on, it had the advantage of existing.
> >>
> >> Hadley
> >>
> >> On Fri, Aug 5, 2016 at 10:48 AM, Kenny Bell  wrote:
> >> > Why is the described system preferable to Julia?
> >> >
> >> > On Fri, Aug 5, 2016, 4:50 AM peter dalgaard  wrote:
> >> >
> >> >>
> >> >> On 05 Aug 2016, at 06:41 , Andrew Judson  wrote:
> >> >>
> >> >> > I read this paper
> >> >> >  2008.pdf>
> >> >> > and
> >> >> > haven't been able to find out what happened - I have seen some
> >> >> > sporadic
> >> >> > mention in message groups but nothing definitive. Does anyone know?
> >> >>
> >> >> Presumably Ross does...
> >> >>
> >> >> You get a hint if you go one level up and look for the newest file:
> >> >>
> >> >> https://www.stat.auckland.ac.nz/~ihaka/downloads/New-System.pdf
> >> >>
> >> >>
> >> >> --
> >> >> Peter Dalgaard, Professor,
> >> >> Center for Statistics, Copenhagen Business School
> >> >> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> >> >> Phone: (+45)38153501
> >> >> Office: A 4.23
> >> >> Email: pd@cbs.dk  Priv: pda...@gmail.com
> >> >>
> >> >> __
> >> >> R-devel@r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >> >>
> >> >
> >> > [[alternative HTML version deleted]]
> >> >
> >> > __
> >> > R-devel@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >>
> >>
> >>
> >> --
> >> http://hadley.nz
>
>
>
> --
> http://hadley.nz
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unable to Install Packages from Binaries on Windows for R 3.2.3

2016-02-27 Thread Ista Zahn
Hi Steve,

CRAN only compiles packages for Windows and OS X, so this is a) completely
expected and b) completely unrelated to the issue being discussed in this
thread.

Best,
Ista
On Feb 27, 2016 12:32 PM, "Steve Bronder"  wrote:

> Removing 'type=binary' worked for me.
>
> install.packages(
>'httr',
> repos = "https://cran.rstudio.com/"
> )
>
> But I get an error when I select binary as type
> ---
>  install.packages(
>  'httr',
>  type = 'binary',
>  repos = "https://cran.rstudio.com/"
>  )
> Error in install.packages : type 'binary' is not supported on this
> platform.
> ---
>
> Same error for another package
> ---
> install.packages(
> 'lme4',
> type = 'binary',
> repos = "https://cran.rstudio.com/")
>
> Error in install.packages : type 'binary' is not supported on this platform
> ---
>
>  Platform session info below:
>
> sessionInfo()
> R version 3.2.3 (2015-12-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 14.04.4 LTS
>
>
>
>
> Regards,
>
> Steve Bronder
> Website: stevebronder.com
> Phone: 412-719-1282
> Email: sbron...@stevebronder.com
>
>
> On Sat, Feb 27, 2016 at 11:33 AM, peter dalgaard  wrote:
>
> >
> > > On 27 Feb 2016, at 05:22 , Ramnath Vaidyanathan <
> > ramnath.vai...@gmail.com> wrote:
> > >
> > > Installing packages from binaries on Windows seems broken, when using
> > > mirrors that are up to date with CRAN
> > >
> > > install.packages(
> > >  'httr',
> > >  type = 'binary',
> > >  repos = "https://cran.rstudio.com/"
> > > )
> > >
> > > Changing repos to the Kansas CRAN mirror installs the package as
> > expected,
> > > but that could be because the KS mirror has not yet synced.
> > >
> > > Someone pointed out that the PACKAGES.gz file at
> > > https://cran.r-project.org/bin/windows/contrib/3.2/ seems to be
> > corrupted
> > > (0 KB), and this could be the issue.
> >
> >
> > It's at 202K now in both places. Perhaps just retry?
> >
> > -pd
> >
> > >
> > >   [[alternative HTML version deleted]]
> > >
> > > __
> > > R-devel@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > --
> > Peter Dalgaard, Professor,
> > Center for Statistics, Copenhagen Business School
> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > Phone: (+45)38153501
> > Office: A 4.23
> > Email: pd@cbs.dk  Priv: pda...@gmail.com
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Hidden files problem in R CMD check

2015-09-26 Thread Ista Zahn
Hi Christian,

This seems like a question about OSX rather than R. You will probably have
more luck asking on an apple forum. Or just google: http://bfy.tw/1zhP

Best,
Ista
On Sep 26, 2015 8:39 PM, "David Winsemius"  wrote:

>
> On Sep 26, 2015, at 2:06 PM, cstrato wrote:
>
> > Dear Dirk,
> >
> > Yes, I know, however forget for one moment R.
> >
> > If I use tar independent of R it still should not create these hidden
> files.
> >
> > BTW, do you know where these hidden files are stored on the Mac?
>
> Your first posting showed which of several different directories they were
> in. Do you understand that any file whose name starts with a "." is
> called a "hidden file"? It is "hidden", i.e not displayed in a Finder
> window, from people who are using Finder.app unless you change the default
> settings. It's easy to look up the code that is needed to be pasted into a
> Terminal session. I never remember it. I just leave Finder set up to
> display these 'dotfiles' as they are also called.
>
> defaults write com.apple.finder AppleShowAllFiles YES
>
> killall Finder
>
> #The second command restarts Finder.app or you could try to restart the
> Finder by option (=alt) + rightclicking the Finder icon in the Dock and
> selecting Relaunch.
>
> --
> David.
>
> >
> > Best regards,
> > Christian
> >
> >
> > On 09/26/15 23:01, Dirk Eddelbuettel wrote:
> >>
> >> On 26 September 2015 at 22:41, cstrato wrote:
> >> | Dear Simon,
> >> |
> >> | Thank you very much for your help, it did solve my problems!! Great!
> >> |
> >> | I have googled COPYFILE_DISABLE and found the following site which
> does
> >> | explain the issue with tar on Mac OS X, see:
> >> |
> http://unix.stackexchange.com/questions/9665/create-tar-archive-of-a-directory-except-for-hidden-files
> >> |
> >> | Instead of doing:
> >> | $tar czf xps_1.29.2.tar.gz xps
> >> |
> >> | I did now:
> >> | $COPYFILE_DISABLE=1 tar czf xps_1.29.2.tar.gz xps
> >> |
> >> | Running:
> >> | $R CMD check xps_1.29.2.tar.gz
> >> | now leaves only '.BBSoptions' as hidden file.
> >>
> >> No, still wrong. As Simon said, we all are supposed to use 'R CMD build
> xps'
> >> to create the tarball.  "Back in the day ..." straight tar cfz ...
> worked, it
> >> more or less stopped _many_ years ago.  Cf TheOneManualThatMatters:
> >>
> >>1.3.1 Checking packages
> >>---
> >>
> >>Using 'R CMD check', the R package checker, one can test whether
> >>_source_ R packages work correctly.  It can be run on one or more
> >>directories, or compressed package 'tar' archives with extension
> >>'.tar.gz', '.tgz', '.tar.bz2' or '.tar.xz'.
> >>
> >>   It is strongly recommended that the final checks are run on a
> 'tar'
> >>archive prepared by 'R CMD build'.
> >>
> >> Ie "It is strongly recommended ... 'tar' archive prepared by 'R CMD
> build'.
> >>
> >> Dirk
> >>
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> David Winsemius
> Alameda, CA, USA
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Memory limitations for parallel::mclapply

2015-07-24 Thread Ista Zahn
Hi Josh,

I think we need some more details, including code, and information
about your operating system. My machine has only 12 Gb of ram, but I
can run this quite comfortably (no swap, other processes using memory
etc.):

library(parallel)
library(data.table)
d <- data.table(a = rnorm(5000),
b = runif(1:5000),
c = sample(letters, 5000, replace = TRUE),
d = 1:5000,
g = rep(letters[1:10], each = 500))

system.time(means <- mclapply(unique(d$g), function(x) sapply(d[g==x,
list(a, b, d)], mean), mc.cores = 5))

In other words, I don't think there is anything inherent the the kind
of operation you describe that requires the large data object to be
copied. So as usual the devil is in the details, which you haven't yet
described.

Best,
Ista


On Fri, Jul 24, 2015 at 4:21 PM, Joshua Bradley jgbradl...@gmail.com wrote:
 Hello,

 I have been having issues using parallel::mclapply in a memory-efficient
 way and would like some guidance. I am using a 40 core machine with 96 GB
 of RAM. I've tried to run mclapply with 20, 30, and 40 mc.cores and it has
 practically brought the machine to a standstill each time to the point
 where I do a hard reset.

 When running mclapply with 10 mc.cores, I can see that each process takes
 7.4% (~7 GB) of memory. My use-case for mclapply is the following: run
 mclapply over a list of 15 names, for each process I refer to a larger
 pre-computed data.table to compute some stats with the name, and return
 those stats . Ideally I want to use the large data.table as shared-memory
 but the number of mc.cores I can use are being limited because each one
 requires 7 GB. Someone posted this exact same issue
 http://stackoverflow.com/questions/13942202/r-and-shared-memory-for-parallelmclapply
 on
 stackoverflow a couple years ago but it never got answered.

 Do I have to manually tell mclapply to use shared memory (if so, how?)? Is
 this type of job better with the doParallel package and foreach approach?

 Josh Bradley

 [[alternative HTML version deleted]]

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Unicode display problem with data frames under Windows

2015-05-25 Thread Ista Zahn
AFAIK this is the way it works on Windows. It has been discussed in several
places, e.g.
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
,
http://stackoverflow.com/questions/17715956/why-do-some-unicode-characters-display-in-matrices-but-not-data-frames-in-r
(both of these came up when I googled the subject line of your email).

Best,
Ista
On May 25, 2015 9:39 AM, Richard Cotton richiero...@gmail.com wrote:

 Here's a data frame with some Unicode symbols (set intersection and union).

 d <- data.frame(x = "A \u222a B \u2229 C")

 Printing this data frame under R 3.2.0 patched (r68378) and Windows 7, I
 see

 d
 ##  x
 ## 1 A U+222A B n C

 Printing the column itself works fine.

 d$x
 ## [1] A ∪ B ∩ C
 ## Levels: A ∪ B ∩ C

 The encoding is correctly UTF-8.

 Encoding(as.character(d$x))
 ## [1] "UTF-8"

 Under Linux both forms of printing are fine for me.

 I'm not quite sure whether I've missed a setting or if this is a bug, so

 Am I doing something silly?
 Can anyone else reproduce this?

 --
 Regards,
 Richie

 Learning R
 4dpiecharts.com

 [[alternative HTML version deleted]]

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel


[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] CRAN and ggplot2 geom and stat extensions

2014-12-23 Thread Ista Zahn
On Tue, Dec 23, 2014 at 10:34 AM, Frank Harrell
f.harr...@vanderbilt.edu wrote:
 I am thinking about adding several geom and stat extensions to ggplot2
 in the Hmisc package.  To do this requires using non-exported ggplot2
 functions as discussed in
 http://stackoverflow.com/questions/18108406/creating-a-custom-stat-object-in-ggplot2

 If I use the needed ggplot2::: notation the package will no longer pass
 CRAN checks.  Does anyone know of a solution?

the ggthemes package is on CRAN and uses ggplot2::: so it is at least
possible that this will be allowed for Hmisc as well.

Best,
Ista

  I'm assuming that Hadley
 doesn't want to export these functions or he would have done so a long
 time ago because of the number of users who have asked questions related
 to this.

 Frank
 --
 
 Frank E Harrell Jr  Professor and Chairman  School of Medicine

 Department of *Biostatistics*   *Vanderbilt University*


 [[alternative HTML version deleted]]

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Is it a good idea or even possible to redefine attach?

2014-08-05 Thread Ista Zahn
On Tue, Aug 5, 2014 at 2:49 PM, Grant Rettke g...@wisdomandwonder.com wrote:
 Hi,

 Today I got curious about whether or not I /could/ remove `attach' from
 my system so:
 - Backed it up
 - Implemented a new one
 - Like this

 ,
 | attach.old <- attach
 | attach <- function(...) {stop("NEVER USE ATTACH")}
 `

Just masking it with your own function, e.g.,

attach <- function(...) {stop("NEVER USE ATTACH")}

should be enough to discourage you from using it.
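A sketch of that masking approach (the tryCatch wrapper here is just to demonstrate the effect without stopping the session):

```r
# A function named attach() in the global environment shadows
# base::attach on the search path; the locked base binding is untouched.
attach <- function(...) stop("NEVER USE ATTACH")
msg <- tryCatch(attach(mtcars), error = function(e) conditionMessage(e))
msg
# [1] "NEVER USE ATTACH"
```

base::attach() remains reachable by its fully qualified name for any code that genuinely needs it.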


 I got the error:

 ,
 | Error: cannot change value of locked binding for 'attach'
 `

 If I unlock `attach' I assume that I could stomp on it... however is
 that even a good idea?

 What will I break?

Anything that uses attach.


 My goal was never to allow `attach' in my system, but it was just an
 idea.

 Kind regards,

 Grant Rettke | ACM, ASA, FSF, IEEE, SIAM
 g...@wisdomandwonder.com | http://www.wisdomandwonder.com/
 “Wisdom begins in wonder.” --Socrates
 ((λ (x) (x x)) (λ (x) (x x)))
 “Life has become immeasurably better since I have been forced to stop
 taking it seriously.” --Thompson

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] dget() much slower in recent R versions

2014-06-21 Thread Ista Zahn
Makes sense, thanks for the explanation.

Best,
Ista

On Sat, Jun 21, 2014 at 3:56 AM, Prof Brian Ripley
rip...@stats.ox.ac.uk wrote:
 On 20/06/2014 15:37, Ista Zahn wrote:

 Hello,

 I've noticed that dget() is much slower in the current and devel R
 versions than in previous versions. In 2.15 reading a 1-row
 data.frame takes less than half a second:

 (which.r <- R.Version()$version.string)

 [1] R version 2.15.2 (2012-10-26)

 x <- data.frame(matrix(sample(letters, 10, replace = TRUE), ncol =
 10))
 dput(x, which.r)
 system.time(y <- dget(which.r))

 user  system elapsed
0.546   0.033   0.586

 While in 3.1.0 and r-devel it takes around 7 seconds.

 (which.r <- R.Version()$version.string)

 [1] R version 3.1.0 (2014-04-10)

 x <- data.frame(matrix(sample(letters, 10, replace = TRUE), ncol =
 10))
 dput(x, which.r)
 system.time(y <- dget(which.r))

 user  system elapsed
6.920   0.060   7.074

 (which.r <- R.Version()$version.string)

 [1] R Under development (unstable) (2014-06-19 r65979)

 x <- data.frame(matrix(sample(letters, 10, replace = TRUE), ncol =
 10))
 dput(x, which.r)
 system.time(y <- dget(which.r))

 user  system elapsed
6.886   0.047   6.943



 I know dput/dget is probably not the right tool for this job:
 nevertheless the slowdown in quite dramatic so I thought it was worth
 calling attention to.


 This is completely the wrong way to do this. See ?dump.

 dget() basically calls eval(parse()).  parse() is much slower in R >= 3.0
 mainly because it keeps more information.  Using keep.source=FALSE here
 speeds things up a lot.


 system.time(y <- dget(which.r))
user  system elapsed
   3.233   0.012   3.248
 options(keep.source=FALSE)

 system.time(y <- dget(which.r))
user  system elapsed
   0.090   0.001   0.092
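A self-contained version of that comparison, for anyone who wants to reproduce it (timings will vary by machine; in the R versions discussed here dget() honoured the keep.source option):

```r
# Compare dget() with and without source references kept.
x <- data.frame(matrix(sample(letters, 100, replace = TRUE), ncol = 10))
f <- tempfile()
dput(x, f)

op <- options(keep.source = TRUE)
t_keep <- system.time(y1 <- dget(f))["elapsed"]
options(keep.source = FALSE)
t_drop <- system.time(y2 <- dget(f))["elapsed"]
options(op)                      # restore the previous setting

identical(y1, y2)                # the parsed data is the same either way
# [1] TRUE
```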


 --
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] dget() much slower in recent R versions

2014-06-20 Thread Ista Zahn
Hello,

I've noticed that dget() is much slower in the current and devel R
versions than in previous versions. In 2.15 reading a 1-row
data.frame takes less than half a second:

 (which.r <- R.Version()$version.string)
[1] R version 2.15.2 (2012-10-26)
 x <- data.frame(matrix(sample(letters, 10, replace = TRUE), ncol = 10))
 dput(x, which.r)
 system.time(y <- dget(which.r))
   user  system elapsed
  0.546   0.033   0.586

While in 3.1.0 and r-devel it takes around 7 seconds.

 (which.r <- R.Version()$version.string)
[1] R version 3.1.0 (2014-04-10)
 x <- data.frame(matrix(sample(letters, 10, replace = TRUE), ncol = 10))
 dput(x, which.r)
 system.time(y <- dget(which.r))
   user  system elapsed
  6.920   0.060   7.074

 (which.r <- R.Version()$version.string)
[1] R Under development (unstable) (2014-06-19 r65979)
 x <- data.frame(matrix(sample(letters, 10, replace = TRUE), ncol = 10))
 dput(x, which.r)
 system.time(y <- dget(which.r))
   user  system elapsed
  6.886   0.047   6.943


I know dput/dget is probably not the right tool for this job:
nevertheless the slowdown is quite dramatic, so I thought it was worth
calling attention to.

Best,
Ista

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] seq range argument

2014-02-03 Thread Ista Zahn
This is slightly more verbose, but perhaps

do.call(seq, as.list(c(extendrange(D_orig, f=0.1), len=100)))

Best,
Ista

On Mon, Feb 3, 2014 at 9:00 AM, Lorenz, David lor...@usgs.gov wrote:
 Berry,
   It sounds like you just need a little helper function like this:

 ser <- function(x, len=100, f=0.1) {
    dr <- extendrange(x, f=f)
    return(seq(dr[1L], dr[2L], length.out=len))
  }

 I called it ser, short for sequence extended range. Use it thusly:

 Ijustneed <- ser(D_orig)

   Hope this helps. I use and create little help functions like this all the
 time. Even extendrange could be considered a helper function as it is only
 a couple of lines long.
 Dave



 On Mon, Feb 3, 2014 at 6:43 AM, Berry Boessenkool 
 berryboessenk...@hotmail.com wrote:

 Hello dear developers,

 I find myself often having the result of range or extendrange, which
 I want to create a sequence from.
 But seq needs two separate arguments, from and to.
 Could an argument range be added?

 Otherwise I will have to create an object with the range (may come from a
 longer calculation), index twice from it and remove it again - kind of an
 unnecessary bunch of code, I think.

 Here's an example:

 # What I currently do:
 D_orig <- rnorm(20, sd=30)
 D_range <- extendrange(D_orig, f=0.1)
 Ijustneed <- seq(D_range[1], D_range[2], len=100)
 rm(D_range)

 # what I'd like to have instead:
 D_orig <- rnorm(20, sd=30)
 Ijustneed <- seq(range=extendrange(D_orig, f=0.1), len=100)

 regards,
 Berry

 -
 Berry Boessenkool
 Potsdam, Germany
 -
 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel



__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Failed to install kernlab package

2013-11-03 Thread Ista Zahn
Hi,

I can't reproduce the error with a fully updated CentOS 6 with R-core
and R-devel installed from
http://www.nic.funet.fi/pub/mirrors/fedora.redhat.com/pub/epel/6/x86_64/.
Here is the sessionInfo from my CentOS 6 system where installation of
kernlab was successful:

sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.utf8   LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=CLC_NAME=C
 [9] LC_ADDRESS=C  LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

loaded via a namespace (and not attached):
[1] tcltk_3.0.1 tools_3.0.1

Best,
Ista

On Sun, Nov 3, 2013 at 4:15 PM, Lizkitty lizhen.p...@stonybrook.edu wrote:
 Hi everyone,

 I am trying to install kernlab package, but failed many times by now on
 CentOS 6 operating system. FYI, I have no problem with this package
 installation on windows platform.

 Here is the error message:

 trying URL 'http://cran.wustl.edu/src/contrib/kernlab_0.9-18.tar.gz'
 Content type 'application/x-gzip' length 1069148 bytes (1.0 Mb)
 opened URL
 ==
 downloaded 1.0 Mb

 * installing *source* package 'kernlab' ...
 ** package 'kernlab' successfully unpacked and MD5 sums checked
 ** libs
 g++ -I/n/sw/R-3.0.1_gcc-4.7.2/lib64/R/include -DNDEBUG  -I/usr/local/include
 -fpic  -g -O2  -c brweight.cpp -o brweight.o
 In file included from
 /n/sw/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/bits/localefwd.h:42:0,
  from
 /n/sw/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/ios:42,
  from
 /n/sw/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/ostream:40,
  from
 /n/sw/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/iostream:40,
  from errorcode.h:43,
  from brweight.h:43,
  from brweight.cpp:42:
 /n/sw/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/x86_64-unknown-linux-gnu/bits/c++locale.h:53:23:
 error: 'uselocale' was not declared in this scope
 /n/sw/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/x86_64-unknown-linux-gnu/bits/c++locale.h:53:45:
 error: invalid type in declaration before ';' token
 /n/sw/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/x86_64-unknown-linux-gnu/bits/c++locale.h:
 In function 'int std::__convert_from_v(__locale_struct* const, char*, int,
 const char*, ...)':
 /n/sw/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/x86_64-unknown-linux-gnu/bits/c++locale.h:76:53:
 error: '__gnu_cxx::__uselocale' cannot be used as a function
 /n/sw/gcc-4.7.2/lib/gcc/x86_64-unknown-linux-gnu/4.7.2/../../../../include/c++/4.7.2/x86_64-unknown-linux-gnu/bits/c++locale.h:101:33:
 error: '__gnu_cxx::__uselocale' cannot be used as a function
 make: *** [brweight.o] Error 1
 ERROR: compilation failed for package 'kernlab'
 * removing '/n/home09/wang/R/x86_64-unknown-linux-gnu-library/3.0/kernlab'

 The downloaded source packages are in
 '/scratch/tmp/Rtmpt52VjR/downloaded_packages'
 Warning message:
 In install.packages("kernlab") :
   installation of package 'kernlab' had non-zero exit status


 I found out online that many others also failed installing kernlab on linux,
 but no explicit unique solutions for this problem.
 What I have tried so far are:
 1) change cran mirrors
 2) using command line directly: R CMD INSTALL kernlab_0.9-18.tar.gz, after
 downloading the package
 3) reset working directory
 Unfortunately, none of these solved my problem...

 Any suggestions are highly appreciated!



 --

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Two R editiosn in Unix cluster systems

2013-10-15 Thread Ista Zahn
OpenMx does install on R 3.01. I haven't tested extensively, but after
installing with

install.packages('OpenMx',
 dependencies = TRUE,
 repos = c(getOption("repos"),
'http://openmx.psyc.virginia.edu/sequential/'))

the demos appear to run correctly.

Best,
Ista

On Tue, Oct 15, 2013 at 4:15 PM, Paul Johnson pauljoh...@gmail.com wrote:
 Dear R Devel

 Some of our R users are still insisting we run R-2.15.3 because of
 difficulties with a package called OpenMX.  It can't cooperate with new R,
 oh well.

 Other users need to run R-3.0.1. I'm looking for the most direct route to
 install both, and allow users to choose at runtime.

 In the cluster, things run faster if I install RPMs to each node, rather
 than putting R itself on the NFS share (but I could do that if you think
 it's really better)

 In the past, I've used the SRPM packaging from EPEL repository to make a
 few little path changes and build R RPM for our cluster nodes. Now I face
 the problem of building 2 RPMS, one for R-2.15.3 and one for R-newest, and
 somehow keeping them separate.

 If you were me, how would you approach this?

 Here's my guess

 First, The RPM packages need unique names, of course.

 Second, leave the RPM packaging for R-newest exactly the same as it always
 was.  R is in the path, the R script and references among all the bits will
 be fine, no need to fight. It will find what it needs in /usr/lib64/R or
 whatnot.

 For the legacy R, I'm considering 2 ideas.  I could install R with the same
 prefix, /usr, but very careful so the R bits are installed into separate
 places. I just made a fresh build of R and on RedHat 6, it appears to me R
 installs these directories:
 bin
 libdir
 share.

 So what if the configure line has the magic bindir=/usr/bin-R-2.15.3
 libdir = /usr/lib64/R-2.15.3, and whatnot. If I were doing Debian
 packaging, I suppose I'd be obligated (by the file system standard) to do
 that kind of thing. But it looks like a headache.

 The easy road is to set the prefix at some out of the way place, like
 /opt/R-2.15.3, and then use a post-install script to link
 /opt/R-2/15.3/bin/R to /usr/bin/R-2.15.3.  When I tried that, it surprised
 me because R did not complain about lack access to devel headers. It
 configures and builds fine.

 R is now configured for x86_64-unknown-linux-gnu

   Source directory:  .
   Installation directory:/tmp/R

   C compiler:gcc -std=gnu99  -g -O2
   Fortran 77 compiler:   gfortran  -g -O2

   C++ compiler:  g++  -g -O2
   Fortran 90/95 compiler:gfortran -g -O2
   Obj-C compiler:gcc -g -O2 -fobjc-exceptions

   Interfaces supported:  X11, tcltk
   External libraries:readline, ICU, lzma
   Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
   Options enabled:   shared BLAS, R profiling, Java

   Recommended packages:  yes

 Should I worry about any runtime complications of this older R finding
 bits of the newer R in the PATH ahead of it? I worry I'm making lazy
 assumptions.

 After that, I need to do some dancing with the RPM packaging.

 I suppose there'd be some comfort if I could get the users to define R_HOME
 in their user environment before launching jobs, I think that would
 eliminate the danger of confusion between versions, wouldn't it?

 pj
 --
 Paul E. Johnson
 Professor, Political Science & Assoc. Director
 1541 Lilac Lane, Room 504  Center for Research Methods
 University of Kansas University of Kansas
 http://pj.freefaculty.org   http://quant.ku.edu


 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Problem with R 3.0.0

2013-08-21 Thread Ista Zahn
In case this is helpful, I don't see this issue on my Mac Pro with OSX
version 10.7.5. Details below.


 M <- matrix(1,23171,23171) ; diag(M) <- 0 ; range(colSums(M))
[1] 23170 23170
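
The invariant being checked above can be reproduced at a small scale (illustrative sketch, not from the original thread): every column of an n x n matrix of ones with a zeroed diagonal must sum to n - 1, so range(colSums(M)) should return that single value twice.

```r
## Sketch: the same sanity check at a small size.
n <- 5
M <- matrix(1, n, n)
diag(M) <- 0
stopifnot(all(colSums(M) == n - 1))  # every column sums to n - 1
range(colSums(M))                    # c(4, 4) for n = 5
```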

 sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-apple-darwin10.8.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
 .Platform
$OS.type
[1] unix

$file.sep
[1] /

$dynlib.ext
[1] .so

$GUI
[1] X11

$endian
[1] little

$pkgType
[1] mac.binary

$path.sep
[1] :

$r_arch
[1] ""

 R.Version()
$platform
[1] x86_64-apple-darwin10.8.0

$arch
[1] x86_64

$os
[1] darwin10.8.0

$system
[1] x86_64, darwin10.8.0

$status
[1] ""

$major
[1] 3

$minor
[1] 0.1

$year
[1] 2013

$month
[1] 05

$day
[1] 16

$`svn rev`
[1] 62743

$language
[1] R

$version.string
[1] R version 3.0.1 (2013-05-16)

$nickname
[1] Good Sport

On Wed, Aug 21, 2013 at 12:51 PM, peter dalgaard pda...@gmail.com wrote:

 On Aug 21, 2013, at 16:39 , peter dalgaard wrote:


 Likely. I'm not seeing it on the iMac/SnowLeopard, only on the MacPro/Lion. 
 I'm upgrading the MacPorts R on the MacPro now to see whether it has issues 
 too, but of course that reinstalls everything but the kitchen sink...

 Whoops. I don't know what I was thinking there. I seem to have suppressed all 
 memory of the hard disk replacement on the iMac, and its aftereffects. Both 
 machines are in fact running Lion! That makes things even odder...


 --
 Peter Dalgaard, Professor
 Center for Statistics, Copenhagen Business School
 Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] cbind error with check.names

2013-07-23 Thread Ista Zahn
On Tue, Jul 23, 2013 at 9:18 AM, Fg Nu fgn...@yahoo.com wrote:



 Here is an example where cbind fails with an error when check.names=TRUE is 
 set.

 data(airquality)
 airQualityBind =cbind(airquality,airquality,check.names =TRUE)


  I understand that cbind is a call to data.frame and the following works:
 airQualityBind =data.frame(airquality,airquality,check.names =TRUE)
 but I would like to understand why cbind throws an error.

 I asked this question on SO here:
 http://stackoverflow.com/questions/17810470/cbind-error-with-check-names
 and user Hong Ooi confirmed my suspicion that cbind was passing check.names = 
 FALSE regardless of my setting that option, even though the help file 
 indicates that this should be possible,

 For the data.frame method of cbind these can be further arguments to 
 data.frame such as stringsAsFactors.

 Is there some design principle that I am missing here?


Well, the function does work as documented. See the help file section
on "Data frame methods", which says "The 'cbind' data frame method is
just a wrapper for 'data.frame(..., check.names = FALSE)'."

Best,
Ista




 Thanks.

 __
 R-devel@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] cbind error with check.names

2013-07-23 Thread Ista Zahn
On Tue, Jul 23, 2013 at 12:54 PM, Fg Nu fgn...@yahoo.com wrote:



 - Original Message -
 From: Ista Zahn istaz...@gmail.com
 To: Fg Nu fgn...@yahoo.com
 Cc: r-devel@r-project.org r-devel@r-project.org
 Sent: Tuesday, July 23, 2013 9:50 PM
 Subject: Re: [Rd] cbind error with check.names

 On Tue, Jul 23, 2013 at 9:18 AM, Fg Nu fgn...@yahoo.com wrote:



 Here is an example where cbind fails with an error when check.names=TRUE is 
 set.

 data(airquality)
 airQualityBind =cbind(airquality,airquality,check.names =TRUE)


  I understand that cbind is a call to data.frame and the following works:
 airQualityBind =data.frame(airquality,airquality,check.names =TRUE)
 but I would like to understand why cbind throws an error.

 I asked this question on SO here:
 http://stackoverflow.com/questions/17810470/cbind-error-with-check-names
 and user Hong Ooi confirmed my suspicion that cbind was passing check.names 
 = FALSE regardless of my setting that option, even though the help file 
 indicates that this should be possible,

 For the data.frame method of cbind these can be further arguments to 
 data.frame such as stringsAsFactors.

 Is there some design principle that I am missing here?


 Well, the function does work as documented. See the help file section
 on "Data frame methods", which says "The 'cbind' data frame method is
 just a wrapper for 'data.frame(..., check.names = FALSE)'."

 Best,
 Ista



 Is there then a reason that overriding the check.names default is forbidden 
 from cbind? I can't tell why this would be the case.

For the same reason you can't have

data.frame(x=1:10, x=11:20, check.names=TRUE, check.names=FALSE)

or

mean(x=1:10, x=11:20)

i.e, you can't generally pass the same argument more than once. There
are exceptions to this, e.g.,

sum(c(NA, 1:10), na.rm=TRUE, na.rm=FALSE)

but in general each argument can only be matched once. Since
cbind.data.frame calls data.frame with check.names=FALSE, you can't
supply it again.
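
The argument-matching point above can be demonstrated directly (illustrative sketch; airquality is the built-in datasets example used earlier in the thread):

```r
## Sketch: cbind.data.frame hard-codes check.names = FALSE, so a
## user-supplied check.names becomes a second match for the same formal
## argument of data.frame() and fails.
airQualityBind <- cbind(airquality, airquality)         # works; duplicate names kept
try(cbind(airquality, airquality, check.names = TRUE))  # errors: matched twice

## If checked (mangled-to-unique) names are wanted, call data.frame()
## directly instead of going through cbind():
airQualityBind2 <- data.frame(airquality, airquality, check.names = TRUE)
names(airQualityBind2)   # second copy gets suffixed names (Ozone.1, ...)
```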

Best,
Ista


 Thanks


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] stringsAsFactors

2013-02-13 Thread Ista Zahn
On Wed, Feb 13, 2013 at 7:33 AM, Michael Dewey i...@aghmed.fsnet.co.uk wrote:
 At 18:01 11/02/2013, Ista Zahn wrote:

 FWIW my view is that for data cleaning and organizing, factors just get
 in the way. For modeling I like them because they make it easier to
 understand what is happening. For example I can look at the levels()
 to see what the reference group will be. With characters one has to
 know a) that levels are created in alphabetical order and b) the
 alphabetical order of the unique values in the character vector.
 Ugh. So my habit is to turn off stringsAsFactors, then explicitly
 convert to factors before modeling (I also use factors to change the
 order in which things are displayed in tables and graphs, another
 place where converting to factors myself is useful but creating
 them in alphabetical order by default is not).

 All this is to say that I would like options(stringsAsFactors=FALSE) to
 be the default, but I like the warning about converting to factors in
 modeling functions because it reminds me that I forgot to convert them,
 which I like to do anyway...


 I seem to be one of the few people who find the current default helpful.
 When I read in a dataset I am nearly always going to follow it with one or
 more of the modelling functions and so I do want to treat the categorical
 variables as factors. I cannot off-hand think of an example where I have had
 to convert them to characters.

Your data must reach you in a much better state than mine reaches me.
I spend most of my time organizing, combining, fixing typos,
reshaping, merging and so on. Then I see the dreaded warning

In `[<-.factor`(`*tmp*`, 6, value = z) :
  invalid factor level, NAs generated

which reminds me that I've forgotten to set stringsAsFactors=FALSE.
However, I'm not saying I don't like factors. Once the data is cleaned
up they are very useful. But often I find that when I'm trying to
clean up a messy data set they just get in the way. And since that is
what I spend most of my time doing, factors get in the way most of the
time for me.



 Incidentally xkcd has, while this discussion has been going on, posted
 something relevant
 http://www.xkcd.com/1172/




 Best,
 Ista

 On Mon, Feb 11, 2013 at 12:50 PM, Duncan Murdoch
 murdoch.dun...@gmail.com wrote:
  On 11/02/2013 12:13 PM, William Dunlap wrote:
 
  Note that changing this does not just mean getting rid of silly
  warnings.
  Currently, predict.lm() can give wrong answers when stringsAsFactors is
  FALSE.
 
  d <- data.frame(x=1:10, f=rep(c("A","B","C"), c(4,3,3)), y=c(1:4,
  15:17, 28.1,28.8,30.1))
  fit_ab <- lm(y ~ x + f, data = d, subset = f != "B")
 Warning message:
 In model.matrix.default(mt, mf, contrasts) :
   variable 'f' converted to a factor
  predict(fit_ab, newdata=d)
  1 2 3 4 5 6 7 8 9 10
  1  2  3  4 25 26 27  8  9 10
 Warning messages:
 1: In model.matrix.default(Terms, m, contrasts.arg =
  object$contrasts)
  :
   variable 'f' converted to a factor
 2: In predict.lm(fit_ab, newdata = d) :
   prediction from a rank-deficient fit may be misleading
 
  fit_ab is not rank-deficient and the predict should report
  1 2 3 4 NA NA NA 28 29 30
 
 
  In R-devel, the two warnings about factor conversions are no longer
  given,
  but the predictions are the same and the warning about rank deficiency
  still
  shows up.  If f is set to be a factor, an error is generated:
 
  Error in model.frame.default(Terms, newdata, na.action = na.action, xlev
  =
  object$xlevels) :
factor f has new levels B
 
  I think both the warning and error are somewhat reasonable responses.
  The
  fit is rank deficient relative to the model that includes f == "B",
  because
  the column of the design matrix corresponding to f level B would be
  completely zero.  In this particular model, we could still do
  predictions
  for the other levels, but it also seems reasonable to quit, given that
  clearly something has gone wrong.
 
  I do think that it's unfortunate that we don't get the same result in
  both
  cases, and I'd like to have gotten the predictions you suggested, but I
  don't think that's going to happen.  The reason for the difference is
  that
  the subsetting is done before the conversion to a factor, but I think
  that
  is unavoidable without really big changes.
 
  Duncan Murdoch
 
 
 
 
  Bill Dunlap
  Spotfire, TIBCO Software
  wdunlap tibco.com
 
   -Original Message-
   From: r-devel-boun...@r-project.org
   [mailto:r-devel-boun...@r-project.org] On Behalf
   Of Terry Therneau
   Sent: Monday, February 11, 2013 5:50 AM
   To: r-devel@r-project.org; Duncan Murdoch
   Subject: Re: [Rd] stringsAsFactors
  
   I think your idea to remove the warnings is excellent, and a good
   compromise.
   Characters
   already work fine in modeling functions except for the silly warning.
  
   It is interesting how often the defaults for a program reflect the
   data
   sets in use at the
   time the defaults

Re: [Rd] stringsAsFactors

2013-02-12 Thread Ista Zahn
FWIW my view is that for data cleaning and organizing, factors just get
in the way. For modeling I like them because they make it easier to
understand what is happening. For example I can look at the levels()
to see what the reference group will be. With characters one has to
know a) that levels are created in alphabetical order and b) the
alphabetical order of the unique values in the character vector.
Ugh. So my habit is to turn off stringsAsFactors, then explicitly
convert to factors before modeling (I also use factors to change the
order in which things are displayed in tables and graphs, another
place where converting to factors myself is useful but creating
them in alphabetical order by default is not).

All this is to say that I would like options(stringsAsFactors=FALSE) to
be the default, but I like the warning about converting to factors in
modeling functions because it reminds me that I forgot to convert them,
which I like to do anyway...
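
The workflow described above can be sketched as follows (illustrative only; the column names and levels are invented):

```r
## Sketch: keep strings as character while cleaning, convert explicitly
## before modeling so the reference level and display order are chosen
## deliberately rather than alphabetically.
dat <- data.frame(id    = c("a01", "a02", "a03"),
                  group = c("treatment", "control", "control"),
                  stringsAsFactors = FALSE)

## ... cleaning: fix typos, recode, merge, reshape, etc. ...

## Explicit conversion: the first level becomes the reference group.
dat$group <- factor(dat$group, levels = c("control", "treatment"))
levels(dat$group)   # "control" "treatment"
```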

Best,
Ista

On Mon, Feb 11, 2013 at 12:50 PM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 On 11/02/2013 12:13 PM, William Dunlap wrote:

 Note that changing this does not just mean getting rid of silly
 warnings.
 Currently, predict.lm() can give wrong answers when stringsAsFactors is
 FALSE.

 d <- data.frame(x=1:10, f=rep(c("A","B","C"), c(4,3,3)), y=c(1:4,
 15:17, 28.1,28.8,30.1))
 fit_ab <- lm(y ~ x + f, data = d, subset = f != "B")
Warning message:
In model.matrix.default(mt, mf, contrasts) :
  variable 'f' converted to a factor
 predict(fit_ab, newdata=d)
 1 2 3 4 5 6 7 8 9 10
 1  2  3  4 25 26 27  8  9 10
Warning messages:
1: In model.matrix.default(Terms, m, contrasts.arg = object$contrasts)
 :
  variable 'f' converted to a factor
2: In predict.lm(fit_ab, newdata = d) :
  prediction from a rank-deficient fit may be misleading

 fit_ab is not rank-deficient and the predict should report
 1 2 3 4 NA NA NA 28 29 30


 In R-devel, the two warnings about factor conversions are no longer given,
 but the predictions are the same and the warning about rank deficiency still
 shows up.  If f is set to be a factor, an error is generated:

 Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
 object$xlevels) :
   factor f has new levels B

 I think both the warning and error are somewhat reasonable responses.  The
 fit is rank deficient relative to the model that includes f == "B",  because
 the column of the design matrix corresponding to f level B would be
 completely zero.  In this particular model, we could still do predictions
 for the other levels, but it also seems reasonable to quit, given that
 clearly something has gone wrong.

 I do think that it's unfortunate that we don't get the same result in both
 cases, and I'd like to have gotten the predictions you suggested, but I
 don't think that's going to happen.  The reason for the difference is that
 the subsetting is done before the conversion to a factor, but I think that
 is unavoidable without really big changes.
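
A sketch of the factor case discussed above, using Bill's example data (illustrative; the droplevels() workaround at the end is an editorial suggestion, not something proposed in the thread):

```r
## Sketch: with f declared as a factor, lm() drops the unused level "B"
## from the subset fit, so predict() on data containing "B" errors
## rather than silently giving rank-deficient predictions.
d <- data.frame(x = 1:10,
                f = factor(rep(c("A", "B", "C"), c(4, 3, 3))),
                y = c(1:4, 15:17, 28.1, 28.8, 30.1))
fit_ab <- lm(y ~ x + f, data = d, subset = f != "B")
try(predict(fit_ab, newdata = d))   # error: factor f has new levels B

## One way to predict only for the levels actually seen by the fit:
d_ab <- droplevels(subset(d, f != "B"))
fit2 <- lm(y ~ x + f, data = d_ab)
predict(fit2, newdata = d_ab)       # predictions for levels A and C only
```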

 Duncan Murdoch




 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com

  -Original Message-
  From: r-devel-boun...@r-project.org
  [mailto:r-devel-boun...@r-project.org] On Behalf
  Of Terry Therneau
  Sent: Monday, February 11, 2013 5:50 AM
  To: r-devel@r-project.org; Duncan Murdoch
  Subject: Re: [Rd] stringsAsFactors
 
  I think your idea to remove the warnings is excellent, and a good
  compromise.
  Characters
  already work fine in modeling functions except for the silly warning.
 
  It is interesting how often the defaults for a program reflect the data
  sets in use at the
  time the defaults were chosen.  There are some such in my own survival
  package whose
  proper value is no longer as obvious as it was when I chose them.
  Factors are very
  handy for variables which have only a few levels and will be used in
  modeling.  Every
  character variable of every dataset in Statistical Models in S, which
  introduced
  factors, is of this type so auto-transformation made a lot of sense.
  The solder data
  set there is one for which Helmert contrasts are proper so guess what
  the default
  contrast
  option was?  (I think there are only a few data sets in the world for
  which Helmert makes
  sense, however, and R eventually changed the default.)
 
  For character variables that should not be factors such as a street
  adress
  stringsAsFactors can be a real PITA, and I expect that people's
  preference for the option
  depends almost entirely on how often these arise in their own work.  As
  long as there is
  an option that can be overridden I'm okay.  Yes, I'd prefer FALSE as the
  default, partly
  because the current value is a tripwire in the hallway that eventually
  catches every new
  user.
 
  Terry Therneau
 
  On 02/11/2013 05:00 AM, r-devel-requ...@r-project.org wrote:
   Both of these were discussed by R Core.  I think it's unlikely the
   default for stringsAsFactors will be