Re: [Rd] Making iconv portable?

2014-12-15 Thread Milan Bouchet-Valat
On Monday, December 15, 2014 at 13:49 -0500, Simon Urbanek wrote:
> On Dec 15, 2014, at 1:37 PM, Spencer Graves  
> wrote:
> > 
> > 
> >> On Dec 15, 2014, at 10:13 AM, Simon Urbanek  
> >> wrote:
> >> 
> >>> 
> >>> On Dec 15, 2014, at 12:21 PM, Kurt Hornik  wrote:
> >>> 
> >>>> Spencer Graves writes:
> >>> 
> >>>> Hello, All:  
> >>>> What would it take to make “iconv” portable?  
> >>> 
> >>> 
> >>>> I ask, because I want to convert accented characters to
> >>>> vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
> >>>> Milan Bouchet-Valat suggested on R-help that I use
> >>>> 'iconv(x, "", "ASCII//TRANSLIT")'.  This worked under Windows but failed
> >>>> on Linux and Mac.  It’s part of the “subNonStandardCharacters”
> >>>> function in the Ecfun package. The development version on
> >>>> R-Forge uses this and returns “Raul” under Windows and NA
> >>>> under Mac OS X (and presumably also Linux).
> >>> 
> >>> Hmm.
> >>> 
> >>> R> iconv("Raúl", "", "ASCII//TRANSLIT")
> >>> [1] "Raul"
> >>> 
> >>> seems to work for me on Linux ...
> >>> 
> >> 
> >> also on OS X:
> >> 
> >>> iconv("Raúl", "", "ASCII//TRANSLIT")
> >> [1] "Ra'ul"
> > 
> > 
> >   Thanks for the replies.  I should have checked my examples more 
> > carefully.  Consider the following example and a slight modification from 
> > help("iconv"):  
> > 
> > 
> > > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > > Encoding(x) <- "latin1"
> > > x
> > [1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> > > iconv(x, "latin1", "ASCII//TRANSLIT")  # platform-dependent
> > [1] "Ekstrom"            "J\"oreskog"         "bisschen Z\"urcher"
> > > 
> > > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > > x
> > [1] "Ekstr\xf8m"            "J\xf6reskog"           "bi\xdfchen Z\xfcrcher"
> > > iconv(x, "", "ASCII//TRANSLIT")  # platform-dependent
> > [1] NA NA NA 
> > 
> > 
> >   This suggests a two-step fix to my problem:  (1) Check Encoding(x) 
> > and set to “latin1” if it’s “unknown”.
> 
> Well, that depends heavily on your source. In the above it is hand-crafted 
> latin1 so if you don't declare it, the native encoding will be assumed - 
> which can be anything and has nothing to do with your actual input in this 
> particular case where you hand-constructed latin1.
> 
> 
> >  (2) Delete any new \" added by iconv.  
> > 
> 
> The whole point of translit is to create combinations of ASCII
> characters that represent the Unicode characters, so " is just one
> of many characters that can be used.
But it's quite unexpected that ö is transliterated to "o and ú to 'u.
Looks like iconv on OS X has a different idea of what ASCII
transliteration means than on Linux and Windows...

Anyway it's easy to remove " and ' if needed.
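
A minimal sketch of that cleanup, assuming Latin-1 input.
strip_translit_marks() is a hypothetical helper name, not something from
this thread or from base R. Note that it also deletes genuine apostrophes
and quotes, so it only suits data such as names where those are not
expected, and the transliteration itself remains platform-dependent:

```r
# Hypothetical helper: transliterate to ASCII, then drop the characters
# that some iconv implementations emit as stand-ins for diacritics.
strip_translit_marks <- function(x, from = "latin1") {
  ascii <- iconv(x, from, "ASCII//TRANSLIT")  # platform-dependent result
  # Caveat: this also removes real quotes/apostrophes present in the input.
  gsub("[\"'`^~]", "", ascii)
}

x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher", "Ra\xfal")
strip_translit_marks(x)
```

Elements that iconv cannot convert stay NA, so the caller still has to
decide what to do with those.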


Regards

> Cheers,
> S
> 
> 
> > 
> >   Thanks again, 
> >   Spencer 
> > 
> >> 
> >> 
> >> 
> >>> -k
> >>> 
> >>> 
> >>>> The “iconv” R code merely calls compiled code, which I’ve used very 
> >>>> little in 30 years.   
> >>> 
> >>> 
> >>>> Thanks, 
> >>>> Spencer 
> >>> 
> >>> 
> >>> 
> >>>>> On Nov 30, 2014, at 2:32 AM, Spencer Graves wrote:
> >>>>> 
> >>>>> Wonderful.  Thanks very much.  Spencer
> >>>>> 
> >>>>> 
> >>>>> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Building R on Windows: mkdir of Rtools creates directories with read-only permissions [WEIRD]

2014-12-15 Thread C
With reference to the issue first reported by Henrik Bengtsson (see
https://stat.ethz.ch/pipermail/r-devel/2014-January/068184.html), I would
like to report that I am experiencing the very same problem when building R
3.1.2 on the Windows platform. Fortunately, the same workaround devised by
Henrik works in my case too.
I am using Windows 7 x64 with NTFS.



Re: [Rd] Making iconv portable?

2014-12-15 Thread Simon Urbanek
On Dec 15, 2014, at 1:37 PM, Spencer Graves  wrote:
> 
> 
>> On Dec 15, 2014, at 10:13 AM, Simon Urbanek  
>> wrote:
>> 
>>> 
>>> On Dec 15, 2014, at 12:21 PM, Kurt Hornik  wrote:
>>> 
>>>> Spencer Graves writes:
>>> 
>>>> Hello, All:  
>>>>   What would it take to make “iconv” portable?  
>>> 
>>> 
>>>>   I ask, because I want to convert accented characters to
>>>>   vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
>>>>   Milan Bouchet-Valat suggested on R-help that I use
>>>>   'iconv(x, "", "ASCII//TRANSLIT")'.  This worked under Windows but failed
>>>>   on Linux and Mac.  It’s part of the “subNonStandardCharacters”
>>>>   function in the Ecfun package. The development version on
>>>>   R-Forge uses this and returns “Raul” under Windows and NA
>>>>   under Mac OS X (and presumably also Linux).
>>> 
>>> Hmm.
>>> 
>>> R> iconv("Raúl", "", "ASCII//TRANSLIT")
>>> [1] "Raul"
>>> 
>>> seems to work for me on Linux ...
>>> 
>> 
>> also on OS X:
>> 
>>> iconv("Raúl", "", "ASCII//TRANSLIT")
>> [1] "Ra'ul"
> 
> 
> Thanks for the replies.  I should have checked my examples more 
> carefully.  Consider the following example and a slight modification from 
> help("iconv"):  
> 
> 
> > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > Encoding(x) <- "latin1"
> > x
> [1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> > iconv(x, "latin1", "ASCII//TRANSLIT")  # platform-dependent
> [1] "Ekstrom"            "J\"oreskog"         "bisschen Z\"urcher"
> > 
> > x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> > x
> [1] "Ekstr\xf8m"            "J\xf6reskog"           "bi\xdfchen Z\xfcrcher"
> > iconv(x, "", "ASCII//TRANSLIT")  # platform-dependent
> [1] NA NA NA 
> 
> 
> This suggests a two-step fix to my problem:  (1) Check Encoding(x) 
> and set to “latin1” if it’s “unknown”.

Well, that depends heavily on your source. In the above it is hand-crafted 
latin1 so if you don't declare it, the native encoding will be assumed - which 
can be anything and has nothing to do with your actual input in this particular 
case where you hand-constructed latin1.
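
To illustrate the point, a small sketch (the variable names are only for
illustration): strings written with \x escapes carry no declared
encoding, so the conversion is only well defined once the source
encoding is named, either in the iconv() call or via Encoding():

```r
# Bytes written with \x escapes are not marked with any encoding.
x <- c("Ekstr\xf8m", "J\xf6reskog")
Encoding(x)   # "unknown" for both elements until declared

# Option 1: name the source encoding explicitly in the call.
y1 <- iconv(x, from = "latin1", to = "UTF-8")

# Option 2: declare it on the vector; enc2utf8() then converts from the
# declared encoding instead of assuming the native one.
Encoding(x) <- "latin1"
y2 <- enc2utf8(x)

identical(y1, y2)
```

Which encoding to declare still depends on where the data came from;
latin1 is right here only because the bytes were constructed that way.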


>  (2) Delete any new \" added by iconv.  
> 

The whole point of translit is to create combinations of ASCII characters that 
represent the Unicode characters, so " is just one of many characters that can be 
used.

Cheers,
S


> 
> Thanks again, 
> Spencer 
> 
>> 
>> 
>> 
>>> -k
>>> 
>>> 
>>>> The “iconv” R code merely calls compiled code, which I’ve used very 
>>>> little in 30 years.   
>>> 
>>> 
>>>>   Thanks, 
>>>>   Spencer 
>>> 
>>> 
>>> 
>>>>> On Nov 30, 2014, at 2:32 AM, Spencer Graves wrote:
>>>>> 
>>>>> Wonderful.  Thanks very much.  Spencer
>>>>> 
>>>>> 
>>>>> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:


Re: [Rd] Making iconv portable?

2014-12-15 Thread Spencer Graves

> On Dec 15, 2014, at 10:13 AM, Simon Urbanek  
> wrote:
> 
>> 
>> On Dec 15, 2014, at 12:21 PM, Kurt Hornik  wrote:
>> 
>>> Spencer Graves writes:
>> 
>>> Hello, All:  
>>>   What would it take to make “iconv” portable?  
>> 
>> 
>>>   I ask, because I want to convert accented characters to
>>>   vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
>>>   Milan Bouchet-Valat suggested on R-help that I use
>>>   'iconv(x, "", "ASCII//TRANSLIT")'.  This worked under Windows but failed
>>>   on Linux and Mac.  It’s part of the “subNonStandardCharacters”
>>>   function in the Ecfun package. The development version on
>>>   R-Forge uses this and returns “Raul” under Windows and NA
>>>   under Mac OS X (and presumably also Linux).
>> 
>> Hmm.
>> 
>> R> iconv("Raúl", "", "ASCII//TRANSLIT")
>> [1] "Raul"
>> 
>> seems to work for me on Linux ...
>> 
> 
> also on OS X:
> 
>> iconv("Raúl", "", "ASCII//TRANSLIT")
> [1] "Ra'ul"


  Thanks for the replies.  I should have checked my examples more 
carefully.  Consider the following example and a slight modification from 
help("iconv"):  


> x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> Encoding(x) <- "latin1"
> x
[1] "Ekstrøm"         "Jöreskog"        "bißchen Zürcher"
> iconv(x, "latin1", "ASCII//TRANSLIT")  # platform-dependent
[1] "Ekstrom"            "J\"oreskog"         "bisschen Z\"urcher"
> 
> x <- c("Ekstr\xf8m", "J\xf6reskog", "bi\xdfchen Z\xfcrcher")
> x
[1] "Ekstr\xf8m"            "J\xf6reskog"           "bi\xdfchen Z\xfcrcher"
> iconv(x, "", "ASCII//TRANSLIT")  # platform-dependent
[1] NA NA NA 


  This suggests a two-step fix to my problem:  (1) Check Encoding(x) 
and set it to “latin1” if it’s “unknown”.  (2) Delete any new \" added by iconv.  


  Thanks again, 
  Spencer 

> 
> 
> 
>> -k
>> 
>> 
>>>  The “iconv” R code merely calls compiled code, which I’ve used very 
>>> little in 30 years.   
>> 
>> 
>>>   Thanks, 
>>>   Spencer 
>> 
>> 
>> 
>>>> On Nov 30, 2014, at 2:32 AM, Spencer Graves wrote:
>>>> 
>>>> Wonderful.  Thanks very much.  Spencer
>>>> 
>>>> 
>>>> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:


Re: [Rd] Making iconv portable?

2014-12-15 Thread Simon Urbanek

> On Dec 15, 2014, at 12:21 PM, Kurt Hornik  wrote:
> 
>> Spencer Graves writes:
> 
>> Hello, All:  
>> What would it take to make “iconv” portable?  
> 
> 
>> I ask, because I want to convert accented characters to
>> vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
>> Milan Bouchet-Valat suggested on R-help that I use
>> 'iconv(x, "", "ASCII//TRANSLIT")'.  This worked under Windows but failed
>> on Linux and Mac.  It’s part of the “subNonStandardCharacters”
>> function in the Ecfun package. The development version on
>> R-Forge uses this and returns “Raul” under Windows and NA
>> under Mac OS X (and presumably also Linux).
> 
> Hmm.
> 
> R> iconv("Raúl", "", "ASCII//TRANSLIT")
> [1] "Raul"
> 
> seems to work for me on Linux ...
> 

also on OS X:

> iconv("Raúl", "", "ASCII//TRANSLIT")
[1] "Ra'ul"



> -k
> 
> 
>> The “iconv” R code merely calls compiled code, which I’ve used very 
>> little in 30 years.   
> 
> 
>> Thanks, 
>> Spencer 
> 
> 
> 
>>> On Nov 30, 2014, at 2:32 AM, Spencer Graves wrote:
>>> 
>>> Wonderful.  Thanks very much.  Spencer
>>> 
>>> 
>>> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:


Re: [Rd] Making iconv portable?

2014-12-15 Thread Kurt Hornik
> Spencer Graves writes:

> Hello, All:  
> What would it take to make “iconv” portable?  


> I ask, because I want to convert accented characters to
> vanilla ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and
> Milan Bouchet-Valat suggested on R-help that I use
> 'iconv(x, "", "ASCII//TRANSLIT")'.  This worked under Windows but failed
> on Linux and Mac.  It’s part of the “subNonStandardCharacters”
> function in the Ecfun package. The development version on
> R-Forge uses this and returns “Raul” under Windows and NA
> under Mac OS X (and presumably also Linux).

Hmm.

R> iconv("Raúl", "", "ASCII//TRANSLIT")
[1] "Raul"

seems to work for me on Linux ...

-k


> The “iconv” R code merely calls compiled code, which I’ve used very 
> little in 30 years.   


> Thanks, 
> Spencer 



>> On Nov 30, 2014, at 2:32 AM, Spencer Graves wrote:
>> 
>> Wonderful.  Thanks very much.  Spencer
>> 
>> 
>> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:


[Rd] Making iconv portable?

2014-12-15 Thread Spencer Graves
Hello, All:  


  What would it take to make “iconv” portable?  


  I ask, because I want to convert accented characters to vanilla 
ASCII, thereby converting, e.g., ‘Raúl’ to “Raul”, and Milan Bouchet-Valat 
suggested on R-help that I use 'iconv(x, "", "ASCII//TRANSLIT")'.  This worked 
under Windows but failed on Linux and Mac.  It’s part of the 
“subNonStandardCharacters” function in the Ecfun package. The development 
version on R-Forge uses this and returns “Raul” under Windows and NA under Mac 
OS X (and presumably also Linux).


 The “iconv” R code merely calls compiled code, which I’ve used very 
little in 30 years.   


  Thanks, 
  Spencer 



> On Nov 30, 2014, at 2:32 AM, Spencer Graves wrote:
> 
> Wonderful.  Thanks very much.  Spencer
> 
> 
> On 11/30/2014 2:25 AM, Milan Bouchet-Valat wrote:



Re: [Rd] Significant memory leak when using XML on Windows

2014-12-15 Thread Janko Thyson
@Jeroen: nope, seems like the problem unfortunately persists:

library(XML)

# Memory (in MB) used by a process according to the Windows task list.
getTaskMemoryByPid <- function(
  pid = Sys.getpid()
) {
  cmd <- sprintf("tasklist /FI \"pid eq %s\" /FO csv", pid)
  mem <- read.csv(text = shell(cmd, intern = TRUE),
                  stringsAsFactors = FALSE)[, 5]
  # Strip thousands separators, whitespace and the "K" unit, then scale
  mem <- as.numeric(gsub("\\.|\\s|K", "", mem)) / 1000
  mem
}
# Memory as seen by R versus memory as seen by the OS (both Windows-only).
getCurrentMemoryStatus <- function() {
  mem_os <- getTaskMemoryByPid()
  mem_r  <- memory.size()
  list(r = mem_r, os = mem_os, ratio = mem_os / mem_r)
}
# Parse the same XML file n times, optionally freeing/removing each document.
memoryLeak <- function(
  x = system.file("exampleData", "mtcars.xml", package="XML"),
  n = 5000,
  free_doc = FALSE,
  rm_doc = FALSE,
  use_gc = FALSE
) {
  lapply(1:n, function(ii) {
    doc <- xmlParse(x)
    if (free_doc) free(doc)
    if (rm_doc) rm(doc)
    if (use_gc) gc()
    NULL
  })
}
mem_1 <- getCurrentMemoryStatus()
memoryLeak(n = 5, free_doc = TRUE, rm_doc = TRUE)
mem_2 <- getCurrentMemoryStatus()

> rbind(data.frame(mem_1), data.frame(mem_2))

      r      os    ratio
1 63.65  87.148 1.369175
2 97.63 122.160 1.251255
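
For anyone wanting to repeat the comparison on Linux or OS X, a rough
sketch of a cross-platform variant of getTaskMemoryByPid(). The name
getTaskMemoryMb() is hypothetical, and the non-Windows branch assumes a
POSIX ps that accepts "-o rss=" and reports kilobytes. memory.size() has
no equivalent there, so only the OS-side figure is returned:

```r
# Hypothetical helper: resident memory of a process in MB.
getTaskMemoryMb <- function(pid = Sys.getpid()) {
  if (.Platform$OS.type == "windows") {
    # Same tasklist call as above; column 5 holds the memory usage.
    cmd <- sprintf("tasklist /FI \"pid eq %s\" /FO csv", pid)
    out <- read.csv(text = shell(cmd, intern = TRUE),
                    stringsAsFactors = FALSE)
    kb <- as.numeric(gsub("[^0-9]", "", out[, 5]))
  } else {
    # Assumption: `ps -o rss= -p PID` prints the resident set size in KB.
    out <- system2("ps", c("-o", "rss=", "-p", pid), stdout = TRUE)
    kb <- as.numeric(trimws(out))
  }
  kb / 1024
}

before <- getTaskMemoryMb()
# ... run memoryLeak() here ...
after <- getTaskMemoryMb()
after - before
```

Resident set size is only a coarse indicator, since the allocator may
keep freed pages, so a single before/after difference is suggestive at
best.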



On Mon, Dec 15, 2014 at 12:25 PM, Janko Thyson 
wrote:
>
> Sorry guys, didn't see your responses before sending mine.
>
> Thanks jeroen!! I'll test your version today and get back to you.
>
> Sent from my smartphone
> On 15.12.2014 at 12:12, "Janko Thyson" wrote:
>
> > Thanks a lot for answering. Before I get into it, please note that
> > everything below bears the big caption "Thanks for trying to help me at
> > all".
> >
> > 1) Yeah, those examples - quite hard to satisfy everyone's needs ;-) While
> > the one side complained that my past examples regarding this issue were not
> > informative enough, others didn't like the more elaborated version (as
> > seems to be the case for you). I simply tried to make it as easy as
> > possible for people to see what's actually going on so they wouldn't have
> > to program their own stuff for things like reading the actual memory
> > consumed by the Rterm process etc. If you prefer plain vanilla, though, I
> > guess this would be it:
> >
> > memoryLeak <- function(
> >   x = system.file("exampleData", "mtcars.xml", package="XML"),
> >   n = 5000,
> >   free_doc = FALSE,
> >   rm_doc = FALSE,
> >   use_gc = FALSE
> > ) {
> >   lapply(1:n, function(ii) {
> >     doc <- xmlParse(x)
> >     if (free_doc) free(doc)
> >     if (rm_doc) rm(doc)
> >     if (use_gc) gc()
> >     NULL
> >   })
> > }
> >
> > 2) If I knew my way around OSX or Linux, I would be happy to go with your
> > suggestions - but as I don't, unfortunately that's out of reach for me. But
> > IMO, a deeper level of cross-platform expertise should **not** be a
> > general prerequisite before you can ask for help - even at r-devel (as
> > opposed to r-help). However, AFAIK from past conversations with Duncan, the
> > problem is indeed Windows-specific as on all his non-Windows infrastructure
> > (definitely Linux, possibly OSX), everything went fine.
> >
> > 3) The same goes for the level of expertise in C. After all, R is not C. I
> > totally agree that the more programming languages one knows, the better.
> > But again: I don't think that knowing your way around C should be a
> > prerequisite for asking for help when an *R function* interfacing C causes
> > trouble. Requesting this would sort of oppose R's nature/paradigm of being
> > an awesome "top-level" interfacing language. But I'll try to narrow the
> > problem down on a C-level if I can help you with that.
> >
> > 4) Both Duncan as well as Hadley have suggested that libxml2 is indeed
> > causing the problem. So trying to link against another build would possibly
> > be a great way to start! How would I go about that?
> >
> > Thanks if you should take the time to further look into this!
> > Janko
> >
> > On Mon, Dec 15, 2014 at 4:54 AM, Jeroen Ooms wrote:
> >>
> >> On Thu, Dec 11, 2014 at 12:13 PM, Janko Thyson wrote:
> >>>
> >>> I'd so much appreciate if someone could have a look at this. If I can be
> >>> of any help whatsoever, please let me know!
> >>>
> >>
> >> Your current code uses various functions from XML and rvest so it is not
> >> a *minimal* reproducible example. Even if you are unfamiliar with C, you
> >> should be able to investigate exactly which function in the XML package you
> >> think has issues. Once you found the problematic R function, inspect the
> >> source code or use debug() to see if you can narrow it down even further,
> >> preferably to a particular call to C.
> >>
> >> Moreover you should create a reproducible example that allows us (and
> >> you) to test if this problem appears on other systems such as OSX or linux.
> >> Development and debugging on Windows is very painful so your windows-only
> >> example is not too helpful. Making people use windows is not a good
> >> strategy for getting help.
> >>
> >> If the "leak" does not appear on other systems, it is likely a problem in
> >> the libxml2 windows library on cran

Re: [Rd] Significant memory leak when using XML on Windows

2014-12-15 Thread Janko Thyson
Sorry guys, didn't see your responses before sending mine.

Thanks jeroen!! I'll test your version today and get back to you.

Sent from my smartphone
On 15.12.2014 at 12:12, "Janko Thyson" wrote:

> Thanks a lot for answering. Before I get into it, please note that
> everything below bears the big caption "Thanks for trying to help me at
> all".
>
> 1) Yeah, those examples - quite hard to satisfy everyone's needs ;-) While
> the one side complained that my past examples regarding this issue were not
> informative enough, others didn't like the more elaborated version (as
> seems to be the case for you). I simply tried to make it as easy as
> possible for people to see what's actually going on so they wouldn't have
> to program their own stuff for things like reading the actual memory
> consumed by the Rterm process etc. If you prefer plain vanilla, though, I
> guess this would be it:
>
> memoryLeak <- function(
>   x = system.file("exampleData", "mtcars.xml", package="XML"),
>   n = 5000,
>   free_doc = FALSE,
>   rm_doc = FALSE,
>   use_gc = FALSE
> ) {
>   lapply(1:n, function(ii) {
>     doc <- xmlParse(x)
>     if (free_doc) free(doc)
>     if (rm_doc) rm(doc)
>     if (use_gc) gc()
>     NULL
>   })
> }
>
> 2) If I knew my way around OSX or Linux, I would be happy to go with your
> suggestions - but as I don't, unfortunately that's out of reach for me. But
> IMO, a deeper level of cross-platform expertise should **not** be a
> general prerequisite before you can ask for help - even at r-devel (as
> opposed to r-help). However, AFAIK from past conversations with Duncan, the
> problem is indeed Windows-specific as on all his non-Windows infrastructure
> (definitely Linux, possibly OSX), everything went fine.
>
> 3) The same goes for the level of expertise in C. After all, R is not C. I
> totally agree that the more programming languages one knows, the better.
> But again: I don't think that knowing your way around C should be a
> prerequisite for asking for help when an *R function* interfacing C causes
> trouble. Requesting this would sort of oppose R's nature/paradigm of being
> an awesome "top-level" interfacing language. But I'll try to narrow the
> problem down on a C-level if I can help you with that.
>
> 4) Both Duncan as well as Hadley have suggested that libxml2 is indeed
> causing the problem. So trying to link against another build would possibly
> be a great way to start! How would I go about that?
>
> Thanks if you should take the time to further look into this!
> Janko
>
> On Mon, Dec 15, 2014 at 4:54 AM, Jeroen Ooms  wrote:
>>
>> On Thu, Dec 11, 2014 at 12:13 PM, Janko Thyson 
>> wrote:
>>>
>>> I'd so much appreciate if someone could have a look at this. If I can be
>>> of any help whatsoever, please let me know!
>>>
>>
>> Your current code uses various functions from XML and rvest so it is not
>> a *minimal* reproducible example. Even if you are unfamiliar with C, you
>> should be able to investigate exactly which function in the XML package you
>> think has issues. Once you found the problematic R function, inspect the
>> source code or use debug() to see if you can narrow it down even further,
>> preferably to a particular call to C.
>>
>> Moreover you should create a reproducible example that allows us (and
>> you) to test if this problem appears on other systems such as OSX or linux.
>> Development and debugging on Windows is very painful so your windows-only
>> example is not too helpful. Making people use windows is not a good
>> strategy for getting help.
>>
>> If the "leak" does not appear on other systems, it is likely a problem in
>> the libxml2 windows library on cran. In that case we can try to link
>> against another build. On the other hand, if the problem does appear across
>> systems, and you have provided a minimal reproducible example that
>> pinpoints the problematic C function, we can help you review/debug the C
>> code to see if/where some allocated object is not properly freed.
>>
>>
>>
>>



Re: [Rd] Significant memory leak when using XML on Windows

2014-12-15 Thread Janko Thyson
Thanks a lot for answering. Before I get into it, please note that
everything below bears the big caption "Thanks for trying to help me at
all".

1) Yeah, those examples - quite hard to satisfy everyone's needs ;-) While
the one side complained that my past examples regarding this issue were not
informative enough, others didn't like the more elaborated version (as
seems to be the case for you). I simply tried to make it as easy as
possible for people to see what's actually going on so they wouldn't have
to program their own stuff for things like reading the actual memory
consumed by the Rterm process etc. If you prefer plain vanilla, though, I
guess this would be it:

memoryLeak <- function(
  x = system.file("exampleData", "mtcars.xml", package="XML"),
  n = 5000,
  free_doc = FALSE,
  rm_doc = FALSE,
  use_gc = FALSE
) {
  lapply(1:n, function(ii) {
    doc <- xmlParse(x)
    if (free_doc) free(doc)
    if (rm_doc) rm(doc)
    if (use_gc) gc()
    NULL
  })
}

2) If I knew my way around OSX or Linux, I would be happy to go with your
suggestions - but as I don't, unfortunately that's out of reach for me. But
IMO, a deeper level of cross-platform expertise should **not** be a
general prerequisite before you can ask for help - even at r-devel (as
opposed to r-help). However, AFAIK from past conversations with Duncan, the
problem is indeed Windows-specific as on all his non-Windows infrastructure
(definitely Linux, possibly OSX), everything went fine.

3) The same goes for the level of expertise in C. After all, R is not C. I
totally agree that the more programming languages one knows, the better.
But again: I don't think that knowing your way around C should be a
prerequisite for asking for help when an *R function* interfacing C causes
trouble. Requesting this would sort of oppose R's nature/paradigm of being
an awesome "top-level" interfacing language. But I'll try to narrow the
problem down on a C-level if I can help you with that.

4) Both Duncan as well as Hadley have suggested that libxml2 is indeed
causing the problem. So trying to link against another build would possibly
be a great way to start! How would I go about that?

Thanks if you should take the time to further look into this!
Janko

On Mon, Dec 15, 2014 at 4:54 AM, Jeroen Ooms  wrote:
>
> On Thu, Dec 11, 2014 at 12:13 PM, Janko Thyson 
> wrote:
>>
>> I'd so much appreciate if someone could have a look at this. If I can be
>> of any help whatsoever, please let me know!
>>
>
> Your current code uses various functions from XML and rvest so it is not a
> *minimal* reproducible example. Even if you are unfamiliar with C, you
> should be able to investigate exactly which function in the XML package you
> think has issues. Once you found the problematic R function, inspect the
> source code or use debug() to see if you can narrow it down even further,
> preferably to a particular call to C.
>
> Moreover you should create a reproducible example that allows us (and you)
> to test if this problem appears on other systems such as OSX or linux.
> Development and debugging on Windows is very painful so your windows-only
> example is not too helpful. Making people use windows is not a good
> strategy for getting help.
>
> If the "leak" does not appear on other systems, it is likely a problem in
> the libxml2 windows library on cran. In that case we can try to link
> against another build. On the other hand, if the problem does appear across
> systems, and you have provided a minimal reproducible example that
> pinpoints the problematic C function, we can help you review/debug the C
> code to see if/where some allocated object is not properly freed.
>
>
>
>
