Re: Possible workaround (was: Re: Help: r-cran-treescape does not build on i386, armel and armhf any more)

2016-12-22 Thread Andreas Tille
severity 845753 important
thanks

Hi Christian,

On Wed, Dec 14, 2016 at 03:37:27PM +0100, Christian Seiler wrote:
> 
> Anyway, to work around this for now, you can replace your
> dh_auto_install line (that is passed to the xvfb call)
> with the following command:
> 
>   /bin/sh -c "ulimit -S -s 200000 ; exec dh_auto_install"
> 
> Just tried it, sbuild built the package successfully on
> i386. I haven't tried armhf, but I suspect the result will
> be the same.

Lacking any better idea, I used this workaround and hereby downgrade
the severity of the bug from serious to important.  I can confirm that
all affected architectures are building now.  It might be important to
mention that after a first upload mips64el also failed, although it was
not affected by the problem before.  So in addition to the unexplained
failure, some randomness seems to be involved. :-(

I have implemented the workaround only for this version.  It needs to be
explicitly restored for any new upstream version, so we will notice if
the problem persists in later upstream versions.

In any case, thanks a lot to Christian for spending this effort on the
problem.

  Andreas.

-- 
http://fam-tille.de



Re: Bug#845753: Possible workaround

2016-12-15 Thread Andreas Tille
On Thu, Dec 15, 2016 at 07:37:31AM -0600, Dirk Eddelbuettel wrote:
> 
> On 15 December 2016 at 14:26, Andreas Tille wrote:
> | This was discussed before.  The output above is from a previous package
> | version where I simply forgot to actually use xvfb.  Because of this error
> | of mine the package was built without RGL - thus the warning.  Later I
> | was using xvfb correctly including OpenGL (thanks to Gregor Hermann who
> | pointed me to the solution).  However, the segfault remained.
> 
> If there is a (new?) trick, can you share it?

You mean how to build with OpenGL?  See here:

   
https://anonscm.debian.org/viewvc/debian-med/trunk/packages/R/r-cran-treescape/trunk/debian/rules?revision=23285&view=markup

Kind regards

   Andreas. 

-- 
http://fam-tille.de



Re: Possible workaround

2016-12-15 Thread Christian Seiler
On 12/15/2016 03:03 PM, Dirk Eddelbuettel wrote:
> 
> On 15 December 2016 at 14:42, Christian Seiler wrote:
> | On 12/15/2016 02:37 PM, Dirk Eddelbuettel wrote:
> | > On 15 December 2016 at 14:26, Andreas Tille wrote:
> | > | Sorry, but I have no idea how since I'm totally clueless currently and
> | > | upstream also has not yet responded to this after the initial idea that
> | > | it might be some ape related issue was not helpful.  Do you in turn see
> | > | any chance to push this question to the right forum in the R community?
> | > 
> | > Not really. All (well, most) builds at their end are fine [1]. They would
> | > likely suggest that we sort our (local to them) issues out at our end.
> | > 
> | > How to run R with gdb is discussed in Writing R Extensions.  Maybe we need
> | > to start with some stacktraces to see who calls whom how.
> | 
> | I had already posted a gdb backtrace here:
> | https://lists.debian.org/debian-mentors/2016/12/msg00412.html
> | 
> | Any idea how to get the corresponding R backtrace from this?
> | 
> | (R's own debug() will obviously not work if there's a C stack
> | overflow.)
> 
> Use
> 
>   R -d gdb [other options you may use]
> 
> which is described in the manual I referenced earlier:
> https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Debugging-compiled-code

Then the error doesn't occur, unfortunately.

If I run R -d gdb and then do the action manually (by calling
the corresponding R function), then everything works, even with the
lower stack limit. (I mentioned this in an earlier email.)

If I run the R command directly, and attach gdb while it's still
running (luckily it takes a couple of seconds), then the error
occurs, but I get a horrible stack trace.

I assume R -d gdb starts gdb with some initialization file - can I
load that into gdb manually? If so, where can I find that file?

Regards,
Christian



Re: Possible workaround

2016-12-15 Thread Dirk Eddelbuettel

On 15 December 2016 at 14:42, Christian Seiler wrote:
| On 12/15/2016 02:37 PM, Dirk Eddelbuettel wrote:
| > On 15 December 2016 at 14:26, Andreas Tille wrote:
| > | Sorry, but I have no idea how since I'm totally clueless currently and
| > | upstream also has not yet responded to this after the initial idea that
| > | it might be some ape related issue was not helpful.  Do you in turn see
| > | any chance to push this question to the right forum in the R community?
| > 
| > Not really. All (well, most) builds at their end are fine [1]. They would
| > likely suggest that we sort our (local to them) issues out at our end.
| > 
| > How to run R with gdb is discussed in Writing R Extensions.  Maybe we need
| > to start with some stacktraces to see who calls whom how.
| 
| I had already posted a gdb backtrace here:
| https://lists.debian.org/debian-mentors/2016/12/msg00412.html
| 
| Any idea how to get the corresponding R backtrace from this?
| 
| (R's own debug() will obviously not work if there's a C stack
| overflow.)

Use

  R -d gdb [other options you may use]

which is described in the manual I referenced earlier:
https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Debugging-compiled-code

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: Possible workaround

2016-12-15 Thread Dirk Eddelbuettel

On 15 December 2016 at 14:26, Andreas Tille wrote:
| This was discussed before.  The output above is from a previous package
| version where I simply forgot to actually use xvfb.  Because of this error
| of mine the package was built without RGL - thus the warning.  Later I
| was using xvfb correctly including OpenGL (thanks to Gregor Hermann who
| pointed me to the solution).  However, the segfault remained.

If there is a (new?) trick, can you share it?  Maybe even test in r-cran.mk
so that it propagates?  The rgl package (ie r-cran-rgl) appears in other
builds too so I would probably incorporate this.
  
| Sorry, but I have no idea how since I'm totally clueless currently and
| upstream also has not yet responded to this after the initial idea that
| it might be some ape related issue was not helpful.  Do you in turn see
| any chance to push this question to the right forum in the R community?

Not really. All (well, most) builds at their end are fine [1]. They would
likely suggest that we sort our (local to them) issues out at our end.

How to run R with gdb is discussed in Writing R Extensions.  Maybe we need
to start with some stacktraces to see who calls whom how.

Dirk

[1] https://cloud.r-project.org/web/checks/check_results_treescape.html
Solaris and OS X having warnings is not uncommon.
-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: Possible workaround

2016-12-15 Thread Christian Seiler
On 12/15/2016 02:37 PM, Dirk Eddelbuettel wrote:
> On 15 December 2016 at 14:26, Andreas Tille wrote:
> | Sorry, but I have no idea how since I'm totally clueless currently and
> | upstream also has not yet responded to this after the initial idea that
> | it might be some ape related issue was not helpful.  Do you in turn see
> | any chance to push this question to the right forum in the R community?
> 
> Not really. All (well, most) builds at their end are fine [1]. They would
> likely suggest that we sort our (local to them) issues out at our end.
> 
> How to run R with gdb is discussed in Writing R Extensions.  Maybe we need
> to start with some stacktraces to see who calls whom how.

I had already posted a gdb backtrace here:
https://lists.debian.org/debian-mentors/2016/12/msg00412.html

Any idea how to get the corresponding R backtrace from this?

(R's own debug() will obviously not work if there's a C stack
overflow.)

Regards,
Christian



Re: Possible workaround

2016-12-15 Thread Andreas Tille
Hi,

On Wed, Dec 14, 2016 at 03:57:13PM -0600, Dirk Eddelbuettel wrote:
> |   RGL: unable to open X11 display
> | Warning: 'rgl_init' failed, running with rgl.useNULL = TRUE
> | Error: segfault from C stack overflow
> | * removing '/home/christian/r-cran-treescape-1.10.18/debian/r-cran-treescape/usr/lib/R/site-library/treescape'
> | 
> | (Ignore the X warnings, they are irrelevant here, I'm too lazy to run
> | it in xvfb and it's in a VM without X.)
> 
> xvfb does not give you OpenGL. This wants X11 _and OpenGL_.

This was discussed before.  The output above is from a previous package
version where I simply forgot to actually use xvfb.  Because of this error
of mine the package was built without RGL - thus the warning.  Later I
was using xvfb correctly including OpenGL (thanks to Gregor Hermann who
pointed me to the solution).  However, the segfault remained.
 
> Because of that, I think it would be worthwhile to check if it were to build
> without the need for rgl.

As said above this was the initial state (unintentionally) - or do you
have something else in mind?
 
> Else the '--no-test-load' added to the 'R CMD INSTALL' call should skip all
> that.  We would need a local copy of r-cran.mk, or just copy it and adjust.
> 
> Maybe Andreas can give you a hand?

Sorry, but I have no idea how since I'm totally clueless currently and
upstream also has not yet responded to this after the initial idea that
it might be some ape related issue was not helpful.  Do you in turn see
any chance to push this question to the right forum in the R community?

I do not think that hiding our eyes from a potential problem by simply
increasing the stack size to an insane value is a proper way to solve
the issue - but currently there is no better idea on the table.
 
> | btw., since the program uses a huge stack, but there is no system
> | call related stuff going on (and the package doesn't appear to
> | interface directly with the kernel anyway, it's much too high level
> | for that), and the problem persists for the given kernel over a lot
> | of very different architectures.
> 
> It sounds like a bug at the system level. R packages tend not to be that
> greedy at build or load.

I admit that it is the first time that I'm observing an issue like this
in the many R packages I'm maintaining.  However, there must be some
deeper reason for the failure and I personally have no idea how to track
this down.  I'd welcome it if you could propagate the issue or tell me the
right forum to bring this up.
 
Kind regards

 Andreas.

-- 
http://fam-tille.de



Re: Possible workaround

2016-12-14 Thread Dirk Eddelbuettel

On 14 December 2016 at 16:44, Christian Seiler wrote:
| Hi,
| 
| On 12/14/2016 04:16 PM, Dirk Eddelbuettel wrote:
| > One quick thought: does it die in _compilation_ which we have seen with other
| > (C++-heavy) packages?
| 
| No, g++ works fine here. (The C++ file itself is trivial if you
| look at it.)
| 
| Current package in Debian:
| http://sources.debian.net/src/r-cran-treescape/1.10.18-2/
| 
| > Otherwise if it fails _after_ compilation we may be able to get by turning
| > some default aspects of R CMD INSTALL off:
| > 
| >   --no-byte-compile do not byte-compile R code
| 
| That doesn't help, still fails. :-(
| 
| >   --no-test-load    skip test of loading installed package
| 
| That doesn't help either. :-(
| 
| From the build log when it fails (8 MiB stack limit):
| 
| * installing *source* package 'treescape' ...
| ** package 'treescape' successfully unpacked and MD5 sums checked
| ** libs
| g++ -I/usr/share/R/include -DNDEBUG   
-I"/usr/lib/R/site-library/Rcpp/include"   -fpic  -g -O2 
-fdebug-prefix-map=/build/r-base-PAdLwq/r-base-3.3.2=. -fstack-protector-strong 
-Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c 
CPP_update_combinations.cpp -o CPP_update_combinations.o
| g++ -I/usr/share/R/include -DNDEBUG   
-I"/usr/lib/R/site-library/Rcpp/include"   -fpic  -g -O2 
-fdebug-prefix-map=/build/r-base-PAdLwq/r-base-3.3.2=. -fstack-protector-strong 
-Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c 
RcppExports.cpp -o RcppExports.o
| g++ -shared -L/usr/lib/R/lib -Wl,-z,relro -o treescape.so 
CPP_update_combinations.o RcppExports.o -L/usr/lib/R/lib -lR
| installing to 
/home/christian/r-cran-treescape-1.10.18/debian/r-cran-treescape/usr/lib/R/site-library/treescape/libs
| ** R
| ** data
| *** moving datasets to lazyload DB
| ** inst
| ** preparing package for lazy loading
| Creating a generic function for 'toJSON' from package 'jsonlite' in package 
'googleVis'
| Warning in rgl.init(initValue, onlyNULL) :
|   RGL: unable to open X11 display
| Warning: 'rgl_init' failed, running with rgl.useNULL = TRUE
| Error: segfault from C stack overflow
| * removing 
'/home/christian/r-cran-treescape-1.10.18/debian/r-cran-treescape/usr/lib/R/site-library/treescape'
| 
| (Ignore the X warnings, they are irrelevant here, I'm too lazy to run
| it in xvfb and it's in a VM without X.)

xvfb does not give you OpenGL. This wants X11 _and OpenGL_.

Because of that, I think it would be worthwhile to check if it were to build
without the need for rgl.

Else the '--no-test-load' added to the 'R CMD INSTALL' call should skip all
that.  We would need a local copy of r-cran.mk, or just copy it and adjust.

Maybe Andreas can give you a hand?

| When it succeeds (195.3 MiB stack limit):
| 
| * installing *source* package 'treescape' ...
| ** package 'treescape' successfully unpacked and MD5 sums checked
| ** libs
| g++ -I/usr/share/R/include -DNDEBUG   
-I"/usr/lib/R/site-library/Rcpp/include"   -fpic  -g -O2 
-fdebug-prefix-map=/build/r-base-PAdLwq/r-base-3.3.2=. -fstack-protector-strong 
-Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c 
CPP_update_combinations.cpp -o CPP_update_combinations.o
| g++ -I/usr/share/R/include -DNDEBUG   
-I"/usr/lib/R/site-library/Rcpp/include"   -fpic  -g -O2 
-fdebug-prefix-map=/build/r-base-PAdLwq/r-base-3.3.2=. -fstack-protector-strong 
-Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c 
RcppExports.cpp -o RcppExports.o
| g++ -shared -L/usr/lib/R/lib -Wl,-z,relro -o treescape.so 
CPP_update_combinations.o RcppExports.o -L/usr/lib/R/lib -lR
| installing to 
/home/christian/r-cran-treescape-1.10.18/debian/r-cran-treescape/usr/lib/R/site-library/treescape/libs
| ** R
| ** data
| *** moving datasets to lazyload DB
| ** inst
| ** preparing package for lazy loading
| Creating a generic function for 'toJSON' from package 'jsonlite' in package 
'googleVis'
| Warning in rgl.init(initValue, onlyNULL) :
|   RGL: unable to open X11 display
| Warning: 'rgl_init' failed, running with rgl.useNULL = TRUE
| ** help
| *** installing help indices
| ** building package indices
| ** installing vignettes
| ** testing if installed package can be loaded
| Creating a generic function for 'toJSON' from package 'jsonlite' in package 
'googleVis'
| Warning in rgl.init(initValue, onlyNULL) :
|   RGL: unable to open X11 display
| Warning: 'rgl_init' failed, running with rgl.useNULL = TRUE
| * DONE (treescape)
| 
| So the problem occurs at the following step:
| 
|   ** preparing package for lazy loading
| 
| And, to recap the specific circumstances where this problem appears:
| 
|  - 32bit
|  - Little Endian architecture
|  - Linux 3.16
|  - Standard stack size limit (8 MiB)
|  - treescape module version >= 1.10.17
| 
| Change only one of these things and it will work:
| 
|  - 64bit Little Endian Linux 3.16, standard stack limit: works
|   e.g. amd64, arm64
|  - 32bit Big Endian Linux 3.16, standard stack limit: works
| 

Re: Possible workaround

2016-12-14 Thread Christian Seiler
Hi,

On 12/14/2016 04:16 PM, Dirk Eddelbuettel wrote:
> One quick thought: does it die in _compilation_ which we have seen with other
> (C++-heavy) packages?

No, g++ works fine here. (The C++ file itself is trivial if you
look at it.)

Current package in Debian:
http://sources.debian.net/src/r-cran-treescape/1.10.18-2/

> Otherwise if it fails _after_ compilation we may be able to get by turning
> some default aspects of R CMD INSTALL off:
> 
>   --no-byte-compile do not byte-compile R code

That doesn't help, still fails. :-(

>   --no-test-load    skip test of loading installed package

That doesn't help either. :-(

From the build log when it fails (8 MiB stack limit):

* installing *source* package 'treescape' ...
** package 'treescape' successfully unpacked and MD5 sums checked
** libs
g++ -I/usr/share/R/include -DNDEBUG   -I"/usr/lib/R/site-library/Rcpp/include"  
 -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-PAdLwq/r-base-3.3.2=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -g  -c CPP_update_combinations.cpp -o 
CPP_update_combinations.o
g++ -I/usr/share/R/include -DNDEBUG   -I"/usr/lib/R/site-library/Rcpp/include"  
 -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-PAdLwq/r-base-3.3.2=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -g  -c RcppExports.cpp -o RcppExports.o
g++ -shared -L/usr/lib/R/lib -Wl,-z,relro -o treescape.so 
CPP_update_combinations.o RcppExports.o -L/usr/lib/R/lib -lR
installing to 
/home/christian/r-cran-treescape-1.10.18/debian/r-cran-treescape/usr/lib/R/site-library/treescape/libs
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
Creating a generic function for 'toJSON' from package 'jsonlite' in package 
'googleVis'
Warning in rgl.init(initValue, onlyNULL) :
  RGL: unable to open X11 display
Warning: 'rgl_init' failed, running with rgl.useNULL = TRUE
Error: segfault from C stack overflow
* removing 
'/home/christian/r-cran-treescape-1.10.18/debian/r-cran-treescape/usr/lib/R/site-library/treescape'

(Ignore the X warnings, they are irrelevant here, I'm too lazy to run
it in xvfb and it's in a VM without X.)

When it succeeds (195.3 MiB stack limit):

* installing *source* package 'treescape' ...
** package 'treescape' successfully unpacked and MD5 sums checked
** libs
g++ -I/usr/share/R/include -DNDEBUG   -I"/usr/lib/R/site-library/Rcpp/include"  
 -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-PAdLwq/r-base-3.3.2=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -g  -c CPP_update_combinations.cpp -o 
CPP_update_combinations.o
g++ -I/usr/share/R/include -DNDEBUG   -I"/usr/lib/R/site-library/Rcpp/include"  
 -fpic  -g -O2 -fdebug-prefix-map=/build/r-base-PAdLwq/r-base-3.3.2=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -g  -c RcppExports.cpp -o RcppExports.o
g++ -shared -L/usr/lib/R/lib -Wl,-z,relro -o treescape.so 
CPP_update_combinations.o RcppExports.o -L/usr/lib/R/lib -lR
installing to 
/home/christian/r-cran-treescape-1.10.18/debian/r-cran-treescape/usr/lib/R/site-library/treescape/libs
** R
** data
*** moving datasets to lazyload DB
** inst
** preparing package for lazy loading
Creating a generic function for 'toJSON' from package 'jsonlite' in package 
'googleVis'
Warning in rgl.init(initValue, onlyNULL) :
  RGL: unable to open X11 display
Warning: 'rgl_init' failed, running with rgl.useNULL = TRUE
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
Creating a generic function for 'toJSON' from package 'jsonlite' in package 
'googleVis'
Warning in rgl.init(initValue, onlyNULL) :
  RGL: unable to open X11 display
Warning: 'rgl_init' failed, running with rgl.useNULL = TRUE
* DONE (treescape)

So the problem occurs at the following step:

  ** preparing package for lazy loading

And, to recap the specific circumstances where this problem appears:

 - 32bit
 - Little Endian architecture
 - Linux 3.16
 - Standard stack size limit (8 MiB)
 - treescape module version >= 1.10.17

Change only one of these things and it will work:

 - 64bit Little Endian Linux 3.16, standard stack limit: works
  e.g. amd64, arm64
 - 32bit Big Endian Linux 3.16, standard stack limit: works
  e.g. powerpc
 - 32bit Little Endian Linux 4.7.x or higher, standard stack limit: works
  e.g. i386 on my own system with newer kernel, or the mipsel
  build server of Debian with a backported kernel
 - 32bit Little Endian Linux 3.16, huge stack limit: works
 - older version 1.9.18: works

Note that different kernel versions really mean just the kernel,
the libraries and tools are 100% identical. (I mean libc, R, gcc,
and so on.)
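
For anyone who wants to compare hosts along the axes listed above, here is
a minimal sketch that prints the relevant facts (kernel version, word size,
byte order, soft stack limit). The function name describe_host is made up
for illustration, and the od-based byte order check assumes a coreutils
style od; treat it as a starting point, not a definitive probe.

```shell
#!/bin/sh
# describe_host is a hypothetical helper name used only for this sketch.
describe_host() {
    echo "kernel: $(uname -r)"
    echo "machine: $(uname -m)"
    # getconf LONG_BIT reports the width of a C long for the default ABI.
    echo "word size: $(getconf LONG_BIT) bit"
    # od decodes the two bytes 01 00 as a host-order 16-bit integer:
    # 1 on little endian, 256 on big endian.
    case "$(printf '\001\000' | od -An -tu2 | tr -d ' \n')" in
        1) echo "byte order: little endian" ;;
        256) echo "byte order: big endian" ;;
        *) echo "byte order: unknown" ;;
    esac
    # The shell reports the stack limit in KiB, or the word "unlimited".
    echo "soft stack limit (KiB or unlimited): $(ulimit -S -s)"
}

describe_host
```

Running this inside the build chroot rather than on the host is probably
the more useful comparison, since the userland word size is what matters.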

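I'm at a complete loss why the kernel version is even relevant here
btw., since the program uses a huge stack, but there is no system
call related stuff going on (and the package doesn't appear to
interface directly with the kernel anyway, it's much too high level
for that), and the problem persists for the given kernel over a lot
of very different architectures.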

Re: Possible workaround (was: Re: Help: r-cran-treescape does not build on i386, armel and armhf any more)

2016-12-14 Thread Dirk Eddelbuettel

On 14 December 2016 at 15:59, Andreas Tille wrote:
| Hi Christian,
| 
| thanks a lot for your extensive analysis of the stack problem.  I
| admit I have no idea why this large stack is needed on those
| architectures with stable kernel.  I also have no idea why everything
| went fine with treescape version 1.10.17.  Since I personally feel
| totally clueless I'm forwarding this upstream and also CC Dirk
| Eddelbuettel who is known for his insight and good contact to the R
| community.  Maybe somebody has a better clue rather than drastically
| increasing the stack size on those failed architectures.

One quick thought: does it die in _compilation_ which we have seen with other
(C++-heavy) packages?  I know that this helps at times:

  export CXX=g++ --param ggc-min-expand=20 -g0

Otherwise if it fails _after_ compilation we may be able to get by turning
some default aspects of R CMD INSTALL off:

  --no-byte-compile do not byte-compile R code
  --no-test-load    skip test of loading installed package

But that is just guessing on my part and we'd have to test the package.

Dirk (at work)


| Thanks again
| 
|  Andreas.
| 
| On Wed, Dec 14, 2016 at 03:37:27PM +0100, Christian Seiler wrote:
| > Hi again,
| > 
| > On 12/14/2016 03:00 PM, Christian Seiler wrote:
| > > If I had to guess what was going on in the backtrace, I'd suspect
| > > an infinite recursion in R code, which translates to infinite
| > > recursion of the underlying C code. But I'm really not sure here.
| > 
| > Interestingly enough, my initial guess was wrong.
| > 
| > It's not an infinite recursion, it's just a very, very deep
| > recursion, using a LOT of stack. If I increase the stack size
| > limit to 200 MB, then the package builds successfully,
| > I tried that in a loop 25 times.
| > 
| > However, with an earlier attempt at 160 MB stack size limit,
| > it worked most of the time, but not always, I did get the
| > same error once, so the amount of stack space required does
| > not appear to be the same when calling the program multiple
| > times. (With 160 MB I tried around 15 times, and once the
| > 160 MB limit was insufficient.)
| > 
| > It might even be in rare cases that the 200 MB limit is not
| > enough and a build could fail spuriously even with that.
| > 
| > > Why that only appears to occur on 32bit LE architectures with
| > > stable kernels (and works fine with unstable kernels on the same
| > > architecture, and even with the stable kernel on 64bit both LE
| > > and BE, as well as on 32bit BE) I also have no clue.
| > 
| > And this is still beyond me, because the default stack size
| > limit of 8 MB is more than sufficient on e.g. amd64, where
| > pointers are twice as large, so the amount of stack frames
| > that fit in that limit there is actually smaller.
| > 
| > So it appears you can work around this bug by manually
| > setting an artificially high stack size limit during the
| > build, but there is still an underlying problem there that
| > causes the stack usage to be drastically higher on
| > 32bit LE platforms with kernel 3.16, that doesn't appear
| > on the same platforms with a newer kernel.
| > 
| > Anyway, to work around this for now, you can replace your
| > dh_auto_install line (that is passed to the xvfb call)
| > with the following command:
| > 
| >   /bin/sh -c "ulimit -S -s 200000 ; exec dh_auto_install"
| > 
| > Just tried it, sbuild built the package successfully on
| > i386. I haven't tried armhf, but I suspect the result will
| > be the same.
| > 
| > But the underlying problem should also be fixed: a stack
| > size that is 25 times higher than usual is worrisome,
| > especially with the standard limit being plenty sufficient
| > on platforms with larger pointer sizes. You might have to
| > ask upstream and/or the R community for advice though. (Maybe
| > see what R function specifically does this deep recursion,
| > and fix that function to be a lot shallower. I don't know
| > how to get that information from a gdb backtrace though, as
| > I don't know the internals of R.)
| > 
| > Hope that helps.
| > 
| > Regards,
| > Christian
| > 
| 
| -- 
| http://fam-tille.de

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org



Re: Possible workaround

2016-12-14 Thread Christian Seiler
Hi Andreas,

On 12/14/2016 03:59 PM, Andreas Tille wrote:
> thanks a lot for your extensive analysis of the stack problem.  I
> admit I have no idea why this large stack is needed on those
> architectures with stable kernel.  I also have no idea why everything
> went fine with treescape version 1.10.17.

For the record: 1.10.17 also failed its build on i386:

https://buildd.debian.org/status/fetch.php?pkg=r-cran-treescape&arch=i386&ver=1.10.17-1&stamp=1480164348

The last (and only) successful build was 1.9.17:

https://buildd.debian.org/status/fetch.php?pkg=r-cran-treescape&arch=i386&ver=1.9.17-1&stamp=1468346976

I just tried to rebuild 1.9.17 on i386 and that still works (in the
sense that it builds, don't know if the package actually works) - so
the problem appeared somewhere between 1.9.17 and 1.10.17.

Regards,
Christian



Re: Possible workaround (was: Re: Help: r-cran-treescape does not build on i386, armel and armhf any more)

2016-12-14 Thread Andreas Tille
Hi Christian,

thanks a lot for your extensive analysis of the stack problem.  I
admit I have no idea why this large stack is needed on those
architectures with stable kernel.  I also have no idea why everything
went fine with treescape version 1.10.17.  Since I personally feel
totally clueless I'm forwarding this upstream and also CC Dirk
Eddelbuettel who is known for his insight and good contact to the R
community.  Maybe somebody has a better clue rather than drastically
increasing the stack size on those failed architectures.

Thanks again

 Andreas.

On Wed, Dec 14, 2016 at 03:37:27PM +0100, Christian Seiler wrote:
> Hi again,
> 
> On 12/14/2016 03:00 PM, Christian Seiler wrote:
> > If I had to guess what was going on in the backtrace, I'd suspect
> > an infinite recursion in R code, which translates to infinite
> > recursion of the underlying C code. But I'm really not sure here.
> 
> Interestingly enough, my initial guess was wrong.
> 
> It's not an infinite recursion, it's just a very, very deep
> recursion, using a LOT of stack. If I increase the stack size
> limit to 200 MB, then the package builds successfully,
> I tried that in a loop 25 times.
> 
> However, with an earlier attempt at 160 MB stack size limit,
> it worked most of the time, but not always, I did get the
> same error once, so the amount of stack space required does
> not appear to be the same when calling the program multiple
> times. (With 160 MB I tried around 15 times, and once the
> 160 MB limit was insufficient.)
> 
> It might even be in rare cases that the 200 MB limit is not
> enough and a build could fail spuriously even with that.
> 
> > Why that only appears to occur on 32bit LE architectures with
> > stable kernels (and works fine with unstable kernels on the same
> > architecture, and even with the stable kernel on 64bit both LE
> > and BE, as well as on 32bit BE) I also have no clue.
> 
> And this is still beyond me, because the default stack size
> limit of 8 MB is more than sufficient on e.g. amd64, where
> pointers are twice as large, so the amount of stack frames
> that fit in that limit there is actually smaller.
> 
> So it appears you can work around this bug by manually
> setting an artificially high stack size limit during the
> build, but there is still an underlying problem there that
> causes the stack usage to be drastically higher on
> 32bit LE platforms with kernel 3.16, that doesn't appear
> on the same platforms with a newer kernel.
> 
> Anyway, to work around this for now, you can replace your
> dh_auto_install line (that is passed to the xvfb call)
> with the following command:
> 
>   /bin/sh -c "ulimit -S -s 200000 ; exec dh_auto_install"
> 
> Just tried it, sbuild built the package successfully on
> i386. I haven't tried armhf, but I suspect the result will
> be the same.
> 
> But the underlying problem should also be fixed: a stack
> size that is 25 times higher than usual is worrisome,
> especially with the standard limit being plenty sufficient
> on platforms with larger pointer sizes. You might have to
> ask upstream and/or the R community for advice though. (Maybe
> see what R function specifically does this deep recursion,
> and fix that function to be a lot shallower. I don't know
> how to get that information from a gdb backtrace though, as
> I don't know the internals of R.)
> 
> Hope that helps.
> 
> Regards,
> Christian
> 

-- 
http://fam-tille.de



Possible workaround (was: Re: Help: r-cran-treescape does not build on i386, armel and armhf any more)

2016-12-14 Thread Christian Seiler
Hi again,

On 12/14/2016 03:00 PM, Christian Seiler wrote:
> If I had to guess what was going on in the backtrace, I'd suspect
> an infinite recursion in R code, which translates to infinite
> recursion of the underlying C code. But I'm really not sure here.

Interestingly enough, my initial guess was wrong.

It's not an infinite recursion, it's just a very, very deep
recursion, using a LOT of stack. If I increase the stack size
limit to 200 MB, then the package builds successfully,
I tried that in a loop 25 times.

However, with an earlier attempt at 160 MB stack size limit,
it worked most of the time, but not always, I did get the
same error once, so the amount of stack space required does
not appear to be the same when calling the program multiple
times. (With 160 MB I tried around 15 times, and once the
160 MB limit was insufficient.)

It might even be in rare cases that the 200 MB limit is not
enough and a build could fail spuriously even with that.
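
Since the failure is intermittent, repeating the build and counting the
failing runs is the simplest way to get a feel for how often a given limit
is too small. A rough sketch of such a loop follows; count_failures is a
hypothetical helper name, and the command to repeat is whatever reproduces
the build locally.

```shell
#!/bin/sh
# count_failures is a hypothetical helper: run a command N times and
# print how many of those runs exited with a non-zero status.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Output is discarded; only the exit status is counted.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Possible use (assumption: dh_auto_install reproduces the failure when
# run from the unpacked source tree; the limit is given in KiB):
#   count_failures 25 /bin/sh -c 'ulimit -S -s 160000 && exec dh_auto_install'
```

A count of zero over 25 runs does not prove the limit is safe, it only
makes a spurious failure less likely.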

> Why that only appears to occur on 32bit LE architectures with
> stable kernels (and works fine with unstable kernels on the same
> architecture, and even with the stable kernel on 64bit both LE
> and BE, as well as on 32bit BE) I also have no clue.

And this is still beyond me, because the default stack size
limit of 8 MB is more than sufficient on e.g. amd64, where
pointers are twice as large, so the amount of stack frames
that fit in that limit there is actually smaller.

So it appears you can work around this bug by manually
setting an artificially high stack size limit during the
build, but there is still an underlying problem there that
causes the stack usage to be drastically higher on
32bit LE platforms with kernel 3.16, that doesn't appear
on the same platforms with a newer kernel.

Anyway, to work around this for now, you can replace your
dh_auto_install line (that is passed to the xvfb call)
with the following command:

  /bin/sh -c "ulimit -S -s 200000 ; exec dh_auto_install"

Just tried it, sbuild built the package successfully on
i386. I haven't tried armhf, but I suspect the result will
be the same.
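
In case it helps to see the mechanism in isolation, here is a minimal
sketch of the same idea as a reusable shell function. The name
with_stack_limit is made up for this example. The limit is given in KiB
(200000 KiB is about 195.3 MiB), and the demo uses a small value so that
it runs anywhere; raising the soft limit only works up to the hard limit.

```shell
#!/bin/sh
# with_stack_limit is a hypothetical helper name, not part of debhelper.
# Usage: with_stack_limit LIMIT_KIB command [args...]
with_stack_limit() {
    limit_kib=$1
    shift
    # The subshell keeps the changed limit away from the caller;
    # exec then replaces the subshell with the actual command.
    ( ulimit -S -s "$limit_kib" && exec "$@" )
}

# Demo: the child sees the new soft limit, the caller keeps its own.
with_stack_limit 4096 sh -c 'echo "child soft stack limit: $(ulimit -S -s)"'
echo "caller soft stack limit: $(ulimit -S -s)"
```

Only the soft limit is touched, so the command can still be given a
different value later without extra privileges.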

But the underlying problem should also be fixed: a stack
size that is 25 times higher than usual is worrisome,
especially with the standard limit being plenty sufficient
on platforms with larger pointer sizes. You might have to
ask upstream and/or the R community for advice though. (Maybe
see what R function specifically does this deep recursion,
and fix that function to be a lot shallower. I don't know
how to get that information from a gdb backtrace though, as
I don't know the internals of R.)

Hope that helps.

Regards,
Christian