Re: IMPALA-5702 - disable shared linking on jenkins?
I meant the from-scratch on Ubuntu 14.04 job. I've started an ASAN build:
https://jenkins.impala.io/view/Utility/job/ubuntu-14.04-from-scratch/1764/

On Mon, Jul 24, 2017 at 5:52 PM, Henry Robinson wrote:

> Could you point me to the failing job? I couldn't see it obviously on
> https://jenkins.impala.io/.
>
> On 24 July 2017 at 17:42, Jim Apple wrote:
>
> > Yes, ASAN in the current 1404 job fails with something about linking. I
> > haven't got around to investigating in detail.
> >
> > On Mon, Jul 24, 2017 at 1:39 PM, Todd Lipcon wrote:
> >
> > > Is it possible that the issue here is due to a "one definition rule"
> > > violation? eg something like
> > > https://github.com/google/sanitizers/wiki/AddressSanitizerOneDefinitionRuleViolation
> > > Another similar thing is described here:
> > > https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco
> > >
> > > ASAN with the appropriate flags might help expose if one of the above is
> > > related.
> > >
> > > I wonder whether it is a kind of coincidence that it is fine in a static
> > > build but causes problems in dynamic, and at some point the static link
> > > order may slightly shift, causing another new subtle bug.
> > >
> > > On Mon, Jul 24, 2017 at 1:22 PM, Henry Robinson wrote:
> > >
> > > > We've started seeing isolated incidences of IMPALA-5702 during GVOs, where
> > > > a custom cluster test fails by throwing an exception during locale
> > > > handling.
> > > >
> > > > I've been able to reproduce this locally, but only with shared linking
> > > > enabled (which makes sense since the issue is symptomatic of a global c'tor
> > > > not getting called the right number of times).
> > > >
> > > > It's probable that my patch for IMPALA-5659 exposed this (since it forced a
> > > > more correct linking strategy for thirdparty libraries when dynamic linking
> > > > was enabled), but it looks to me at first glance like there were latent
> > > > dynamic linking bugs that we weren't getting hit by. Fixing IMPALA-5702
> > > > will probably take a while, and I don't think we should hold up GVOs or put
> > > > them at risk.
> > > >
> > > > So there are two options:
> > > >
> > > > 1. Revert IMPALA-5659
> > > >
> > > > 2. Switch GVO to static linking
> > > >
> > > > IMPALA-5659 is important to commit the kudu util library, which is needed
> > > > for the KRPC work. Without it, shared linking doesn't work *at all* when
> > > > the kudu util library is committed.
> > > >
> > > > Static linking doesn't take much longer in my unscientific measurements,
> > > > and is closer to how Impala is actually used. In the interest of forward
> > > > progress I'd like to try switching ubuntu-14.04-from-scratch to use static
> > > > linking while I work on IMPALA-5702.
> > > >
> > > > What does everyone else think?
> > > >
> > > > Henry
> > >
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
Re: IMPALA-5702 - disable shared linking on jenkins?
Got it - thanks for the clarification!

Also, I think I was unclear in my stated concern for new contributors. It
seems to me that new contributors could choose to use the -so flag, even if
the official pre-merge job doesn't, but that there is a cost to diverging
from the pre-merge job in that it is hard to know what is to blame if your
pre-merge job fails.

On Mon, Jul 24, 2017 at 5:46 PM, Henry Robinson wrote:

> On 24 July 2017 at 17:43, Jim Apple wrote:
>
> > On Mon, Jul 24, 2017 at 5:08 PM, Henry Robinson wrote:
> >
> > > On 24 July 2017 at 17:04, Jim Apple wrote:
> > >
> > > > I had anticipated that shared linking would save time and disk space, but
> > > > it sounds like, from your testing, it doesn't save much time. Does it save
> > > > disk space?
> > >
> > > I haven't measured but I would expect not. Do we need to be very careful
> > > about disk space in the current configuration?
> >
> > I don't think so, but since we are trying to entice new community members
> > to commit patches, I am concerned about the cost on developer machines.
> >
> > > > Does static linking save time when compiling incremental changes?
> > >
> > > Again, I haven't measured.
> >
> > I'm confused. You said, "Static linking doesn't take much longer in my
> > unscientific measurements".
>
> I am also confused. I spoke about end-to-end builds on
> ubuntu-14.04-from-scratch. I haven't measured incremental changes, unless
> they're covered by that build.
Re: IMPALA-5702 - disable shared linking on jenkins?
Could you point me to the failing job? I couldn't see it obviously on
https://jenkins.impala.io/.

On 24 July 2017 at 17:42, Jim Apple wrote:

> Yes, ASAN in the current 1404 job fails with something about linking. I
> haven't got around to investigating in detail.
>
> On Mon, Jul 24, 2017 at 1:39 PM, Todd Lipcon wrote:
>
> > Is it possible that the issue here is due to a "one definition rule"
> > violation? eg something like
> > https://github.com/google/sanitizers/wiki/AddressSanitizerOneDefinitionRuleViolation
> > Another similar thing is described here:
> > https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco
> >
> > ASAN with the appropriate flags might help expose if one of the above is
> > related.
> >
> > I wonder whether it is a kind of coincidence that it is fine in a static
> > build but causes problems in dynamic, and at some point the static link
> > order may slightly shift, causing another new subtle bug.
> >
> > On Mon, Jul 24, 2017 at 1:22 PM, Henry Robinson wrote:
> >
> > > We've started seeing isolated incidences of IMPALA-5702 during GVOs, where
> > > a custom cluster test fails by throwing an exception during locale
> > > handling.
> > >
> > > I've been able to reproduce this locally, but only with shared linking
> > > enabled (which makes sense since the issue is symptomatic of a global c'tor
> > > not getting called the right number of times).
> > >
> > > It's probable that my patch for IMPALA-5659 exposed this (since it forced a
> > > more correct linking strategy for thirdparty libraries when dynamic linking
> > > was enabled), but it looks to me at first glance like there were latent
> > > dynamic linking bugs that we weren't getting hit by. Fixing IMPALA-5702
> > > will probably take a while, and I don't think we should hold up GVOs or put
> > > them at risk.
> > >
> > > So there are two options:
> > >
> > > 1. Revert IMPALA-5659
> > >
> > > 2. Switch GVO to static linking
> > >
> > > IMPALA-5659 is important to commit the kudu util library, which is needed
> > > for the KRPC work. Without it, shared linking doesn't work *at all* when
> > > the kudu util library is committed.
> > >
> > > Static linking doesn't take much longer in my unscientific measurements,
> > > and is closer to how Impala is actually used. In the interest of forward
> > > progress I'd like to try switching ubuntu-14.04-from-scratch to use static
> > > linking while I work on IMPALA-5702.
> > >
> > > What does everyone else think?
> > >
> > > Henry
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
Re: IMPALA-5702 - disable shared linking on jenkins?
On 24 July 2017 at 17:43, Jim Apple wrote:

> On Mon, Jul 24, 2017 at 5:08 PM, Henry Robinson wrote:
>
> > On 24 July 2017 at 17:04, Jim Apple wrote:
> >
> > > I had anticipated that shared linking would save time and disk space, but
> > > it sounds like, from your testing, it doesn't save much time. Does it save
> > > disk space?
> >
> > I haven't measured but I would expect not. Do we need to be very careful
> > about disk space in the current configuration?
>
> I don't think so, but since we are trying to entice new community members
> to commit patches, I am concerned about the cost on developer machines.
>
> > > Does static linking save time when compiling incremental changes?
> >
> > Again, I haven't measured.
>
> I'm confused. You said, "Static linking doesn't take much longer in my
> unscientific measurements".

I am also confused. I spoke about end-to-end builds on
ubuntu-14.04-from-scratch. I haven't measured incremental changes, unless
they're covered by that build.
Re: IMPALA-5702 - disable shared linking on jenkins?
Yes, ASAN in the current 1404 job fails with something about linking. I
haven't got around to investigating in detail.

On Mon, Jul 24, 2017 at 1:39 PM, Todd Lipcon wrote:

> Is it possible that the issue here is due to a "one definition rule"
> violation? eg something like
> https://github.com/google/sanitizers/wiki/AddressSanitizerOneDefinitionRuleViolation
> Another similar thing is described here:
> https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco
>
> ASAN with the appropriate flags might help expose if one of the above is
> related.
>
> I wonder whether it is a kind of coincidence that it is fine in a static
> build but causes problems in dynamic, and at some point the static link
> order may slightly shift, causing another new subtle bug.
>
> On Mon, Jul 24, 2017 at 1:22 PM, Henry Robinson wrote:
>
> > We've started seeing isolated incidences of IMPALA-5702 during GVOs, where
> > a custom cluster test fails by throwing an exception during locale
> > handling.
> >
> > I've been able to reproduce this locally, but only with shared linking
> > enabled (which makes sense since the issue is symptomatic of a global c'tor
> > not getting called the right number of times).
> >
> > It's probable that my patch for IMPALA-5659 exposed this (since it forced a
> > more correct linking strategy for thirdparty libraries when dynamic linking
> > was enabled), but it looks to me at first glance like there were latent
> > dynamic linking bugs that we weren't getting hit by. Fixing IMPALA-5702
> > will probably take a while, and I don't think we should hold up GVOs or put
> > them at risk.
> >
> > So there are two options:
> >
> > 1. Revert IMPALA-5659
> >
> > 2. Switch GVO to static linking
> >
> > IMPALA-5659 is important to commit the kudu util library, which is needed
> > for the KRPC work. Without it, shared linking doesn't work *at all* when
> > the kudu util library is committed.
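For anyone wanting to try the suggestion above about running ASAN "with the appropriate flags", here is a minimal sketch of building an ASAN_OPTIONS value. The flag names come from the two AddressSanitizer wiki pages linked in the thread; the values chosen and the helper name asan_env are assumptions for illustration, not what the Impala Jenkins jobs actually use.

```python
import os

# Flag names are from the AddressSanitizer wiki pages linked in the thread.
# The values are an assumption, not the project's real configuration.
ASAN_FLAGS = {
    "detect_odr_violation": "2",          # one-definition-rule checks
    "check_initialization_order": "1",    # initialization-order checks
    "strict_init_order": "1",             # stricter variant of the above
}


def asan_env(base_env=None):
    """Return a copy of the environment with ASAN_OPTIONS extended.

    asan_env is a hypothetical helper name used only in this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    extra = ":".join(f"{key}={value}" for key, value in ASAN_FLAGS.items())
    existing = env.get("ASAN_OPTIONS", "")
    # Keep anything the caller already set and append our flags after it.
    env["ASAN_OPTIONS"] = f"{existing}:{extra}" if existing else extra
    return env


if __name__ == "__main__":
    print(asan_env({})["ASAN_OPTIONS"])
```

The returned dictionary can be passed as the env argument of subprocess.run when launching an ASAN-instrumented binary; which binary to launch is left out here because the thread does not name one.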
Re: IMPALA-5702 - disable shared linking on jenkins?
On Mon, Jul 24, 2017 at 5:08 PM, Henry Robinson wrote:

> On 24 July 2017 at 17:04, Jim Apple wrote:
>
> > I had anticipated that shared linking would save time and disk space, but
> > it sounds like, from your testing, it doesn't save much time. Does it save
> > disk space?
>
> I haven't measured but I would expect not. Do we need to be very careful
> about disk space in the current configuration?

I don't think so, but since we are trying to entice new community members
to commit patches, I am concerned about the cost on developer machines.

> > Does static linking save time when compiling incremental changes?
>
> Again, I haven't measured.

I'm confused. You said, "Static linking doesn't take much longer in my
unscientific measurements".
Re: IMPALA-5702 - disable shared linking on jenkins?
On 24 July 2017 at 17:08, Henry Robinson wrote:

> On 24 July 2017 at 17:04, Jim Apple wrote:
>
>> I had anticipated that shared linking would save time and disk space, but
>> it sounds like, from your testing, it doesn't save much time. Does it save
>> disk space?
>
> I haven't measured but I would expect not. Do we need to be very careful
> about disk space in the current configuration?

I just saw the disk space report at the end of ubuntu-14.04-from-scratch.
It costs about 50% more disk space (about 30Gb) which is a large amount,
but the executors have plenty of room left.

dynamic:
*00:08:20* /dev/xvda1 161129 61335 93144 40% /

static:
*09:20:31* /dev/xvda1 161129 92138 62341 60% /

>> Does static linking save time when compiling incremental changes?
>
> Again, I haven't measured.
>
>> On Mon, Jul 24, 2017 at 4:51 PM, Henry Robinson wrote:
>>
>> > :) I agree - we should also track any known breaks to shared linking in a
>> > best effort fashion because it's so useful to some dev workflows.
>> >
>> > On 24 July 2017 at 16:49, Tim Armstrong wrote:
>> >
>> > > I vote for changing Jenkins' linking strategy now and not changing it back
>> > > :). Static linking is the blessed configuration so I think we should be
>> > > running tests with that primarily.
>> > >
>> > > On Mon, Jul 24, 2017 at 4:34 PM, Henry Robinson wrote:
>> > >
>> > > > On 24 July 2017 at 13:58, Todd Lipcon wrote:
>> > > >
>> > > > > On Mon, Jul 24, 2017 at 1:47 PM, Henry Robinson wrote:
>> > > > >
>> > > > > > Thanks for the asan pointer - I'll give it a go.
>> > > > > >
>> > > > > > My understanding of linking isn't deep, but my working theory has been that
>> > > > > > the complications have been caused by glog getting linked twice - once
>> > > > > > statically (possibly into libkudu.so), and once dynamically (via everyone
>> > > > > > else).
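The "about 50% more" and "about 30Gb" figures in the disk space report can be checked with a little arithmetic. The sketch below assumes the two rows are df output whose numeric columns are total, used and available in 1 MiB blocks; the thread does not state the unit. Note that the used column covers the whole disk, not just the build output.

```python
def parse_df_row(row):
    """Split a df-style row into (device, total, used, available)."""
    device, total, used, avail = row.split()[:4]
    return device, int(total), int(used), int(avail)


# Rows as reported in the thread, with the fused first column separated.
DYNAMIC = "/dev/xvda1 161129 61335 93144 40% /"
STATIC = "/dev/xvda1 161129 92138 62341 60% /"

_, _, dynamic_used, _ = parse_df_row(DYNAMIC)
_, _, static_used, _ = parse_df_row(STATIC)

extra_blocks = static_used - dynamic_used   # 92138 - 61335 = 30803
ratio = static_used / dynamic_used          # roughly 1.50

# Assumption: one block is 1 MiB, so dividing by 1024 gives GiB.
print(f"extra used: {extra_blocks} blocks ({extra_blocks / 1024:.1f} GiB)")
print(f"static / dynamic usage: {ratio:.2f}x")
```

Under that assumption the static build uses 30803 more blocks, about 30.1 GiB, and 1.50 times the disk of the dynamic build, which agrees with the figures quoted in the message.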
Re: IMPALA-5702 - disable shared linking on jenkins?
On 24 July 2017 at 17:04, Jim Apple wrote:

> I had anticipated that shared linking would save time and disk space, but
> it sounds like, from your testing, it doesn't save much time. Does it save
> disk space?

I haven't measured but I would expect not. Do we need to be very careful
about disk space in the current configuration?

> Does static linking save time when compiling incremental changes?

Again, I haven't measured.

> On Mon, Jul 24, 2017 at 4:51 PM, Henry Robinson wrote:
>
> > :) I agree - we should also track any known breaks to shared linking in a
> > best effort fashion because it's so useful to some dev workflows.
> >
> > On 24 July 2017 at 16:49, Tim Armstrong wrote:
> >
> > > I vote for changing Jenkins' linking strategy now and not changing it back
> > > :). Static linking is the blessed configuration so I think we should be
> > > running tests with that primarily.
> > >
> > > On Mon, Jul 24, 2017 at 4:34 PM, Henry Robinson wrote:
> > >
> > > > On 24 July 2017 at 13:58, Todd Lipcon wrote:
> > > >
> > > > > On Mon, Jul 24, 2017 at 1:47 PM, Henry Robinson wrote:
> > > > >
> > > > > > Thanks for the asan pointer - I'll give it a go.
> > > > > >
> > > > > > My understanding of linking isn't deep, but my working theory has been that
> > > > > > the complications have been caused by glog getting linked twice - once
> > > > > > statically (possibly into libkudu.so), and once dynamically (via everyone
> > > > > > else).
> > > > >
> > > > > In libkudu_client.so, we use a linker script to ensure that we don't leak
> > > > > glog/gflags/etc symbols. Those are all listed as 'local' in
> > > > > src/kudu/client/symbols.map. We also have a unit test
> > > > > 'client_symbol-test.sh' which uses nm to dump the list of symbols and make
> > > > > sure that they all non-local non-weak symbols are under the 'kudu::'
> > > > > namespace.
Re: IMPALA-5702 - disable shared linking on jenkins?
I had anticipated that shared linking would save time and disk space, but
it sounds like, from your testing, it doesn't save much time. Does it save
disk space?

Does static linking save time when compiling incremental changes?

On Mon, Jul 24, 2017 at 4:51 PM, Henry Robinson wrote:

> :) I agree - we should also track any known breaks to shared linking in a
> best effort fashion because it's so useful to some dev workflows.
>
> On 24 July 2017 at 16:49, Tim Armstrong wrote:
>
> > I vote for changing Jenkins' linking strategy now and not changing it back
> > :). Static linking is the blessed configuration so I think we should be
> > running tests with that primarily.
> >
> > On Mon, Jul 24, 2017 at 4:34 PM, Henry Robinson wrote:
> >
> > > On 24 July 2017 at 13:58, Todd Lipcon wrote:
> > >
> > > > On Mon, Jul 24, 2017 at 1:47 PM, Henry Robinson wrote:
> > > >
> > > > > Thanks for the asan pointer - I'll give it a go.
> > > > >
> > > > > My understanding of linking isn't deep, but my working theory has been that
> > > > > the complications have been caused by glog getting linked twice - once
> > > > > statically (possibly into libkudu.so), and once dynamically (via everyone
> > > > > else).
> > > >
> > > > In libkudu_client.so, we use a linker script to ensure that we don't leak
> > > > glog/gflags/etc symbols. Those are all listed as 'local' in
> > > > src/kudu/client/symbols.map. We also have a unit test
> > > > 'client_symbol-test.sh' which uses nm to dump the list of symbols and make
> > > > sure that they all non-local non-weak symbols are under the 'kudu::'
> > > > namespace.
> > > >
> > > > So it's possible that something's getting linked twice but I'd be somewhat
> > > > surprised if it's from the Kudu client.
> > >
> > > Good to know, thanks.
> > >
> > > ASAN hasn't turned up anything yet - so does anyone have an opinion about
> > > changing Jenkins' linking strategy for now?
Re: Thrift version used by Impala
Just to follow up on this - I spent some time looking at what would be
required to do a Thrift 0.9.3 upgrade (since some relevant changes have
affected this since the last time I looked).

The short answer is that it's not a small change. See
https://issues.apache.org/jira/browse/IMPALA-5690 for my notes so far.

Henry

On 20 June 2017 at 15:10, Henry Robinson wrote:

> The main reason I haven't done it yet is because Thrift 0.9.3 introduces a
> Bison dependency for compilation, and I hadn't got round to getting that
> working on all the platforms I care about. No particular technical reason.
>
> On 20 June 2017 at 15:06, Alexander Kolbasov wrote:
>
>> As part of our investigation of Impala/Sentry integration issues it turned
>> out that Impala uses a version of Thrift that is older than 0.9.3 that's
>> used by Sentry (and many other components). Is there a fundamental reason
>> Impala can't move to Thrift 0.9.3? There were some security vulnerabilities
>> in earlier versions.
>>
>> - Alex
Re: IMPALA-5702 - disable shared linking on jenkins?
:) I agree - we should also track any known breaks to shared linking in a
best effort fashion because it's so useful to some dev workflows.

On 24 July 2017 at 16:49, Tim Armstrong wrote:

> I vote for changing Jenkins' linking strategy now and not changing it back
> :). Static linking is the blessed configuration so I think we should be
> running tests with that primarily.
>
> On Mon, Jul 24, 2017 at 4:34 PM, Henry Robinson wrote:
>
> > On 24 July 2017 at 13:58, Todd Lipcon wrote:
> >
> > > On Mon, Jul 24, 2017 at 1:47 PM, Henry Robinson wrote:
> > >
> > > > Thanks for the asan pointer - I'll give it a go.
> > > >
> > > > My understanding of linking isn't deep, but my working theory has been that
> > > > the complications have been caused by glog getting linked twice - once
> > > > statically (possibly into libkudu.so), and once dynamically (via everyone
> > > > else).
> > >
> > > In libkudu_client.so, we use a linker script to ensure that we don't leak
> > > glog/gflags/etc symbols. Those are all listed as 'local' in
> > > src/kudu/client/symbols.map. We also have a unit test
> > > 'client_symbol-test.sh' which uses nm to dump the list of symbols and make
> > > sure that they all non-local non-weak symbols are under the 'kudu::'
> > > namespace.
> > >
> > > So it's possible that something's getting linked twice but I'd be somewhat
> > > surprised if it's from the Kudu client.
> >
> > Good to know, thanks.
> >
> > ASAN hasn't turned up anything yet - so does anyone have an opinion about
> > changing Jenkins' linking strategy for now?
> >
> > > -Todd
> > >
> > > > I would think that could lead to one or both of the issues you linked to.
> > > >
> > > > On 24 July 2017 at 13:39, Todd Lipcon wrote:
> > > >
> > > > > Is it possible that the issue here is due to a "one definition rule"
> > > > > violation?
Re: IMPALA-5702 - disable shared linking on jenkins?
I vote for changing Jenkins' linking strategy now and not changing it back :). Static linking is the blessed configuration so I think we should be running tests with that primarily.
Re: IMPALA-5702 - disable shared linking on jenkins?
Good to know, thanks. ASAN hasn't turned up anything yet - so does anyone have an opinion about changing Jenkins' linking strategy for now?
Re: IMPALA-5702 - disable shared linking on jenkins?
In libkudu_client.so, we use a linker script to ensure that we don't leak glog/gflags/etc. symbols. Those are all listed as 'local' in src/kudu/client/symbols.map. We also have a unit test, 'client_symbol-test.sh', which uses nm to dump the list of symbols and make sure that all non-local, non-weak symbols are under the 'kudu::' namespace.

So it's possible that something's getting linked twice, but I'd be somewhat surprised if it's from the Kudu client.

-Todd
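The symbol check Todd describes can be sketched roughly as follows. This is not the actual 'client_symbol-test.sh'; it is a minimal Python illustration that assumes the input is the text printed by `nm -C` (demangled names), and the function name `leaked_symbols` and the list of decoration prefixes are made up for the example.

```python
import re

# nm type letters: lowercase generally means a local symbol, "U" is undefined
# (imported from elsewhere), and "W"/"V" are weak. "u" (GNU unique global) is
# the one lowercase letter that is still global, so it is not skipped.
_IGNORED_TYPES = {"U", "W", "V"}

# Prefixes that `nm -C` prints in front of compiler-generated symbols.
# (Illustrative list, an assumption of this sketch, not exhaustive.)
_DECORATIONS = (
    "vtable for ",
    "typeinfo name for ",
    "typeinfo for ",
    "VTT for ",
    "guard variable for ",
)


def leaked_symbols(nm_output, namespace="kudu::"):
    """Return global, non-weak, defined symbols that fall outside `namespace`."""
    leaked = []
    for line in nm_output.splitlines():
        # Defined symbols look like "<hex address> <type> <name>";
        # undefined ones have blanks where the address would be.
        match = re.match(r"^(?:[0-9a-fA-F]+)?\s+(\S)\s+(.+)$", line)
        if not match:
            continue
        sym_type, name = match.groups()
        if sym_type in _IGNORED_TYPES:
            continue
        if sym_type.islower() and sym_type != "u":
            continue  # local symbol, not visible to other libraries
        for prefix in _DECORATIONS:
            if name.startswith(prefix):
                name = name[len(prefix):]
                break
        if not name.startswith(namespace):
            leaked.append(name)
    return leaked
```

Feeding it the output of `nm -C` on a shared library and asserting that the result is empty would approximate the test described above; an exported glog or gflags symbol would show up in the returned list.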
Re: IMPALA-5702 - disable shared linking on jenkins?
Thanks for the asan pointer - I'll give it a go.

My understanding of linking isn't deep, but my working theory has been that the complications have been caused by glog getting linked twice - once statically (possibly into libkudu.so), and once dynamically (via everyone else).

I would think that could lead to one or both of the issues you linked to.

On 24 July 2017 at 13:39, Todd Lipcon wrote:
> Is it possible that the issue here is due to a "one definition rule" violation? eg something like
> https://github.com/google/sanitizers/wiki/AddressSanitizerOneDefinitionRuleViolation
> Another similar thing is described here:
> https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco
>
> ASAN with the appropriate flags might help expose if one of the above is related.
>
> I wonder whether it is a kind of coincidence that it is fine in a static build but causes problems in dynamic, and at some point the static link order may slightly shift, causing another new subtle bug.
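For reference, the "appropriate flags" for the two linked wiki pages are runtime options passed through the `ASAN_OPTIONS` environment variable: `detect_odr_violation` for one-definition-rule violations, and `check_initialization_order` plus `strict_init_order` for the initialization-order fiasco. Below is a minimal Python sketch of launching a test with them set. It assumes the binary was already built with AddressSanitizer; the helper names `asan_env` and `run_under_asan` are hypothetical, and nothing here reflects how the Impala Jenkins jobs are actually configured.

```python
import os
import subprocess


def asan_env(base_env=None):
    """Return a copy of the environment with ODR and init-order checks enabled."""
    env = dict(os.environ if base_env is None else base_env)
    options = {
        "detect_odr_violation": "2",           # stricter of the two non-zero ODR levels
        "check_initialization_order": "true",  # init-order fiasco detection
        "strict_init_order": "true",           # also flag accesses that happen to work
    }
    extra = ":".join(f"{key}={value}" for key, value in options.items())
    existing = env.get("ASAN_OPTIONS")
    # Keep whatever options the caller already had and append ours.
    env["ASAN_OPTIONS"] = f"{existing}:{extra}" if existing else extra
    return env


def run_under_asan(cmd):
    """Run `cmd` (a list of arguments) with the options above; return its exit code."""
    return subprocess.run(cmd, env=asan_env(), check=False).returncode
```

A call such as `run_under_asan(["./some-asan-built-test"])` (the path is a placeholder) would then surface any ODR or initialization-order reports on stderr.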
IMPALA-5702 - disable shared linking on jenkins?
We've started seeing isolated incidences of IMPALA-5702 during GVOs, where a custom cluster test fails by throwing an exception during locale handling.

I've been able to reproduce this locally, but only with shared linking enabled (which makes sense since the issue is symptomatic of a global c'tor not getting called the right number of times).

It's probable that my patch for IMPALA-5659 exposed this (since it forced a more correct linking strategy for thirdparty libraries when dynamic linking was enabled), but it looks to me at first glance like there were latent dynamic linking bugs that we weren't getting hit by. Fixing IMPALA-5702 will probably take a while, and I don't think we should hold up GVOs or put them at risk.

So there are two options:

1. Revert IMPALA-5659

2. Switch GVO to static linking

IMPALA-5659 is important to commit the kudu util library, which is needed for the KRPC work. Without it, shared linking doesn't work *at all* when the kudu util library is committed.

Static linking doesn't take much longer in my unscientific measurements, and is closer to how Impala is actually used. In the interest of forward progress I'd like to try switching ubuntu-14.04-from-scratch to use static linking while I work on IMPALA-5702.

What does everyone else think?

Henry
Re: Slow/unusable apache JIRA?
Great, thanks
Re: Slow/unusable apache JIRA?
There's a link here: https://www.apache.org/dev/infra-contact
Re: Slow/unusable apache JIRA?
Thanks. @Henry, where is the infra hipchat channel?
Re: Slow/unusable apache JIRA?
Hey, no problems usually, but now it's down for me as well. According to status.apache.org, the service seems to be struggling.
Re: Slow/unusable apache JIRA?
Yep, it's super slow for me as well. Was just in the infra hipchat channel, and they are now aware of the issue.
Slow/unusable apache JIRA?
Hey,

I've been noticing a lot of slowness/timeouts on the Apache JIRA. Has anyone else noticed this? Sometimes it's just annoying, but today I've found a lot of pages are just timing out.

Just got this error when attempting to load https://issues.apache.org/jira/browse/IMPALA-5275

Communications Breakdown

The call to the JIRA server did not complete within the timeout period. We are unsure of the result of this operation.

Close this dialog and press refresh in your browser