Re: New Committer/PMC Member: Ethan Li

2018-04-16 Thread Erik Weathers
congrats Ethan!

On Mon, Apr 16, 2018 at 10:51 AM, P. Taylor Goetz  wrote:

> Congratulations! Welcome Ethan!
>
> -Taylor
>
> > On Apr 16, 2018, at 12:23 PM, Bobby Evans  wrote:
> >
> > Please Join with me in welcoming Ethan Li as the newest Apache Storm
> > committer and PMC member.
> >
> > Great work!
>
>


Re: New Committer/PMC Member: Roshan Naik

2018-04-05 Thread Erik Weathers
Congrats Roshan!!

On Thu, Apr 5, 2018 at 8:26 PM Ethan Li  wrote:

> Congratulations! Roshan
>
> Ethan Li
>
> > On Apr 5, 2018, at 21:40, Jungtaek Lim  wrote:
> >
> > Congrats Roshan!
> >
> > On Fri, Apr 6, 2018 at 11:39 AM, P. Taylor Goetz wrote:
> >
> >> Please join me in congratulating and welcoming Roshan Naik as the latest
> >> Apache Storm committer and PMC member.
> >>
> >> Welcome Roshan!
> >>
> >> -Taylor
> >>
> >>
>


Re: New Committer/PMC Member: Erik Weathers

2018-02-23 Thread Erik Weathers
Thanks everyone!  And yes Alexandre, the relation of my surname to the
project name was not lost on me. ;-)  (Also, my Grandpa went by "Stormy"
too!)

Also, I must thank my work teammates Srishty Agrawal and Jessica Hartog who
have contributed greatly to any of the work that I've pushed.

- Erik

On Fri, Feb 23, 2018 at 8:28 AM, Ethan Li <ethanopensou...@gmail.com> wrote:

> Congratulations! Erik
>
> - Ethan
>
> > On Feb 22, 2018, at 7:42 PM, Xin Wang <data.xinw...@gmail.com> wrote:
> >
> > Congrats!
> >
> > 2018-02-23 9:41 GMT+08:00 Hugo Da Cruz Louro <hlo...@hortonworks.com>:
> >
> >> Congrats & Welcome!
> >>
> >>> On Feb 22, 2018, at 2:45 PM, Jungtaek Lim <kabh...@gmail.com> wrote:
> >>>
> >>> Welcome Erik! Congrats!
> >>>
> >>> -Jungtaek Lim (HeartSaVioR)
> >>>
> >>> On Fri, Feb 23, 2018 at 7:05 AM, Stig Rohde Døssing <
> >>> stigdoess...@gmail.com> wrote:
> >>>
> >>>> Congratulations Erik. Happy to see you join.
> >>>>
> >>>> 2018-02-22 20:53 GMT+01:00 Alexandre Vermeerbergen <
> >>>> avermeerber...@gmail.com
> >>>>> :
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> Welcome to Erik...
> >>>>> ... a Stormy Weather(s) sounds like a fantastic match indeed!
> >>>>>
> >>>>> Alexandre Vermeerbergen (Storm addict)
> >>>>>
> >>>>> 2018-02-22 20:49 GMT+01:00 P. Taylor Goetz <ptgo...@apache.org>:
> >>>>>> The Apache Storm PMC has voted to add Erik Weathers as a Committer
> and
> >>>>> PMC Member.
> >>>>>>
> >>>>>> Please join me in congratulating Erik on his new role!
> >>>>>>
> >>>>>> -Taylor
> >>>>>
> >>>>
> >>
> >>
> >
> >
> > --
> > Thanks,
> > Xin
>
>


Re: [DISCUSS] consider EOL for version lines

2018-02-13 Thread Erik Weathers
Thanks for keeping storm-mesos in mind Stig. :)  I'd be most worried about
any issues we might see with the backported storm-kafka-client and how we
*might* need to fix bugs in 1.0.x.  At least it should be easy to
cherry-pick fixes back into 1.0.x after the backport-stomping of STORM-2937.

Looking forward to working with Bobby on a long-term plan for Storm to run
on Mesos in 2.x+.

- Erik

On Tue, Feb 13, 2018 at 11:26 AM, Stig Rohde Døssing  wrote:

> +1 to maintain 3 version lines, though we may want to look at what we can
> do for storm-mesos, which I think is currently stuck on 1.0.x.
>
> 2018-02-13 20:17 GMT+01:00 Hugo Da Cruz Louro :
>
> > +1 to maintain 3 version lines. Let’s properly announce that in our
> portal
> > and users list such that users know what’s coming.
> >
> > Agree with focusing on 2.0 which has a lot of improvements, rather than
> > 1.x, x >= 3.
> >
> > > On Feb 13, 2018, at 10:43 AM, Alexandre Vermeerbergen <
> > avermeerber...@gmail.com> wrote:
> > >
> > > +1 (non binding) to maintaining less version lines, provided that
> > > 1.2.x branch is maintained long enough to allow progressive adoption
> > > of 2.x
> > >
> > > Alexandre Vermeerbergen
> > >
> > > 2018-02-13 19:38 GMT+01:00 Priyank Shah :
> > >> +1 to maintaining 3 version lines as suggested by Jungtaek.
> > >>
> > >> On 2/13/18, 9:51 AM, "Arun Iyer on behalf of Arun Mahadevan" <
> > ai...@hortonworks.com on behalf of ar...@apache.org> wrote:
> > >>
> > >>+1 to maintain 3 version lines.
> > >>
> > >>I think the next focus should be 2.0.0 than 1.3.0.
> > >>
> > >>
> > >>
> > >>
> > >>On 2/12/18, 11:40 PM, "Jungtaek Lim"  wrote:
> > >>
> > >>> Hi devs,
> > >>>
> > >>> I've noticed that we are providing 4 different version lines (1.1.x,
> > 1.0.x,
> > >>> 0.10.x, 0.9.x) in download page, and I expect we will add one more
> for
> > >>> 1.2.0. Moreover, we have one more develop version line (2.0.0 -
> master)
> > >>> which most of development happens there.
> > >>>
> > >>> Recently we're releasing 3 version lines (1.0.6 / 1.1.2 / 1.2.0)
> > >>> simultaneously and it took heavy effort to track all the RCs and
> > verify all
> > >>> of them. I guess release manager would take more overhead of
> > releasing, and
> > >>> it doesn't make sense for me if we continue maintaining all of them.
> > >>>
> > >>> Ideally I'd like to propose maintaining three version lines: 2.0.0
> > (next
> > >>> major) / 1.3.0 (next minor - may not happen) / 1.2.1 (next bugfix)
> and
> > >>> making others EOL (that respects semantic versioning and even other
> > >>> projects tend to maintain only two version lines), but if someone
> > feels too
> > >>> aggressive, I propose at least we explicitly announce EOL to 0.x
> > version
> > >>> lines and get rid of any supports (downloads) for them.
> > >>>
> > >>> Would like to hear your opinion.
> > >>>
> > >>> Thanks,
> > >>> Jungtaek Lim (HeartSaVioR)
> > >>
> > >>
> > >>
> > >
> >
> >
>


Re: [VOTE] Release Apache Storm 1.0.6 (rc3)

2018-02-09 Thread Erik Weathers
I'm fine submitting a PR to back that line out (or any of you committer
folks could just rip it out).

But I'd like to understand Storm a bit better as part of making this
decision. :-)  Am I correct in assuming it would only be a problem if the
serialized Fields were stored somewhere (e.g., ZooKeeper, local filesystem)
and then read back in after the Nimbus/Workers are brought back up after
the upgrade?  Seems Fields is used in a *lot* of places, and I don't know
precisely what is serialized for reuse upon Storm Nimbus/Worker daemon
restarts.  I believe there are examples of Fields being used to create
Spout or Bolt objects that are used to create the StormTopology object,
which I believe is serialized into ZooKeeper.  But I'm not clear if it's
directly the Fields object itself or some kind of translation from that
into the thrift objects that make up StormTopology.
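
To make my question concrete, here's a minimal sketch of the kind of check I
have in mind (just an illustration, assuming the storm-core jar under test is
on the classpath; none of this is from the PR itself):

import java.io.ObjectStreamClass;

import org.apache.storm.tuple.Fields;

public class FieldsUidCheck {
    public static void main(String[] args) {
        // getSerialVersionUID() returns the explicit serialVersionUID if the
        // class declares one, otherwise the JVM-computed default.  Running
        // this once against the 1.0.5 storm-core jar and once against the
        // 1.0.6 jar shows whether the newly-declared UID matches the old
        // computed value; a mismatch would make java.io deserialization of
        // previously-stored Fields instances throw InvalidClassException.
        ObjectStreamClass osc = ObjectStreamClass.lookup(Fields.class);
        System.out.println("Fields serialVersionUID = " + osc.getSerialVersionUID());
    }
}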

I also don't know exactly when kryo is applicable in Storm.  I've never
done anything with kryo directly.

- Erik

On Thu, Feb 8, 2018 at 10:00 PM, P. Taylor Goetz <ptgo...@gmail.com> wrote:

> *serialized* ;)
>
> > On Feb 9, 2018, at 12:48 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
> >
> > I’d have to check (can’t right now), but I think that class gets
> sterilized via kryo. If that’s not the case, yes, it could cause problems.
> >
> > I think the safest option would be to remove the serialversionuid.
> >
> > -Taylor
> >
> >> On Feb 8, 2018, at 5:36 PM, Erik Weathers <eweath...@groupon.com.INVALID>
> wrote:
> >>
> >> Something I just realized -- in the storm-kafka-client stomping into
> >> 1.0.x-branch PR, I backported a change to Fields.java which added a
> >> serialVersionUID.
> >> Could that potentially break topologies when you upgrade storm-core on
> the
> >> servers (nimbus, workers) from 1.0.{1..5} to 1.0.6?   I'm not super
> >> familiar with the serialization that occurs in Storm and whether that
> could
> >> break people.
> >>
> >> https://github.com/apache/storm/pull/2550/files#diff-71a428d
> 508c4f5af0bfe3cc186e8edcf
> >>
> >> - Erik
> >>
> >>> On Thu, Feb 8, 2018 at 1:25 PM, Bobby Evans <ev...@oath.com.invalid>
> wrote:
> >>>
> >>> +1 I built the code from the git tag, ran all the unit tests (which
> passed
> >>> the first time), and ran some tests on a single node cluster.
> >>>
> >>> It all looked good.
> >>>
> >>> - Bobby
> >>>
> >>>> On Thu, Feb 8, 2018 at 1:22 PM P. Taylor Goetz <ptgo...@gmail.com>
> wrote:
> >>>>
> >>>> This is a call to vote on releasing Apache Storm 1.0.6 (rc3)
> >>>>
> >>>> Full list of changes in this release:
> >>>>
> >>>>
> >>>> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.
> >>> 0.6-rc3/RELEASE_NOTES.html
> >>>>
> >>>> The tag/commit to be voted upon is v1.0.6:
> >>>>
> >>>>
> >>>> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=tree;h=
> >>> e68365f9f947ddd1794b2edef2149fdfaa1590a2;hb=7993db01580ce62d
> 44866dc00e0a72
> >>> 66984638d0
> >>>>
> >>>> The source archive being voted upon can be found here:
> >>>>
> >>>>
> >>>> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.
> >>> 0.6-rc3/apache-storm-1.0.6-src.tar.gz
> >>>>
> >>>> Other release files, signatures and digests can be found here:
> >>>>
> >>>> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.0.6-rc3/
> >>>>
> >>>> The release artifacts are signed with the following key:
> >>>>
> >>>>
> >>>> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=blob_
> >>> plain;f=KEYS;hb=22b832708295fa2c15c4f3c70ac0d2bc6fded4bd
> >>>>
> >>>> The Nexus staging repository for this release is:
> >>>>
> >>>> https://repository.apache.org/content/repositories/orgapache
> storm-1060
> >>>>
> >>>> Please vote on releasing this package as Apache Storm 1.0.6.
> >>>>
> >>>> When voting, please list the actions taken to verify the release.
> >>>>
> >>>> This vote will be open for at least 72 hours.
> >>>>
> >>>> [ ] +1 Release this package as Apache Storm 1.0.6
> >>>> [ ]  0 No opinion
> >>>> [ ] -1 Do not release this package because...
> >>>>
> >>>> Thanks to everyone who contributed to this release.
> >>>>
> >>>> -Taylor
> >>>
>


Re: [VOTE] Release Apache Storm 1.0.6 (rc3)

2018-02-08 Thread Erik Weathers
Something I just realized -- in the storm-kafka-client stomping into
1.0.x-branch PR, I backported a change to Fields.java which added a
serialVersionUID.
Could that potentially break topologies when you upgrade storm-core on the
servers (nimbus, workers) from 1.0.{1..5} to 1.0.6?   I'm not super
familiar with the serialization that occurs in Storm and whether that could
break people.

https://github.com/apache/storm/pull/2550/files#diff-71a428d508c4f5af0bfe3cc186e8edcf
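
For context, the backported change amounts to adding an explicit UID to a
Serializable class.  A rough sketch of that kind of change (the class body and
UID value below are placeholders, not the actual contents of the PR):

import java.io.Serializable;
import java.util.List;

// Placeholder sketch only: a Serializable class gaining an explicit
// serialVersionUID.  If the declared value differs from the UID the JVM had
// previously been computing implicitly, java.io deserialization of stored
// instances fails with InvalidClassException; Kryo-based serialization does
// not consult this field at all.
public class Fields implements Serializable {
    private static final long serialVersionUID = 1L; // placeholder value

    private final List<String> fields;

    public Fields(List<String> fields) {
        this.fields = fields;
    }

    public List<String> toList() {
        return fields;
    }
}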

- Erik

On Thu, Feb 8, 2018 at 1:25 PM, Bobby Evans  wrote:

> +1 I built the code from the git tag, ran all the unit tests (which passed
> the first time), and ran some tests on a single node cluster.
>
> It all looked good.
>
> - Bobby
>
> On Thu, Feb 8, 2018 at 1:22 PM P. Taylor Goetz  wrote:
>
> > This is a call to vote on releasing Apache Storm 1.0.6 (rc3)
> >
> > Full list of changes in this release:
> >
> >
> > https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.
> 0.6-rc3/RELEASE_NOTES.html
> >
> > The tag/commit to be voted upon is v1.0.6:
> >
> >
> > https://git-wip-us.apache.org/repos/asf?p=storm.git;a=tree;h=
> e68365f9f947ddd1794b2edef2149fdfaa1590a2;hb=7993db01580ce62d44866dc00e0a72
> 66984638d0
> >
> > The source archive being voted upon can be found here:
> >
> >
> > https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.
> 0.6-rc3/apache-storm-1.0.6-src.tar.gz
> >
> > Other release files, signatures and digests can be found here:
> >
> > https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.0.6-rc3/
> >
> > The release artifacts are signed with the following key:
> >
> >
> > https://git-wip-us.apache.org/repos/asf?p=storm.git;a=blob_
> plain;f=KEYS;hb=22b832708295fa2c15c4f3c70ac0d2bc6fded4bd
> >
> > The Nexus staging repository for this release is:
> >
> > https://repository.apache.org/content/repositories/orgapachestorm-1060
> >
> > Please vote on releasing this package as Apache Storm 1.0.6.
> >
> > When voting, please list the actions taken to verify the release.
> >
> > This vote will be open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache Storm 1.0.6
> > [ ]  0 No opinion
> > [ ] -1 Do not release this package because...
> >
> > Thanks to everyone who contributed to this release.
> >
> > -Taylor
> >
>


Re: [DISCUSS] Replace storm-kafka-client on 1.1.x-branch / 1.0.x-branch with 1.x-branch

2018-02-05 Thread Erik Weathers
Thanks for the quick response Jungtaek!

Yes, my teammates and myself would like to help on this.  Is there an
existing JIRA for the work you've been doing on the other branches?

I propose we don't make this block 1.0.6 -- we can just release 1.0.7
quickly when the backport is done, if that is amenable.
That strategy also might be cleaner since it would avoid other changes in
1.0.6 being lumped together with this.

- Erik

On Mon, Feb 5, 2018 at 5:16 PM, Jungtaek Lim <kabh...@gmail.com> wrote:

> UPDATE: I've finished working on overwriting storm-kafka-client 1.x-branch
> to 1.1.x-branch. Not yet pushed to ASF git, but pushed to my fork first to
> trigger Travis CI to see how the build goes well.
>
> https://github.com/HeartSaVioR/storm/commit/76b8a7d3a6f91e66
> 612e87da8589f5723f05218a
> https://travis-ci.org/HeartSaVioR/storm/builds/337819430
>
> Thanks for the input regarding 1.0.x version, Erik. I guess then we have no
> alternative here: someone has to fix storm-kafka-client as well as
> storm-core, since including shaded storm-core doesn't make sense for
> official Storm release.
>
> I guess it doesn't take many hour(s), hence may not worth to sync and talk
> offline. I just wanted to judge whether we are OK to make change of
> storm-core in bugfix version lines, but maybe the judgement itself can be
> possible after finishing the change, so I'll just go ahead making the
> change.
> Since this is blocking release candidate, we should get it ASAP. That's why
> I'm eager to go ahead making the change. If you could spend time now
> helping with making the change ASAP, please leave short notice (maybe with
> JIRA issue?) and go ahead.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Tue, Feb 6, 2018 at 9:41 AM, Erik Weathers <eweath...@groupon.com.invalid>
> wrote:
>
> > hey Jungtaek,
> >
> > Thanks for continuing to pursue this!
> >
> > The issue for Storm not working on Mesos is due to a fundamental change
> to
> > the core scheduling logic in Storm:
> >
> >-
> >
> > https://issues.apache.org/jira/browse/STORM-2126?focusedComm
> entId=16136150=com.atlassian.jira.plugin.system.
> issuetabpanels%3Acomment-tabpanel#comment-16136150
> >
> > The yet-to-be-ironed-out solution that Bobby was brainstorming about
> isn't
> > a short term fix as far as I understand it.  I believe it to be many many
> > months (years?) out for it to actually be workable.  Per my naive
> > understanding of the proposal, we'd probably have to completely rewrite
> the
> > Storm-on-Mesos framework.  So it's probably the right long-term solution,
> > but it isn't anything that should impact this discussion.
> >
> > > The thing is, even users pick storm-kafka-client 1.1.x/1.2.0 and
> include
> > it into their topology jar, it will also not work with Storm 1.0.x. It
> > even can't
> > compile.
> >
> > FWIW, I'm pretty sure that I was able to successfully run
> > storm-kafka-client-1.1.x on a 1.0.5 storm cluster, but only after shading
> > in storm-core-1.1.x to the topology uber jar.   There was *at least* a
> > change to some timer-related class in storm-core in 1.1.x (something
> about
> > milliseconds IIRC -- it's been 1.5 months since I did it, need to revisit
> > the process I followed).
> >
> > I'm happy to help with backporting / stomping storm-kafka-client in
> 1.0.x.
> > Maybe we can talk offline about it?
> >
> > - Erik
> >
> > On Mon, Feb 5, 2018 at 4:20 PM, Jungtaek Lim <kabh...@gmail.com> wrote:
> >
> > > UPDATE: Looks like we changed some parts of storm-core while fixing
> > > storm-kafka-client issues (especially went in 1.1.0), hence overwriting
> > > also incurs changes of storm-core. It doesn't look like a big deal for
> > > 1.1.x-branch, but there looks like needed many changes for
> 1.0.x-branch.
> > >
> > > The thing is, even users pick storm-kafka-client 1.1.x/1.2.0 and
> include
> > it
> > > into their topology jar, it will also not work with Storm 1.0.x. It
> even
> > > can't compile.
> > >
> > > 1.0.x version line was long lived (22 months) even we released Storm
> > 1.1.0
> > > at 11 months ago. Instead of struggling 1.0.x-branch to up to date, I'd
> > > like to suggest that we define 1.0.x-branch as deprecated with guiding
> to
> > > update to latest 1.1.x version or 1.2.0 (after release), and try to
> > resolve
> > > storm-mesos issue with Storm 1.1.0 ASAP to resolve Erik's concern.
> > >
> > > Makes sense? I'll continue working on 1.1.x-branch and update anyway.
> > >
> > >

Re: [DISCUSS] Replace storm-kafka-client on 1.1.x-branch / 1.0.x-branch with 1.x-branch

2018-02-05 Thread Erik Weathers
hey Jungtaek,

Thanks for continuing to pursue this!

The issue for Storm not working on Mesos is due to a fundamental change to
the core scheduling logic in Storm:

   - https://issues.apache.org/jira/browse/STORM-2126?focusedCommentId=16136150&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16136150

The yet-to-be-ironed-out solution that Bobby was brainstorming about isn't
a short term fix as far as I understand it.  I believe it to be many many
months (years?) out for it to actually be workable.  Per my naive
understanding of the proposal, we'd probably have to completely rewrite the
Storm-on-Mesos framework.  So it's probably the right long-term solution,
but it isn't anything that should impact this discussion.

> The thing is, even users pick storm-kafka-client 1.1.x/1.2.0 and include
it into their topology jar, it will also not work with Storm 1.0.x. It
even can't
compile.

FWIW, I'm pretty sure that I was able to successfully run
storm-kafka-client-1.1.x on a 1.0.5 Storm cluster, but only after shading
storm-core-1.1.x into the topology uber jar.  There was *at least* a
change to some timer-related class in storm-core in 1.1.x (something about
milliseconds IIRC -- it's been 1.5 months since I did it, need to revisit
the process I followed).

I'm happy to help with backporting / stomping storm-kafka-client in 1.0.x.
Maybe we can talk offline about it?

- Erik

On Mon, Feb 5, 2018 at 4:20 PM, Jungtaek Lim  wrote:

> UPDATE: Looks like we changed some parts of storm-core while fixing
> storm-kafka-client issues (especially went in 1.1.0), hence overwriting
> also incurs changes of storm-core. It doesn't look like a big deal for
> 1.1.x-branch, but there looks like needed many changes for 1.0.x-branch.
>
> The thing is, even users pick storm-kafka-client 1.1.x/1.2.0 and include it
> into their topology jar, it will also not work with Storm 1.0.x. It even
> can't compile.
>
> 1.0.x version line was long lived (22 months) even we released Storm 1.1.0
> at 11 months ago. Instead of struggling 1.0.x-branch to up to date, I'd
> like to suggest that we define 1.0.x-branch as deprecated with guiding to
> update to latest 1.1.x version or 1.2.0 (after release), and try to resolve
> storm-mesos issue with Storm 1.1.0 ASAP to resolve Erik's concern.
>
> Makes sense? I'll continue working on 1.1.x-branch and update anyway.
>
> -Jungtaek Lim (HeartSaVioR)
>
> On Tue, Feb 6, 2018 at 7:53 AM, Jungtaek Lim wrote:
>
> > OK. No more opinion/vote in 5 days. I'll treat consensus was made, and go
> > ahead making change: overwrite storm-kafka-client 1.2.0 to two branches
> > 1.1.x/1.0.x.
> >
> > -Jungtaek Lim (HeartSaVioR)
> >
> > On Thu, Feb 1, 2018 at 10:48 AM, Jungtaek Lim wrote:
> >
> >> This discussion got 4 +1 (binding) and no -1. Moreover two active
> >> maintainers for storm-kafka-client (Hugo and Stig) voted +1.
> >>
> >> Do we want to hold on for hearing more voices, or treating above
> opinions
> >> as consensus and reflect the change?
> >>
> >> Btw, I think we need to sort out the sequences between two topics:
> >> separating storm-kafka-client as independent release cycle, and this. I
> >> guess some of us agreed former topic doesn't related to current RC, but
> I
> >> think this topic can be (should be) reflected to current RC ongoing.
> >>
> >> -Jungtaek Lim (HeartSaVioR)
> >>
> >> On Thu, Feb 1, 2018 at 4:08 AM, Hugo Da Cruz Louro wrote:
> >>
> >>> +1 to replace storm-kafka-client in 1.0.x branch.
> >>> Hugo
> >>>
> >>> > On Jan 31, 2018, at 11:03 AM, Stig Rohde Døssing <
> >>> stigdoess...@gmail.com> wrote:
> >>> >
> >>> > +1 to replace storm-kafka-client in 1.0.x branch. Breaking semantic
> >>> > versioning is really nasty, but I think it is the lesser evil in this
> >>> case.
> >>> >
> >>> > 2018-01-31 5:14 GMT+01:00 Harsha :
> >>> >
> >>> >> +1 to replace storm-kafka-client in 1.0.x branch
> >>> >> -Harsha
> >>> >> On Tue, Jan 30, 2018, at 7:04 PM, Jungtaek Lim wrote:
> >>> >>> Bump up this thread so that we could reach consensus earlier. Given
> >>> that
> >>> >> we
> >>> >>> got concern related to this, I think it is ideal to release
> >>> 1.1.x/1.0.x
> >>> >>> with making decision and applying the change if we want.
> >>> >>>
> >>> >>> On Tue, Jan 30, 2018 at 9:25 AM, Jungtaek Lim wrote:
> >>> >>>
> >>>  Erik's concern brought from 1.0.6 RC1, because they can't use
> Storm
> >>> >> 1.1.0
> >>>  or higher (Storm 1.1.0 broke storm-mesos.). While he could take an
> >>>  workaround to use storm-kafka-client 1.2.0 or 1.1.2 (if we decide
> to
> >>>  replace) with Storm 1.0.6, it would be better if we don't allow
> >>> leaving
> >>>  storm-kafka-client in 1.0.x in inconsistent state.
> >>> 
> >>>  IMHO, breaking backward compatibility is worse, but leaving broken
> >>> >> thing
> >>>  is worst. Hence I'm +1 to replace all, with noticing that it may

Re: [VOTE] Release Apache Storm 1.0.6 (rc1)

2018-01-24 Thread Erik Weathers
Hi Taylor,

I apologize that this objection is a bit long, incomplete, and late for
this release vote.  Also, I by no means intend this as an attack on the
good folks that develop and maintain storm-kafka-client.  That being said,
I'm uncomfortable with the situation of storm-kafka-client in the
1.0.x-branch (and possibly 1.1.x-branch too) and I wonder if everyone is
aware of the situation.  There seem to be a number of important
storm-kafka-client changes that haven't been backported to the
1.0.x-branch.  e.g.,

(1) We discovered in storm-1.0.3 that a spout can get stuck forever if its
stored offsets fall behind the earliest available offset and
FirstPollOffsetStrategy is set to UNCOMMITTED_EARLIEST or
UNCOMMITTED_LATEST (a configuration sketch follows item (2) below).  We
believe this behavior to be fixed in newer branches, but not in
1.0.x-branch.  The issue is here (note that fetchOffset doesn't get updated
in the case where it was out of bounds for the seek):
* https://github.com/apache/storm/blob/v1.0.6/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpout.java#L188-L192

(2) storm-1.0.x is using kafka-clients-0.9.0.1, which isn't acceptable when
using Kafka 0.10 due to the performance impact on the Kafka cluster.  I
believe, perhaps naively, that we could use kafka-clients-0.10.x just fine
in storm-kafka-client-1.0.x, even when speaking to a Kafka 0.9 cluster.
(Obviously we'd have to fix any issues with the Kafka API usage from
storm-kafka-client.)
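
As a reference for issue (1), this is roughly how that strategy gets
configured with the storm-kafka-client 1.1+/1.2 builder API (the broker
address and topic name are made up for illustration):

import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;

public class SpoutConfigSketch {
    public static KafkaSpout<String, String> buildSpout() {
        // UNCOMMITTED_EARLIEST only seeks to the earliest offset when there is
        // no committed offset; with a committed offset that has fallen behind
        // the earliest available offset, the 1.0.x spout linked above can keep
        // polling from an out-of-range position forever.
        KafkaSpoutConfig<String, String> conf = KafkaSpoutConfig
            .builder("kafka-broker1:9092", "my-topic")
            .setFirstPollOffsetStrategy(
                KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST)
            .build();
        return new KafkaSpout<>(conf);
    }
}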


Notably, this situation is especially problematic for storm-on-mesos, since
we *cannot* run anything newer than storm-1.0.x for the daemons (Nimbus,
Supervisor, Worker), because of a fundamental change that was made to the
storm-core scheduling logic.

I haven't yet spent the time to fully analyze the set of changes made to
storm-kafka-client in the various active branches, so I apologize for a
lack of links to existing PRs and Jira tickets.  I also need to file
tickets for some of this stuff, as I believe issue (1) above was fixed in
newer branches as part of a large patch for another issue.

I also suppose that this isn't super actionable, since it might not be fair
to hold back the entire storm release just for these issues.  However, if
we can get these issues fixed in 1.0.x-branch I hope we can cut the 1.0.7
release soon thereafter.

Thanks!

- Erik

On Wed, Jan 24, 2018 at 10:41 AM, P. Taylor Goetz  wrote:

> This is a call to vote on releasing Apache Storm 1.0.6 (rc1)
>
> Full list of changes in this release:
>
> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.
> 0.6-rc1/RELEASE_NOTES.html
>
> The tag/commit to be voted upon is v1.0.6:
>
> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=tree;h
> =24a421e34a71353dc6c750b1f026d06df8ead3f2;hb=bce45993f8622e4
> d3e9ccba96cc78e4ef76e48ae
>
> The source archive being voted upon can be found here:
>
> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.
> 0.6-rc1/apache-storm-1.0.6-src.tar.gz
>
> Other release files, signatures and digests can be found here:
>
> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.0.6-rc1/
>
> The release artifacts are signed with the following key:
>
> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=blob_p
> lain;f=KEYS;hb=22b832708295fa2c15c4f3c70ac0d2bc6fded4bd
>
> The Nexus staging repository for this release is:
>
> https://repository.apache.org/content/repositories/orgapachestorm-1054
>
> Please vote on releasing this package as Apache Storm 1.0.6.
>
> When voting, please list the actions taken to verify the release.
>
> This vote will be open for at least 72 hours.
>
> [ ] +1 Release this package as Apache Storm 1.0.6
> [ ]  0 No opinion
> [ ] -1 Do not release this package because...
>
> Thanks to everyone who contributed to this release.
>
> -Taylor
>


Re: How to apply storm-core built by myself quickly

2017-04-05 Thread Erik Weathers
Yep, you should just distribute the new storm-core.jar to all the hosts
where your topology might run.

I see a bunch of discussions on these lists.  There *are* a bunch of random
Qs that come out of "left field" which I'm unsurprised don't get responses.
 "Why was decision X made 5 years ago by Nathan Marz?"  ¯\_(ツ)_/¯

- Erik

On Wed, Apr 5, 2017 at 8:33 PM, Zhechao Ma 
wrote:

> Hi,
>
> I modified some code in storm-core and built a jar from source, for
> debugging some issues in my topology. So how can I quickly apply the
> storm-core built by myself to the storm cluster? Do I need to deploy that
> jar on each node of the cluster to replace the original one, or is there an
> alternative way?
>
> Hoping to get a reply, although I find few people discussing things on the
> mailing list.
>
>
> --
> Thanks
> Zhechao Ma
>


Re: Beginners Questions

2017-03-07 Thread Erik Weathers
Welcome to the Storm community.  Sorry you've had a tough time getting
going.  Responses inline.

On Tue, Mar 7, 2017 at 9:11 AM, Cameron Cunning 
wrote:

> Hello, I am running into some basic problems getting started with Storm.
> First, to give some background, as part of a class project we would like to
> extend an existing custom scheduler for storm with some new features.
> However, I am having great difficulty trying to get started with storm
> itself.
> First, when I download an existing build and attempt to execute the binary,
> I get a message saying I am attempting to run the client from source code.
>

When I first encountered this problem I was a bit confused.  Googling
helped me.  But I'll spare you that since I already know the answer.

This likely means that you didn't actually download a *release* tarball,
which is what I would term a "build".  Maybe you downloaded an intermediate
/ SNAPSHOT tarball somehow?  (e.g., from GitHub...)   That isn't a real
release I suspect.  Providing info about what you downloaded would help
others to help you.

It's also helpful to include the exact error instead of just describing
it.  That allows for easy googling.  I found the error from the actual
binary I know it comes from (bin/storm or bin/storm.py depending on the
storm version):

The storm client can only be run from within a release. You appear to be
trying to run the client from a checkout of Storm's source code.
You can download a Storm release at http://storm.apache.org/downloads.html

If you *must* run from a source code checkout / download, then the solution
lies here:

   - https://github.com/apache/storm/blob/f2eb6af918d6f4346985644e9744c2b1ebce0a51/DEVELOPER.md#building
   


It would be nice if the documentation above included the error that you
received so that the DEVELOPER.md doc could be found from Google easily.
I'm also surprised that I couldn't easily find that doc on the storm docs
website, instead having to go to github.com.

> Second, when I try to build Storm myself (using mvn package), I also get a
> failure on storm-core.
>

Well, you didn't provide *any* info on that failure or even what
version/commit-SHA you tried to build, so it's definitely hard for anyone
to help on this one! ;-)


> The code for the existing scheduler we are attempting to modify (
> https://github.com/flint-stone/storm-elasticity-scheduler) appears to use
> version 0.9.2, so I have been trying to get that version working.
> I did try building 2.0-snapshot, and that was successful,


I'm a bit confused by this statement.  Are you saying that you have been
trying to get storm-0.9.2 working?

FWIW, with Storm you can replace the scheduler without having to rebuild
storm itself.  I believe these steps should suffice:

(1) Download a *release* tarball of storm.  A prebuilt release that doesn't
issue the complaint you encountered.
(2) Build a separate project with your replacement scheduler, outputting a
jar that has your scheduler class in it (a minimal sketch follows this list).
(3) Put that jar on the classpath (e.g., STORM_LIB_DIR) of the Nimbus
process.
(4) Update the storm.yaml's storm.scheduler key to reference your class via
its package path.  See https://storm.apache.org/releases/1.0.0/Storm-Scheduler.html
(5) Start the Nimbus.
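
For step (2), a minimal sketch of what such a replacement scheduler could look
like (assuming Storm 1.x, where the interfaces live under
org.apache.storm.scheduler; on 0.9.x the same interfaces are under
backtype.storm.scheduler).  It simply delegates to the built-in EvenScheduler:

package com.example.scheduler; // hypothetical package for your separate project

import java.util.Map;

import org.apache.storm.scheduler.Cluster;
import org.apache.storm.scheduler.EvenScheduler;
import org.apache.storm.scheduler.IScheduler;
import org.apache.storm.scheduler.Topologies;

public class MyScheduler implements IScheduler {
    private final EvenScheduler delegate = new EvenScheduler();

    @Override
    public void prepare(Map conf) {
        delegate.prepare(conf);
    }

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
        // Custom placement logic would go here; this sketch just falls back
        // to even scheduling.
        delegate.schedule(topologies, cluster);
    }
}

Then step (4) is just pointing storm.yaml at it, e.g.
storm.scheduler: "com.example.scheduler.MyScheduler".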

- Erik


> however, I still
> get the message that I am trying to run the client from a source checkout.
>

Already addressed above.

- Erik


>
> I know these must be silly questions, as no one else seems to have them,
> and I apologize for that. Grateful for any feedback, thanks.
>


Re: Are storm metrics reported through JMX too?

2017-02-24 Thread Erik Weathers
Also this proposal would be a real problem for the Storm-on-Mesos project.
The ports would have to be dynamically allocated from Mesos in such a
scenario, and we'd need some way to register and retrieve these ports in
order to drive whatever is then pulling the metrics out from them.  The
same problem exists for the logviewer, as I've documented here:

   - https://issues.apache.org/jira/browse/STORM-1342


On Wed, Feb 8, 2017 at 7:16 AM, Bobby Evans 
wrote:

> For me the big issue is that not every process has a web server on it.
> That may change in the future but for now only the ui and the logviewer
> have a web server up and running.  What is more if we wanted to do this for
> the workers we would need to think about an alternative port that would be
> free for a web server to be on, the resources a web server would use inside
> the worker and how would authentication work to those worker processes.
> SPNEGO would be really difficult to make work.
>
>
> - Bobby
>
> On Tuesday, February 7, 2017, 9:57:27 PM CST, Tech Id <
> tech.login@gmail.com> wrote:Hi Alessandro, Taylor,
>
> Any more updates on this one?
> This seems like a very good feature to have.
>
> Thanks
> TI
>
> On Tue, Dec 6, 2016 at 10:37 AM, Alessandro Bellina <
> abell...@yahoo-inc.com.invalid> wrote:
>
> > Hi S G,
> > Not something I am working on now. What I pushed was just reporter config
> > for Taylor since he needed it. I do think that if default reporting works
> > we could just go to the rocksdb store and ask it for the metrics there?
> > There are other parts of this I haven't published yet but looking to do
> so
> > either this or next week.
> > Thanks,
> > Alessandro
> >
> >
> >
> > On Tuesday, December 6, 2016, 11:10:57 AM CST, S G <
> > sg.online.em...@gmail.com> wrote:Hey Alessandro,
> >
> > Thanks for sharing.
> > Please share if we have plans to use the metrics-servlets from
> dropwizard?
> > (http://metrics.dropwizard.io/3.1.0/manual/servlets/#adminservlet)
> >
> > I think it will be very convenient to have the metrics reported through a
> > REST API from every process (worker, supervisor, nimbus etc.) and in line
> > with most other software like Solr, ES etc. This can be achieved very
> > easily if we use the above metric servlets and they also provide ping,
> > health-check, thread dump etc which are useful too.
> >
> > Please ignore if its already there in the code shared above and I could
> not
> > find it.
> >
> > Thanks,
> > SG
> >
> >
> >
> > On Mon, Dec 5, 2016 at 9:49 PM, Alessandro Bellina <
> abell...@yahoo-inc.com
> > >
> > wrote:
> >
> > > Hi Taylor
> > >
> > > Please see latest commit in: https://github.com/abellina/
> > > storm/tree/reporters
> > >
> > > Specifically inside: storm-core/src/jvm/org/apache/storm/metrics2
> > >
> > > I have a default config that sets up a couple of reporters in
> > > default.yaml. The format is inline with what we discussed, but updated
> > with
> > > what I suggested over the weekend.
> > >
> > > This is definitely a work in progress, but you should be able to
> > > instantiate reporters for your purposes.
> > >
> > > Thanks,
> > >
> > > Alessandro
> > >
> > >
> > >
> > > On Monday, December 5, 2016, 1:26:41 PM CST, Alessandro Bellina <
> > > abell...@yahoo-inc.com.INVALID> wrote:
> > > Yes. Will PR this tonight Taylor. Thanks!
> > > On Monday, December 5, 2016, 12:37:18 PM CST, P. Taylor Goetz <
> > > ptgo...@gmail.com> wrote:Alessandro,
> > >
> > > Are you in a position to open a pull request against the metrics_v2
> > > branch? I’d like to start integrating the work I’ve been doing with the
> > > reporter configuration stuff you have. If what you have is
> > incomplete/WIP,
> > > that’s not a big deal as the metrics_v2 branch is a feature branch and
> > > we’ll have plenty of opportunities to clean things up.
> > >
> > > -Taylor
> > >
> > > > On Nov 29, 2016, at 3:27 PM, Alessandro Bellina <
> > abell...@yahoo-inc.com>
> > > wrote:
> > > >
> > > > Taylor,
> > > >
> > > > Ok maybe there is some effort duplication. For the config, I have the
> > > bare minimum to get the default reporter up. I'll focus on that since
> you
> > > could use it. Will update JIRA with more.
> > > >
> > > > Alessandro
> > > >
> > > >
> > > > - Forwarded Message -
> > > > From: P. Taylor Goetz 
> > > > To: "dev@storm.apache.org" 
> > > > Cc: S G ; na...@narendasan.com <
> > > na...@narendasan.com>; Austin Chung 
> > > > Sent: Tuesday, November 29, 2016, 1:27:58 PM CST
> > > > Subject: Re: Are storm metrics reported through JMX too?
> > > >
> > > > Alessandro,
> > > >
> > > > Where do you stand with the reporter configuration via the storm.yaml
> > > config file?
> > > >
> > > > I have metrics collection for workers and disruptor queues almost
> > ready,
> > > but now I’m looking for flexible configuration (right now I have
> > reporters
> > > hard coded).
> > > >
> > 

Re: ever seen the netty message_queue grow (seemingly) infinitely?

2017-01-05 Thread Erik Weathers
Thanks for the response Jungtaek.  There are definitely a ton of executors
in this topology, and it's processing a ton of tuples.

Also of note is that the issue I mentioned happens only when custom
app-level metrics are enabled.  When only the storm-level internal metrics
are enabled, the problem goes away.
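
To be clear about what I mean by "custom app-level metrics": metrics the
topology code registers on the TopologyContext, roughly like the sketch below
(Storm 0.9.x package names since we're on 0.9.6; the bolt and metric names are
made up):

import java.util.Map;

import backtype.storm.metric.api.CountMetric;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class CountingBolt extends BaseRichBolt {
    private transient CountMetric processed;
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // A 60-second bucket means every task of this bolt sends a data point
        // to the metrics consumer once a minute, so the metric tuple volume
        // scales with the overall task count.
        this.processed = context.registerMetric("processed-tuples", new CountMetric(), 60);
    }

    @Override
    public void execute(Tuple tuple) {
        processed.incr();
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // no output streams in this sketch
    }
}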

Unfortunately, we cannot upgrade to storm-1.x+ for a long while still,
mostly because of the logback to log4j 2 change.  That is going to require
significant effort to get all of our client topologies owned by dozens of
teams to make modifications.  We haven't even had time yet to work on
*how* they will have to change (since we have a logback-based library our
clients use to do logging).

This issue is also blocking me from looking into the new metrics stuff and
the exposure of the queue metrics into the UI (which we discussed
separately).

So we'll keep plugging away; I'll update the thread with any concrete
findings.

Thanks!

- Erik

On Wed, Jan 4, 2017 at 8:05 AM, Jungtaek Lim <kabh...@gmail.com> wrote:

> May you already know about this, please also note that count of metrics
> tuples are linear with overall task count. Higher parallelism puts more
> pressure to the metrics bolt.
>
> I guess Taylor and Alessandro have been working on metrics v2. Unless we
> finish metrics v2, we can just reduce the load with metrics whitelist /
> blacklist, and asynchronous metrics consumer bolt on upcoming Storm 1.1.0.
> (Before that you might would like to give a try to migrate to 1.x, say,
> 1.0.2 for now.)
>
> - Jungtaek Lim (HeartSaVioR)
>
> On Thu, Jan 5, 2017 at 12:42 AM, Bobby Evans <ev...@yahoo-inc.com.invalid> wrote:
>
> > Yes you are right that will not help.  The best you can do now is to
> > increase the number of MetricsConsumer instances that you have.  You can
> do
> > this when you register the metrics consumer.
> > conf.registerMetricsConsumer(NoOpMetricsConsumer.class, 3);
> > The default is 1, but we have see with very large topologies, or ones
> that
> > output a lot of metrics they can sometimes get bogged down.
> > You could also try profiling that worker to see what is taking so long.
> > If a NoOp is also showing the same signs it would be interesting to see
> > why.  It could be the number of events coming in, or it could be the size
> > of the metrics being sent making deserialization costly. - Bobby
> >
> > On Tuesday, January 3, 2017 2:05 PM, Erik Weathers
> > <eweath...@groupon.com.INVALID> wrote:
> >
> >
> >  Thanks for the response Bobby!
> >
> > I think I might have failed to sufficiently emphasize & explain something
> > in my earlier description of the issue:  this is happening *only* in a
> > worker process that is hosting a bolt that implements the
> *IMetricsConsumer
> > *interface.  The other 24 worker processes are working just fine, their
> > netty queues do not grow forever.  The same number and type of executors
> > are on every worker process, except that one worker that is hosting the
> > metrics consumer bolt.
> >
> > So the netty queue is growing unbounded because of an influx of metrics.
> > The acking and max spout pending configs wouldn't seem to directly
> > influence the filling of the netty queue with custom metrics.
> >
> > Notably, this "choking" behavior happens even with a
> "NoOpMetricsConsumer"
> > bolt which is the same as storm's LoggingMetricsConsumer but with the
> > handleDataPoints() doing *nothing*.  Interesting, right?
> >
> > - Erik
> >
> > On Tue, Jan 3, 2017 at 7:06 AM, Bobby Evans <ev...@yahoo-inc.com.invalid
> >
> > wrote:
> >
> > > Storm does not have back pressure by default.  Also because storm
> > supports
> > > loops in a topology the message queues can grow unbounded.  We have put
> > in
> > > a number of fixes in newer versions of storm, also for the messaging
> side
> > > of things.  But the simplest way to avoid this is to have acking
> enabled
> > > and have max spout pending set to a reasonable number.  This will
> > typically
> > > be caused by one of the executors in your worker not being able to keep
> > up
> > > with the load coming in.  There is also the possibility that a single
> > > thread cannot keep up with the incoming  message load.  In the former
> > case
> > > you should be able to see the capacity go very high on some of the
> > > executors.  In the latter case you will not see that, and may need to
> add
> > > more workers to your topology.  - Bobby
> > >
> > > On Thursday, December 22, 2016 10:01 PM, Erik Weathers wrote:

Re: ever seen the netty message_queue grow (seemingly) infinitely?

2017-01-03 Thread Erik Weathers
Thanks for the response Bobby!

I think I might have failed to sufficiently emphasize & explain something
in my earlier description of the issue:  this is happening *only* in a
worker process that is hosting a bolt that implements the *IMetricsConsumer*
interface.  The other 24 worker processes are working just fine, their
netty queues do not grow forever.  The same number and type of executors
are on every worker process, except that one worker that is hosting the
metrics consumer bolt.

So the netty queue is growing unbounded because of an influx of metrics.
The acking and max spout pending configs wouldn't seem to directly
influence the filling of the netty queue with custom metrics.

Notably, this "choking" behavior happens even with a "NoOpMetricsConsumer"
bolt which is the same as storm's LoggingMetricsConsumer but with the
handleDataPoints() doing *nothing*.  Interesting, right?
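
For reference, that "NoOpMetricsConsumer" is nothing more than an
IMetricsConsumer whose handleDataPoints() discards everything -- roughly this
sketch (Storm 0.9.x package names):

import java.util.Collection;
import java.util.Map;

import backtype.storm.metric.api.IMetricsConsumer;
import backtype.storm.task.IErrorReporter;
import backtype.storm.task.TopologyContext;

public class NoOpMetricsConsumer implements IMetricsConsumer {
    @Override
    public void prepare(Map stormConf, Object registrationArgument,
                        TopologyContext context, IErrorReporter errorReporter) {
        // nothing to set up
    }

    @Override
    public void handleDataPoints(TaskInfo taskInfo, Collection<DataPoint> dataPoints) {
        // Intentionally a no-op: drops every metrics data point, which
        // isolates the cost of receiving metric tuples from the cost of
        // handling them.
    }

    @Override
    public void cleanup() {
    }
}

It gets registered like any other metrics consumer, e.g.
conf.registerMetricsConsumer(NoOpMetricsConsumer.class, 3); if we want more
than one consumer task.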

- Erik

On Tue, Jan 3, 2017 at 7:06 AM, Bobby Evans <ev...@yahoo-inc.com.invalid>
wrote:

> Storm does not have back pressure by default.  Also because storm supports
> loops in a topology the message queues can grow unbounded.  We have put in
> a number of fixes in newer versions of storm, also for the messaging side
> of things.  But the simplest way to avoid this is to have acking enabled
> and have max spout pending set to a reasonable number.  This will typically
> be caused by one of the executors in your worker not being able to keep up
> with the load coming in.  There is also the possibility that a single
> thread cannot keep up with the incoming  message load.  In the former case
> you should be able to see the capacity go very high on some of the
> executors.  In the latter case you will not see that, and may need to add
> more workers to your topology.  - Bobby
>
> On Thursday, December 22, 2016 10:01 PM, Erik Weathers
> <eweath...@groupon.com.INVALID> wrote:
>
>
>  We're debugging a topology's infinite memory growth for a worker process
> that is running a metrics consumer bolt, and we just noticed that the netty
> Server.java's message_queue
> <https://github.com/apache/storm/blob/v0.9.6/storm-core/
> src/jvm/backtype/storm/messaging/netty/Server.java#L97>
> is growing forever (at least it goes up to ~5GB before it hits heap limits
> and leads to heavy GCing).  (We found this by using Eclipse's Memory
> Analysis Tool on a heap dump obtained via jmap.)
>
> We're running storm-0.9.6, and this is happening with a topology that is
> processing 200K+ tuples per second, and producing a lot of metrics.
>
> I'm a bit surprised that this queue would grow forever, I assumed there
> would be some sort of limit.  I'm pretty naive about how netty's message
> receiving system tied into the Storm executors at this point though.  I'm
> kind of assuming the behavior could be a result of backpressure / slowness
> from our downstream monitoring system, but there's no visibility provided
> by Storm into what's happening with these messages in the netty queues
> (that I have been able to ferret out at least!).
>
> Thanks for any input you might be able to provide!
>
> - Erik
>
>
>
>


ever seen the netty message_queue grow (seemingly) infinitely?

2016-12-22 Thread Erik Weathers
We're debugging a topology's infinite memory growth for a worker process
that is running a metrics consumer bolt, and we just noticed that the netty
Server.java's message_queue
<https://github.com/apache/storm/blob/v0.9.6/storm-core/src/jvm/backtype/storm/messaging/netty/Server.java#L97>
is growing forever (at least it goes up to ~5GB before it hits heap limits
and leads to heavy GCing).  (We found this by using Eclipse's Memory
Analysis Tool on a heap dump obtained via jmap.)

We're running storm-0.9.6, and this is happening with a topology that is
processing 200K+ tuples per second, and producing a lot of metrics.

I'm a bit surprised that this queue would grow forever, I assumed there
would be some sort of limit.  I'm pretty naive about how netty's message
receiving system tied into the Storm executors at this point though.  I'm
kind of assuming the behavior could be a result of backpressure / slowness
from our downstream monitoring system, but there's no visibility provided
by Storm into what's happening with these messages in the netty queues
(that I have been able to ferret out at least!).

Thanks for any input you might be able to provide!

- Erik


Re: [DISCUSS] Feature Branch for Apache Beam Runner

2016-11-23 Thread Erik Weathers
Hugo,

This appears to be the branch:

   - https://github.com/apache/storm/tree/beam-runner

- Erik

On Wed, Nov 23, 2016 at 9:34 AM, Hugo Da Cruz Louro 
wrote:

> I somehow missed this email … I would like to contribute to this effort as
> well. Please keep me posted.
> Thanks.
>
> > On Oct 19, 2016, at 8:51 AM, Satish Duggana 
> wrote:
> >
> > +1, waiting for that. :)
> > Currently,there are API changes going on in Beam. It seem they plan to
> get
> > that done by the end of 2016.
> >
> > ~Satish.
> >
> > On Wed, Oct 19, 2016 at 9:19 PM, Bobby Evans  >
> > wrote:
> >
> >> +1 - Bobby
> >>
> >>On Wednesday, October 19, 2016 10:30 AM, Arun Mahadevan <
> >> ar...@apache.org> wrote:
> >>
> >>
> >> +1
> >>
> >> On 10/19/16, 8:58 PM, "P. Taylor Goetz"  wrote:
> >>
> >>> If there are no objections, I’d like to create the feature branch and
> >> push what I have so far. I’ve not had too much time lately to work on
> it,
> >> but other’s have expressed interest in contributing so I’d like to make
> it
> >> available.
> >>>
> >>> -Taylor
> >>>
> >>>
>  On Sep 19, 2016, at 11:15 AM, Bobby Evans  >
> >> wrote:
> 
>  +1 on the idea.  I would love to contribute, but I doubt I will find
> >> time to do it any time soon. - Bobby
> 
>    On Friday, September 16, 2016 12:05 AM, Satish Duggana <
> >> satish.dugg...@gmail.com> wrote:
> 
> 
>  Taylor,
>  I am interested in contributing to this effort. Gone through Beam APIs
>  earlier and had some initial thoughts on Storm runner. We can start
> with
>  existing core storm constructs but it is better to design in such a
> way
>  that these can be replaced with new APIs.
> 
>  Thanks,
>  Satish.
> 
>  On Fri, Sep 16, 2016 at 3:35 AM, P. Taylor Goetz 
> >> wrote:
> 
> > I'm open to change, but yes, I started with core storm since it
> offers
> >> the
> > most flexibility wrt how Beam constructs are translated.
> >
> > -Taylor
> >
> >> On Sep 15, 2016, at 5:51 PM, Roshan Naik 
> >> wrote:
> >>
> >> Good idea. Will the Beam API be implemented to run on top Storm Core
> >> primitives ?
> >> -roshan
> >>
> >>
> >>> On 9/15/16, 2:00 PM, "P. Taylor Goetz"  wrote:
> >>>
> >>> I¹ve been tinkering with implementing an Apache Beam runner on top
> of
> >>> Storm and would like to open it up so others in the community can
> >>> contribute. To that end I¹d like to propose creating a feature
> branch
> > for
> >>> that work if there are others who are interested in getting
> >> involved. We
> >>> did that a while back when storm-sql was originally developed.
> >>>
> >>> Basically, review requirements for that branch would be relaxed
> >> during
> >>> development, with a final, strict review before merging back to one
> >> of
> >>> our main branches.
> >>>
> >>> I¹d like to document what I have and future improvements in a
> >> proposal
> >>> document, and follow that with pushing the code to the feature
> branch
> > for
> >>> group collaboration.
> >>>
> >>> Any thoughts? Anyone interested in contributing to such an effort?
> >>>
> >>> -Taylor
> >>
> >
> 
> >>>
> >>
> >>
> >>
> >>
> >>
>
>


Re: Too many machine mails

2016-08-16 Thread Erik Weathers
> I'm pro-getting-rid-of-github-messages-on-jira as well, but that's
> less
> > > annoying to me personally than the mails. It's also not clear what a
> > better
> > > solution for keeping jira and github "linked" is at this point.
> > > As far as what notifications come through, once it's on its own list I
> > > don't care if everything comes through.
> > > Do we need to call an official vote or something to actually get this
> > > moving? I'm not sure what the procedure is for setting up mailing
> lists.
> > >  -- Kyle
> > >
> > >On Thursday, August 11, 2016 9:18 AM, Jungtaek Lim <
> kabh...@gmail.com
> > >
> > > wrote:
> > >
> > >
> > >  First of all we need to define which things are annoying. Belows are
> > some
> > > which are mentioned one or more people,
> > >
> > > 1. Duplicated notifications per comment (You can receive 2 mails from
> > dev@
> > > + 1 mails from github up to condition (you're an author, you're
> watching,
> > > you're mentioned, etc) + occasionally 1 empty change mail from dev ->
> up
> > to
> > > 4 mails)
> > > 2. Copied comments from JIRA issue (with or without notification)
> > >
> > > and also need to define which things should be notified
> > >
> > > a. open pull request and close pull request
> > > b. only open pull request (linking github pull request and notified by
> > > changing status of issue - we can have 'patch available' status for
> that)
> > > c. no we should receive all of comments (just need to reduce duplicated
> > > things)
> > >
> > > - Jungtaek Lim (HeartSaVioR)
> > >
> > >
> > > On Thu, Aug 11, 2016 at 10:52 PM, Bobby Evans <ev...@yahoo-inc.com.invalid>
> > > wrote:
> > >
> > >
> > > Yes lets have a separate firehouse/commit/whatever mailing list that if
> > > people really want all of that data they can see it all.  That way it
> is
> > > archived in ASF infra.  I do see value in having JIRA and GITHUB
> linked,
> > > I'm not sure if there is a better way to link the two right now though.
> > If
> > > someone does have experience with this type of thing and can make a
> > better
> > > solution I think we can talk to INFRA about adopting/supporting those
> > > changes. - Bobby
> > >
> > >On Thursday, August 11, 2016 8:41 AM, Aditya Desai <
> adity...@usc.edu>
> > > wrote:
> > >
> > >
> > >  Please reduce the number of emails. I am getting many many emails in
> > > recent
> > > days and spam my inbox.
> > >
> > > On Thu, Aug 11, 2016 at 2:41 AM, Erik Weathers <
> > > eweath...@groupon.com.invalid> wrote:
> > >
> > >
> > > I will state again (as I've done on prior email threads) that I find no
> > > value in spamming the JIRA issues like this, and that I strongly
> believe
> > > that this behavior is in fact detrimental since it obscures the actual
> > > comments on the issue itself.  The proposed solution of just moving the
> > > destination of the JIRA emails to a different list than
> > > dev@storm.apache.org
> > > doesn't solve that root problem.
> > >
> > > I want to be able to read a JIRA issue without having to skim over
> dozens
> > > and dozens of auto-appended code review messages.  I truly cannot
> > > understand why this isn't an annoyance for others.  I could be really
> > > snarky and reformat this email to have a bunch of random stuff in
> between
> > > every sentence to make my point, but I hope this sentence suffices to
> > >
> > > prove
> > >
> > > it?
> > >
> > > Though I must acknowledge your point Jungtaek  that there is some
> Apache
> > > policy that all code review comments need to be archived into some
> apache
> > > system.  Maybe we can use the attachment functionality of JIRA instead
> of
> > > making these separate comments on the JIRA issue?  I'm not sure how the
> > > integration is set up right now, that seems feasible.
> > >
> > > - Erik
> > >
> > > On Thu, Aug 11, 2016 at 2:08 AM, Matthias J. Sax <mj...@apache.org>
> > >
> > > wrote:
> > >
> > >
> > >
> > > I like the idea of have one more mailing list to reduce load on
> > >
> > >
> > > dev-list.
> > >
> > >
>

Re: Too many machine mails

2016-08-11 Thread Erik Weathers
It seems this plugin adds GitHub links into JIRA automagically:
https://help.github.com/articles/integrating-jira-with-your-projects/

I'm not sure if that's how it's set up already and this is what's resulting
in the review comments being added into JIRA.

- Erik

On Thu, Aug 11, 2016 at 12:08 PM, Kyle Nusbaum <knusb...@yahoo-inc.com.
invalid> wrote:

> I'm pro-getting-rid-of-github-messages-on-jira as well, but that's less
> annoying to me personally than the mails. It's also not clear what a better
> solution for keeping jira and github "linked" is at this point.
> As far as what notifications come through, once it's on its own list I
> don't care if everything comes through.
> Do we need to call an official vote or something to actually get this
> moving? I'm not sure what the procedure is for setting up mailing lists.
>  -- Kyle
>
> On Thursday, August 11, 2016 9:18 AM, Jungtaek Lim <kabh...@gmail.com>
> wrote:
>
>
>  First of all we need to define which things are annoying. Belows are some
> which are mentioned one or more people,
>
> 1. Duplicated notifications per comment (You can receive 2 mails from dev@
> + 1 mails from github up to condition (you're an author, you're watching,
> you're mentioned, etc) + occasionally 1 empty change mail from dev -> up to
> 4 mails)
> 2. Copied comments from JIRA issue (with or without notification)
>
> and also need to define which things should be notified
>
> a. open pull request and close pull request
> b. only open pull request (linking github pull request and notified by
> changing status of issue - we can have 'patch available' status for that)
> c. no we should receive all of comments (just need to reduce duplicated
> things)
>
> - Jungtaek Lim (HeartSaVioR)
>
>
> On Thu, Aug 11, 2016 at 10:52 PM, Bobby Evans <ev...@yahoo-inc.com.invalid> wrote:
>
> > Yes lets have a separate firehouse/commit/whatever mailing list that if
> > people really want all of that data they can see it all.  That way it is
> > archived in ASF infra.  I do see value in having JIRA and GITHUB linked,
> > I'm not sure if there is a better way to link the two right now though.
> If
> > someone does have experience with this type of thing and can make a
> better
> > solution I think we can talk to INFRA about adopting/supporting those
> > changes. - Bobby
> >
> >On Thursday, August 11, 2016 8:41 AM, Aditya Desai <adity...@usc.edu>
> > wrote:
> >
> >
> >  Please reduce the number of emails. I am getting many many emails in
> > recent
> > days and spam my inbox.
> >
> > On Thu, Aug 11, 2016 at 2:41 AM, Erik Weathers <
> > eweath...@groupon.com.invalid> wrote:
> >
> > > I will state again (as I've done on prior email threads) that I find no
> > > value in spamming the JIRA issues like this, and that I strongly
> believe
> > > that this behavior is in fact detrimental since it obscures the actual
> > > comments on the issue itself.  The proposed solution of just moving the
> > > destination of the JIRA emails to a different list than
> > > dev@storm.apache.org
> > > doesn't solve that root problem.
> > >
> > > I want to be able to read a JIRA issue without having to skim over
> dozens
> > > and dozens of auto-appended code review messages.  I truly cannot
> > > understand why this isn't an annoyance for others.  I could be really
> > > snarky and reformat this email to have a bunch of random stuff in
> between
> > > every sentence to make my point, but I hope this sentence suffices to
> > prove
> > > it?
> > >
> > > Though I must acknowledge your point Jungtaek  that there is some
> Apache
> > > policy that all code review comments need to be archived into some
> apache
> > > system.  Maybe we can use the attachment functionality of JIRA instead
> of
> > > making these separate comments on the JIRA issue?  I'm not sure how the
> > > integration is set up right now, that seems feasible.
> > >
> > > - Erik
> > >
> > > On Thu, Aug 11, 2016 at 2:08 AM, Matthias J. Sax <mj...@apache.org>
> > wrote:
> > >
> > > > I like the idea of have one more mailing list to reduce load on
> > dev-list.
> > > >
> > > > -Matthias
> > > >
> > > > On 08/11/2016 11:07 AM, Jungtaek Lim wrote:
> > > > > I remember that Taylor stated that all github comments should be
> > copied
> > > > to
> > > > > somewhere Apache infra, and it's Apache JIRA for us.
> > > > >

Re: Too many machine mails

2016-08-11 Thread Erik Weathers
I will state again (as I've done on prior email threads) that I find no
value in spamming the JIRA issues like this, and that I strongly believe
that this behavior is in fact detrimental since it obscures the actual
comments on the issue itself.  The proposed solution of just moving the
destination of the JIRA emails to a different list than dev@storm.apache.org
doesn't solve that root problem.

I want to be able to read a JIRA issue without having to skim over dozens
and dozens of auto-appended code review messages.  I truly cannot
understand why this isn't an annoyance for others.  I could be really
snarky and reformat this email to have a bunch of random stuff in between
every sentence to make my point, but I hope this sentence suffices to prove
it?

Though I must acknowledge your point, Jungtaek, that there is some Apache
policy that all code review comments need to be archived into some Apache
system.  Maybe we could use the attachment functionality of JIRA instead of
making these separate comments on the JIRA issue?  I'm not sure how the
integration is set up right now, but that seems feasible.

- Erik

On Thu, Aug 11, 2016 at 2:08 AM, Matthias J. Sax  wrote:

> I like the idea of have one more mailing list to reduce load on dev-list.
>
> -Matthias
>
> On 08/11/2016 11:07 AM, Jungtaek Lim wrote:
> > I remember that Taylor stated that all github comments should be copied
> to
> > somewhere Apache infra, and it's Apache JIRA for us.
> >
> > It seems to make sense but I'm curious whether other projects respect this rule.
> I
> > also subscribed dev list of Kafka, Zeppelin, Flink, HBase, Spark
> (although
> > I barely see them) but no project is sending mail per each comment. Some
> of
> > them copy github comments to JIRA issue but no notification, and others
> > don't even copy comments to the JIRA issue.
> > (You can check this with dev mailing list archive, too.)
> >
> > I'm in favor of reducing simple notification mails. Personally I saw most
> > of Storm dev. mails so I'm fine to keep mailing as it is (with some
> > annoying 'empty' notification), but it can also be done with watching
> > Github project.
> >
> > This is not raised for the first time, and I would like to discuss
> > seriously and see the changes.
> >
> > Thanks,
> > Jungtaek Lim (HeartSaVioR)
> >
> > On Thu, Aug 11, 2016 at 2:22 PM, Kyle Nusbaum  d> wrote:
> >
> >> There seems to be a surplus of automatically-generated emails on the dev
> >> mailing list.
> >> Github and Apache's Jira constantly send mails to the dev list.
> >>
> >> I'm not sure that anyone finds these useful. Even if they do, I wonder
> if
> >> its better to move them to a separate list. It's possible that everyone
> has
> >> email filters employed to sort this out, but if every subscriber has the
> >> same filters employed, it might indicate the need for a separate list.
> --
> >> Kyle
> >
>
>


Re: Unable to run topologies

2016-07-29 Thread Erik Weathers
Supervisors don't open the ports.  The workers do.  The supervisors
*launch* the workers.

- Erik

On Fri, Jul 29, 2016 at 8:46 AM, Arjun Rao <sporty.ar...@gmail.com> wrote:

> The behavior is similar across any set of port ranges assigned as the
> supervisor slots ports. I tried with 6700-6703 and it's the same issue of
> address in use. I might be wrong but is it possible that the supervisor is
> not opening the ports as opposed to some other process using that port?
>
> Sent from my iPhone
>
> > On Jul 28, 2016, at 8:56 PM, Erik Weathers <eweath...@groupon.com.INVALID>
> wrote:
> >
> > I'm a bit confused as to what all of those cmds are showing / proving.
> >
> > But one thing I will point out is that you probably shouldn't be using
> > ports between 32768-61000 for your workers, because those ports are for
> > ephemeral usage, so could be used by another process randomly.  (That's
> the
> > default on linux at least.)
> >
> > - Erik
> >
> >> On Thu, Jul 28, 2016 at 5:47 PM, Arjun Rao <sporty.ar...@gmail.com>
> wrote:
> >>
> >> Thanks for the reply Erik. I ran nc -l 59027 on the supervisor host, but
> >> i think it is able to connect successfully. i ran the strace in any case
> >> and the output is attached in the file. I ran a couple of other
> commands as
> >> well and this is what i found.
> >>
> >> *With the supervisor running*
> >>
> >>
> >>
> >> nc -v devctsl001 59027
> >>
> >> nc: connect to devctsl001 port 59027 (tcp) failed: Connection refused
> >>
> >>
> >>
> >> telnet devctsl001 59027
> >>
> >> Trying 45.32.96.34...
> >>
> >> telnet: connect to address xx.xx.xx.xx: Connection refused
> >>
> >>
> >>
> >> nc -l 59027
> >>
> >> {No address already in use error. Connection seems to be open}
> >>
> >>
> >> *With the UI running ( the storm ui connects on 59031. The UI comes up
> >> successfully without any issues)*
> >>
> >>
> >>
> >> nc -v devctsl001 59031
> >>
> >> Connection to devctsl001 59031 port [tcp/*] succeeded!
> >>
> >>
> >> telnet devctsl001 59031
> >>
> >> Trying xx.xx.xx.xx...
> >>
> >> Connected to devctsl001.
> >>
> >> Escape character is '^]'.
> >>
> >>
> >> nc -l 59031
> >>
> >> nc: Address already in use
> >>
> >>
> >>
> >>
> >> Might be a red herring, but thought i'd share what i have done so far.
> >>
> >>
> >> Best,
> >>
> >> Arjun
> >>
> >> On Thu, Jul 28, 2016 at 7:35 PM, Erik Weathers <
> >> eweath...@groupon.com.invalid> wrote:
> >>
> >>> Somehow the OS is denying your application's request to create a
> socket.
> >>> Either the port really is bound to another process despite your netstat
> >>> cmd
> >>> not revealing that, or you are hitting some other limit.  The thread
> you
> >>> linked doesn't seem useful towards determining what your problem's root
> >>> cause is.
> >>>
> >>> I would run:  `nc -l 59027` in order to see if anything can bind to
> that
> >>> port.
> >>> Assuming it fails, then follow that up with an `strace nc -l 59027` to
> see
> >>> if there's any other evidence of why it's failing to bind.
> >>>
> >>> - Erik
> >>>
> >>> On Thu, Jul 28, 2016 at 3:46 PM, Arjun Rao <sporty.ar...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> We are active users of storm in production. One of our pre-prod
> clusters
> >>>> however, is not functional at the moment. The storm daemons ( nimbus,
> >>> ui,
> >>>> logviewer, supervisor ) start up fine, but the storm workers are not
> get
> >>>> instantiated, when we submit topologies. We see the following error in
> >>> the
> >>>> worker logs:
> >>>>
> >>>> 2016-07-28 18:33:59 [main] b.s.d.worker [INFO] Reading Assignments.
> >>>> 2016-07-28 18:34:00 [main] b.s.m.TransportFactory [INFO] Storm peer
> >>>> transport plugin:backtype.storm.messaging.netty.Context
> >>>> 2016-07-28 18:34:00 [main] b.s.d.worker [INFO] Launching
> receive-thread
> >

Re: Unable to run topologies

2016-07-28 Thread Erik Weathers
I'm a bit confused as to what all of those cmds are showing / proving.

But one thing I will point out is that you probably shouldn't be using
ports between 32768 and 61000 for your workers, because those ports are for
ephemeral usage, so they could be used by another process at random.  (That's
the default range on Linux, at least.)
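
For example, here is a quick check (just a sketch; the /proc path is
Linux-specific and 59027 is the port from this thread) to see whether a worker
port falls inside the kernel's ephemeral range:

{code}
import java.nio.file.Files;
import java.nio.file.Paths;

public class EphemeralRangeCheck {
    public static void main(String[] args) throws Exception {
        // Linux publishes its ephemeral (dynamic) port range in this proc file.
        String[] range = new String(Files.readAllBytes(
                Paths.get("/proc/sys/net/ipv4/ip_local_port_range"))).trim().split("\\s+");
        int lo = Integer.parseInt(range[0]);
        int hi = Integer.parseInt(range[1]);
        int workerPort = args.length > 0 ? Integer.parseInt(args[0]) : 59027;
        System.out.printf("ephemeral range %d-%d; worker port %d inside it: %b%n",
                lo, hi, workerPort, workerPort >= lo && workerPort <= hi);
    }
}
{code}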

- Erik

On Thu, Jul 28, 2016 at 5:47 PM, Arjun Rao <sporty.ar...@gmail.com> wrote:

> Thanks for the reply Erik. I ran nc -l 59027 on the supervisor host, but
> i think it is able to connect successfully. i ran the strace in any case
> and the output is attached in the file. I ran a couple of other commands as
> well and this is what i found.
>
> *With the supervisor running*
>
>
>
> nc -v devctsl001 59027
>
> nc: connect to devctsl001 port 59027 (tcp) failed: Connection refused
>
>
>
> telnet devctsl001 59027
>
> Trying 45.32.96.34...
>
> telnet: connect to address xx.xx.xx.xx: Connection refused
>
>
>
> nc -l 59027
>
> {No address already in use error. Connection seems to be open}
>
>
> *With the UI running ( the storm ui connects on 59031. The UI comes up
> successfully without any issues)*
>
>
>
> nc -v devctsl001 59031
>
> Connection to devctsl001 59031 port [tcp/*] succeeded!
>
>
> telnet devctsl001 59031
>
> Trying xx.xx.xx.xx...
>
> Connected to devctsl001.
>
> Escape character is '^]'.
>
>
> nc -l 59031
>
> nc: Address already in use
>
>
>
>
> Might be a red herring, but thought i'd share what i have done so far.
>
>
> Best,
>
> Arjun
>
> On Thu, Jul 28, 2016 at 7:35 PM, Erik Weathers <
> eweath...@groupon.com.invalid> wrote:
>
>> Somehow the OS is denying your application's request to create a socket.
>> Either the port really is bound to another process despite your netstat
>> cmd
>> not revealing that, or you are hitting some other limit.  The thread you
>> linked doesn't seem useful towards determining what your problem's root
>> cause is.
>>
>> I would run:  `nc -l 59027` in order to see if anything can bind to that
>> port.
>> Assuming it fails, then follow that up with an `strace nc -l 59027` to see
>> if there's any other evidence of why it's failing to bind.
>>
>> - Erik
>>
>> On Thu, Jul 28, 2016 at 3:46 PM, Arjun Rao <sporty.ar...@gmail.com>
>> wrote:
>>
>> > Hi all,
>> >
>> > We are active users of storm in production. One of our pre-prod clusters
>> > however, is not functional at the moment. The storm daemons ( nimbus,
>> ui,
>> > logviewer, supervisor ) start up fine, but the storm workers are not get
>> > instantiated, when we submit topologies. We see the following error in
>> the
>> > worker logs:
>> >
>> > 2016-07-28 18:33:59 [main] b.s.d.worker [INFO] Reading Assignments.
>> > 2016-07-28 18:34:00 [main] b.s.m.TransportFactory [INFO] Storm peer
>> > transport plugin:backtype.storm.messaging.netty.Context
>> > 2016-07-28 18:34:00 [main] b.s.d.worker [INFO] Launching receive-thread
>> > for b4560ed4-d257-4151-9764-633707282a1f:59027
>> > 2016-07-28 18:34:00 [main] b.s.m.n.Server [INFO] Create Netty Server
>> > Netty-server-localhost-59027, buffer_size: 5242880, maxWorkers: 1
>> > 2016-07-28 18:34:00 [main] b.s.d.worker [ERROR] Error on initialization
>> of
>> > server mk-worker
>> > org.apache.storm.netty.channel.ChannelException: Failed to bind to:
>> > 0.0.0.0/0.0.0.0:59027
>> > at
>> >
>> org.apache.storm.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
>> > ~[storm-core-0.9.6.jar:0.9.6]
>> > at backtype.storm.messaging.netty.Server.(Server.java:130)
>> > ~[storm-core-0.9.6.jar:0.9.6]
>> > at backtype.storm.messaging.netty.Context.bind(Context.java:73)
>> > ~[storm-core-0.9.6.jar:0.9.6]
>> > at
>> >
>> backtype.storm.messaging.loader$launch_receive_thread_BANG_.doInvoke(loader.clj:68)
>> > ~[storm-core-0.9.6.jar:0.9.6]
>> > at clojure.lang.RestFn.invoke(RestFn.java:668)
>> > [clojure-1.5.1.jar:na]
>> > at
>> >
>> backtype.storm.daemon.worker$launch_receive_thread.invoke(worker.clj:380)
>> > ~[storm-core-0.9.6.jar:0.9.6]
>> > at
>> >
>> backtype.storm.daemon.worker$fn__4629$exec_fn__1104__auto4630.invoke(worker.clj:415)
>> > ~[storm-core-0.9.6.jar:0.9.6]
>> > at clojure.lang.AFn.applyToHelper(AFn.java:185)
>> > [clojure-1.5.1.jar:na]
>> > at clojure.lang.AFn.applyTo(AFn.

Re: Unable to run topologies

2016-07-28 Thread Erik Weathers
Somehow the OS is denying your application's request to create a socket.
Either the port really is bound to another process despite your netstat cmd
not revealing that, or you are hitting some other limit.  The thread you
linked doesn't seem useful towards determining what your problem's root
cause is.

I would run:  `nc -l 59027` in order to see if anything can bind to that
port.
Assuming it fails, then follow that up with an `strace nc -l 59027` to see
if there's any other evidence of why it's failing to bind.
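
If you want to take nc out of the picture and reproduce the exact same bind
call from a JVM, a minimal Java sketch like this works too (the default port
below is just the one from this thread):

{code}
import java.net.ServerSocket;

public class BindCheck {
    public static void main(String[] args) throws Exception {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 59027;
        // If this throws java.net.BindException ("Address already in use"),
        // something really does hold the port; if it binds cleanly, the problem
        // is likely some other limit, as mentioned above.
        try (ServerSocket socket = new ServerSocket(port)) {
            System.out.println("bound port " + port + " OK; Ctrl-C to release");
            Thread.sleep(Long.MAX_VALUE);
        }
    }
}
{code}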

- Erik

On Thu, Jul 28, 2016 at 3:46 PM, Arjun Rao  wrote:

> Hi all,
>
> We are active users of storm in production. One of our pre-prod clusters
> however, is not functional at the moment. The storm daemons ( nimbus, ui,
> logviewer, supervisor ) start up fine, but the storm workers are not get
> instantiated, when we submit topologies. We see the following error in the
> worker logs:
>
> 2016-07-28 18:33:59 [main] b.s.d.worker [INFO] Reading Assignments.
> 2016-07-28 18:34:00 [main] b.s.m.TransportFactory [INFO] Storm peer
> transport plugin:backtype.storm.messaging.netty.Context
> 2016-07-28 18:34:00 [main] b.s.d.worker [INFO] Launching receive-thread
> for b4560ed4-d257-4151-9764-633707282a1f:59027
> 2016-07-28 18:34:00 [main] b.s.m.n.Server [INFO] Create Netty Server
> Netty-server-localhost-59027, buffer_size: 5242880, maxWorkers: 1
> 2016-07-28 18:34:00 [main] b.s.d.worker [ERROR] Error on initialization of
> server mk-worker
> org.apache.storm.netty.channel.ChannelException: Failed to bind to:
> 0.0.0.0/0.0.0.0:59027
> at
> org.apache.storm.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
> ~[storm-core-0.9.6.jar:0.9.6]
> at backtype.storm.messaging.netty.Server.(Server.java:130)
> ~[storm-core-0.9.6.jar:0.9.6]
> at backtype.storm.messaging.netty.Context.bind(Context.java:73)
> ~[storm-core-0.9.6.jar:0.9.6]
> at
> backtype.storm.messaging.loader$launch_receive_thread_BANG_.doInvoke(loader.clj:68)
> ~[storm-core-0.9.6.jar:0.9.6]
> at clojure.lang.RestFn.invoke(RestFn.java:668)
> [clojure-1.5.1.jar:na]
> at
> backtype.storm.daemon.worker$launch_receive_thread.invoke(worker.clj:380)
> ~[storm-core-0.9.6.jar:0.9.6]
> at
> backtype.storm.daemon.worker$fn__4629$exec_fn__1104__auto4630.invoke(worker.clj:415)
> ~[storm-core-0.9.6.jar:0.9.6]
> at clojure.lang.AFn.applyToHelper(AFn.java:185)
> [clojure-1.5.1.jar:na]
> at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
> at clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]
> at
> backtype.storm.daemon.worker$fn__4629$mk_worker__4685.doInvoke(worker.clj:393)
> [storm-core-0.9.6.jar:0.9.6]
> at clojure.lang.RestFn.invoke(RestFn.java:512)
> [clojure-1.5.1.jar:na]
> at backtype.storm.daemon.worker$_main.invoke(worker.clj:504)
> [storm-core-0.9.6.jar:0.9.6]
> at clojure.lang.AFn.applyToHelper(AFn.java:172)
> [clojure-1.5.1.jar:na]
> at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
> at backtype.storm.daemon.worker.main(Unknown Source)
> [storm-core-0.9.6.jar:0.9.6]
> java.net.BindException: Address already in use
> at sun.nio.ch.Net.bind0(Native Method) ~[na:1.8.0_45]
> at sun.nio.ch.Net.bind(Net.java:437) ~[na:1.8.0_45]
> at sun.nio.ch.Net.bind(Net.java:429) ~[na:1.8.0_45]
> at
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> ~[na:1.8.0_45]
> at
> sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> ~[na:1.8.0_45]
> at
> org.apache.storm.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
> ~[storm-core-0.9.6.jar:0.9.6]
> at
> org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:372)
> ~[storm-core-0.9.6.jar:0.9.6]
> at
> org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:296)
> ~[storm-core-0.9.6.jar:0.9.6]
> at
> org.apache.storm.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
> ~[storm-core-0.9.6.jar:0.9.6]
> at
> org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> ~[storm-core-0.9.6.jar:0.9.6]
> at
> org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> ~[storm-core-0.9.6.jar:0.9.6]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> ~[na:1.8.0_45]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> ~[na:1.8.0_45]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_45]
> 2016-07-28 18:34:00 [main] b.s.util [ERROR] Halting process: ("Error on
> initialization")
> java.lang.RuntimeException: ("Error on initialization")
> at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
> 

Re: Nimbus runnig for short time only

2016-07-07 Thread Erik Weathers
There is a problem with your ZooKeeper server(s).  ZooKeeper is something
Nimbus relies upon, so you must get it working before worrying about
Nimbus.
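
If it helps, here is a minimal sketch using the Curator client (which Storm
itself uses) to verify basic ZooKeeper connectivity outside of Storm; the
connect string is just the host from your command, so adjust it to match your
storm.zookeeper.servers / storm.zookeeper.port settings:

{code}
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryOneTime;

public class ZkConnectCheck {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zkserver1:2181", new RetryOneTime(1000));
        client.start();
        // Wait up to 10 seconds for a session to be established.
        boolean connected = client.blockUntilConnected(10, TimeUnit.SECONDS);
        System.out.println(connected ? "connected to ZooKeeper"
                                     : "could not connect within 10 seconds");
        client.close();
    }
}
{code}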

- Erik

On Thursday, July 7, 2016, Walid Aljoby  wrote:

> Hi all,
>
> I have an issue when start running storm nimbus. It runs only for short
> time then disappear.
> Also no reply for this command: echo stat | nc zkserver1 2181
>
> This is snapshot from nimbus log file:
>
> java.lang.RuntimeException:
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /storm
> at org.apache.storm.util$wrap_in_runtime.invoke(util.clj:54)
> at
> org.apache.storm.zookeeper$exists_node_QMARK_$fn__2091.invoke(zookeeper.clj:108)
> at
> org.apache.storm.zookeeper$exists_node_QMARK_.invoke(zookeeper.clj:104)
> at org.apache.storm.zookeeper$mkdirs.invoke(zookeeper.clj:124)
> at
> org.apache.storm.cluster_state.zookeeper_state_factory$_mkState.invoke(zookeeper_state_factory.clj:29)
> at
> org.apache.storm.cluster_state.zookeeper_state_factory.mkState(Unknown
> Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
> at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
> at
> org.apache.storm.cluster$mk_distributed_cluster_state.doInvoke(cluster.clj:46)
> at clojure.lang.RestFn.invoke(RestFn.java:559)
> at
> org.apache.storm.cluster$mk_storm_cluster_state.doInvoke(cluster.clj:250)
> at clojure.lang.RestFn.invoke(RestFn.java:486)
> at org.apache.storm.daemon.nimbus$nimbus_data.invoke(nimbus.clj:175)
> at
> org.apache.storm.daemon.nimbus$fn__7064$exec_fn__2461__auto7065.invoke(nimbus.clj:1361)
> at clojure.lang.AFn.applyToHelper(AFn.java:156)
> at clojure.lang.AFn.applyTo(AFn.java:144)
> at clojure.core$apply.invoke(core.clj:630)
> at
> org.apache.storm.daemon.nimbus$fn__7064$service_handler__7308.doInvoke(nimbus.clj:1358)
> at clojure.lang.RestFn.invoke(RestFn.java:421)
> at
> org.apache.storm.daemon.nimbus$launch_server_BANG_.invoke(nimbus.clj:2206)
> at org.apache.storm.daemon.nimbus$_launch.invoke(nimbus.clj:2239)
> at org.apache.storm.daemon.nimbus$_main.invoke(nimbus.clj:2262)
> at clojure.lang.AFn.applyToHelper(AFn.java:152)
> at clojure.lang.AFn.applyTo(AFn.java:144)
> at org.apache.storm.daemon.nimbus.main(Unknown Source)
> Caused by:
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$ConnectionLossException:
> KeeperErrorCode = ConnectionLoss for /storm
> at
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
> at
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1045)
> at
> org.apache.storm.shade.org.apache.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:226)
> at
> org.apache.storm.shade.org.apache.curator.framework.imps.ExistsBuilderImpl$3.call(ExistsBuilderImpl.java:215)
> at
> org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
> at
> org.apache.storm.shade.org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForegroundStandard(ExistsBuilderImpl.java:212)
> at
> org.apache.storm.shade.org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:205)
> at
> org.apache.storm.shade.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:168)
> at
> org.apache.storm.shade.org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:39)
> at
> org.apache.storm.zookeeper$exists_node_QMARK_$fn__2091.invoke(zookeeper.clj:107)
> ... 27 more
> 2016-07-07 15:44:30.291 o.a.s.util [ERROR] Halting process: ("Error on
> initialization")
> java.lang.RuntimeException: ("Error on initialization")
> at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341)
> at clojure.lang.RestFn.invoke(RestFn.java:423)
> at
> org.apache.storm.daemon.nimbus$fn__7064$service_handler__7308.doInvoke(nimbus.clj:1358)
> at clojure.lang.RestFn.invoke(RestFn.java:421)
> at
> org.apache.storm.daemon.nimbus$launch_server_BANG_.invoke(nimbus.clj:2206)
> at org.apache.storm.daemon.nimbus$_launch.invoke(nimbus.clj:2239)
> at org.apache.storm.daemon.nimbus$_main.invoke(nimbus.clj:2262)
> at clojure.lang.AFn.applyToHelper(AFn.java:152)
> at clojure.lang.AFn.applyTo(AFn.java:144)
> at org.apache.storm.daemon.nimbus.main(Unknown Source)
>
>
> Many thanks for your 

Re: Updates keywords automatically

2016-06-21 Thread Erik Weathers
You'll definitely need to explain more about what you *mean* by automatic
vs. manual.  Storm is just the framework that runs your code, so this
*sounds* like a problem in your own code.

On Mon, Jun 20, 2016 at 10:22 PM, Rahul mahamna  wrote:

> Hi
> I am working on apache storm to analyse real time tweets and other stuff.
> I am having a problem in making it automatic.
> I am able to perform operation on real time tweets manually using storm
> but i am looking for something where i can make it automatic in storm
> itself. Whenever i am filtering a query using a particular keyword i have
> to make jar again and again to the right tweet. So if i will make it
> automatic then i have to make jar again and again for all the keywords
> which will take time and consume my base memory. So can you suggest me an
> option where i can directly insert my keyword to the same jar again and
> again to get real time tweets in no time. Or if you have any other option
> then please suggest me. Please reply
>
> thank you
> --
>
>
> IMPORTANT: This message may contain privileged and confidential information
> that is the property of the intended recipient. If you are not the intended
> recipient, you should not disclose or use the information contained in
> it. If you have received this email in error, please notify us immediately
> by return email and delete the document. Copying or disseminating any of
> this message is prohibited. Any views expressed in this message are those
> of the individual sender and may not necessarily reflect the views of
> Appster.com.au  unless indicated otherwise. Before
> opening or using attachments check them for viruses and defects.
>


Re: STORM JIRA being spammed - CANNOT comment on JIRA (as a result of the counter spam measures)

2016-05-16 Thread Erik Weathers
:_(   I was happy for a second that the comments weren't getting added to
JIRA.  I still have not seen a convincing argument for duplicating code
review comments into the JIRA issues.

- Erik

On Mon, May 16, 2016 at 11:33 AM, P. Taylor Goetz  wrote:

> I worked with INFRA and we were able to resolve this. Unfortunately
> comments made on github before the fix won’t be replicated to JIRA so we’ll
> have to look at both JIRA and github for recent comments on issues.
>
> -Taylor
>
> > On May 16, 2016, at 1:48 PM, P. Taylor Goetz  wrote:
> >
> > Thanks for the heads up Abhishek. I’ll work with INFRA to see if we can
> get this resolved as it is kind of disruptive.
> >
> > -Taylor
> >
> >> On May 16, 2016, at 1:18 PM, Abhishek Agarwal 
> wrote:
> >>
> >> @Taylor - Looks like it didn't work. I don't see github comments getting
> >> published to the JIRA.  Example
> >> https://issues.apache.org/jira/browse/STORM-1755
> >> https://github.com/apache/storm/pull/1386#issuecomment-217697370 (This
> >> didn't get published to the JIRA)
> >>
> >> On Fri, May 13, 2016 at 9:42 PM, P. Taylor Goetz 
> wrote:
> >>
> >>> I added ASF Github Bot to the list of contributors. Hopefully that will
> >>> fix the issue.
> >>>
> >>> On May 13, 2016, at 12:10 PM, P. Taylor Goetz 
> wrote:
> >>>
> >>>
> >>> On May 13, 2016, at 4:11 AM, Jungtaek Lim  wrote:
> >>>
> >>> Now github comments are not automatically posted to JIRA, which is less
> >>> duplicate and more quiet.
> >>>
> >>>
> >>> I think this is due to the spam countermeasures, let me see if I can
> fix
> >>> this. This is something we want, since we also want to capture
> comments,
> >>> especially +1s from committers.
> >>>
> >>> -Taylor
> >>>
> >>>
> >>>
> >>
> >>
> >> --
> >> Regards,
> >> Abhishek Agarwal
> >
>
>


[jira] [Commented] (STORM-1839) Kinesis Spout

2016-05-16 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15285208#comment-15285208
 ] 

Erik Weathers commented on STORM-1839:
--

Thanks [~sriharsha]!

> Kinesis Spout
> -
>
> Key: STORM-1839
> URL: https://issues.apache.org/jira/browse/STORM-1839
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: Sriharsha Chintalapani
>Assignee: Priyank Shah
>
> As Storm is increasingly used in Cloud environments. It will great to have a 
> Kinesis Spout integration in Apache Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: does anyone else hate the verbose logging of all PR comments in the Storm JIRAs?

2016-05-15 Thread Erik Weathers
Yay, I have one convert to my perspective! ;-)
The emails are less annoying to me than the dilution / hiding of manual
comments within the JIRA tickets.

- Erik

On Sun, May 15, 2016 at 7:24 PM, Jungtaek Lim <kabh...@gmail.com> wrote:

> Moving opinions from other thread,
>
> Abhishek
>
> 3* - I also get one extra email from notificati...@github.com if I am
> participating in a pull request from notificati...@github.com. It will be
> great to avoid that as well. By the way, removing notifications from Github
> means that PRs with no JIRA id might go unnoticed for long time.
>
> Harsha
>
> -1 to what Abhishek said. notifications important for everyone else. if you
> are getting
> spammed by this create a mail rule.
>
> Aaron
>
> 3* Agree that limiting duplicate emails would be good.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Mon, May 16, 2016 at 11:23 AM, Jungtaek Lim <kabh...@gmail.com> wrote:
>
> > This thread was forgotten, and I think I was wrong since I also hate the
> > verbose logging now :)
> >
> > I'm clicking two mails for one new comment from github, which is really
> > bad.
> > I strongly agree with Erik, and linking github PR with JIRA is enough for
> > me.
> >
> > On Tue, Sep 22, 2015 at 6:06 PM, 임정택 <kabh...@gmail.com> wrote:
> >
> >> Matthias,
> >> Personally I can't turn off github notification because of I'm also
> >> collaborator of Jedis, which repository is belong to Github. ;(
> >>
> >> I just wish to reduce notifications via same event (Github event &
> >> JIRA automatic posting) from dev. mailing list.
> >> Actually I don't have strong opinion about this. Just a wish. :)
> >>
> >>
> >> 2015-09-22 17:23 GMT+09:00 Matthias J. Sax <mj...@apache.org>:
> >>
> >>> On Github, you can disable mail notification about each comment in your
> >>> profile configuration (at least for you personal email address -- I
> >>> guess it still goes over the mailing list)
> >>>
> >>> Profile -> Settings -> Notification Center
> >>>
> >>> -Matthias
> >>>
> >>> On 09/22/2015 03:21 AM, Erik Weathers wrote:
> >>> > Sure, STORM-*.  ;-)
> >>> >
> >>> > Here's a good example:
> >>> >
> >>> >- https://issues.apache.org/jira/browse/STORM-329
> >>> >
> >>> > Compare that to this one:
> >>> >
> >>> >- https://issues.apache.org/jira/browse/STORM-404
> >>> >
> >>> > STORM-404 has a bunch of human-created comments, but it's readable
> >>> since it
> >>> > has no github-generated comments.  STORM-329 however intermixes the
> >>> human
> >>> > comments with the github ones.  It's really hard to read through.
> >>> >
> >>> > To be clear, it's not that it's *confusing* per se -- it's that the
> >>> > behavior is *cluttering* the comments, making it harder to see any
> >>> > human-created comments since any JIRA issue with a PR will usually
> end
> >>> up
> >>> > with many automated comments.
> >>> >
> >>> > BTW, I totally agree that linking from the JIRA issue to the github
> PR
> >>> is
> >>> > important!  Would be even nicer if the github PRs also directly
> linked
> >>> back
> >>> > to the JIRA issue with a clickable link.
> >>> >
> >>> > - Erik
> >>> >
> >>> > On Mon, Sep 21, 2015 at 6:03 PM, 임정택 <kabh...@gmail.com> wrote:
> >>> >
> >>> >> Hi Erik,
> >>> >>
> >>> >> I think verbose logging of PR comments could be OK. I didn't
> >>> experience any
> >>> >> confusing.
> >>> >> Maybe referring sample JIRA issues could help us to understand.
> >>> >>
> >>> >> But I'm also open to change cause other projects already have been
> >>> doing.
> >>> >> (for example, https://issues.apache.org/jira/browse/SPARK-10474)
> >>> >>
> >>> >> In addition to SPARK has been doing, I'd like to still leave some
> >>> events on
> >>> >> github PR to JIRA issue, too.
> >>> >>
> >>> >> Btw, the thing I'm really annoyed is multiple mail notifications on
> >>> each
> >>> >> github comment.
> >>> >>

[jira] [Updated] (STORM-1766) A better algorithm server rack selection for RAS

2016-05-04 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-1766:
-
Description: 
Currently the getBestClustering algorithm for RAS finds the "Best" cluster/rack 
based on which rack has the most available resources. This may be insufficient 
and may cause topologies not to be schedulable even though there are enough 
resources in the cluster. We attempt to find the rack with the most resources 
by finding the rack with the biggest sum of available memory + available cpu. 
This method is not effective since it does not consider the number of slots 
available. It also fails to identify racks that are not schedulable due to the 
exhaustion of one of the resources: memory, cpu, or slots. The current 
implementation also tries the initial scheduling on one rack and does not try 
to schedule on all the racks before giving up, which may cause topologies to 
fail to be scheduled due to the above-mentioned shortcomings of the current 
method. The current method also does not consider worker failures. When 
executors of a topology get unassigned and need to be scheduled again, the 
current logic in getBestClustering may be inadequate if not completely wrong. 
When executors need to be rescheduled due to a fault, getBestClustering will 
likely return a cluster that is different from where the majority of the 
topology's executors were originally scheduled.

Thus, I propose a different strategy/algorithm to find the "best" cluster. I 
have come up with an ordering strategy I dub subordinate resource availability 
ordering (inspired by Dominant Resource Fairness) that sorts racks by the 
subordinate (not dominant) resource availability.

For example given 4 racks with the following resource availabilities
{code}
//generate some that has alot of memory but little of cpu
rack-3 Avail [ CPU 100.0 MEM 20.0 Slots 40 ] Total [ CPU 100.0 MEM 20.0 
Slots 40 ]
//generate some supervisors that are depleted of one resource
rack-2 Avail [ CPU 0.0 MEM 8.0 Slots 40 ] Total [ CPU 0.0 MEM 8.0 Slots 
40 ]
//generate some that has a lot of cpu but little of memory
rack-4 Avail [ CPU 6100.0 MEM 1.0 Slots 40 ] Total [ CPU 6100.0 MEM 1.0 
Slots 40 ]
//generate another rack of supervisors with less resources than rack-0
rack-1 Avail [ CPU 2000.0 MEM 4.0 Slots 40 ] Total [ CPU 2000.0 MEM 4.0 
Slots 40 ]
rack-0 Avail [ CPU 4000.0 MEM 8.0 Slots 40 ] Total [ CPU 4000.0 MEM 
8.0 Slots 40 ]
Cluster Overall Avail [ CPU 12200.0 MEM 41.0 Slots 200 ] Total [ CPU 
12200.0 MEM 41.0 Slots 200 ]
{code}

It is clear that rack-0 is the best cluster since it is the most balanced and can 
potentially schedule the most executors, while rack-2 is the worst rack since 
rack-2 is depleted of cpu resource thus rendering it unschedulable even though 
there are other resources available.

We first calculate the resource availability percentage of all the racks for 
each resource by computing:
{code}
(resource available on rack) / (resource available in cluster)
{code}

We do this calculation to normalize the values otherwise the resource values 
would not be comparable.

So for our example:
{code}
rack-3 Avail [ CPU 0.819672131147541% MEM 48.78048780487805% Slots 20.0% ] 
effective resources: 0.00819672131147541
rack-2 Avail [ CPU 0.0% MEM 19.51219512195122% Slots 20.0% ] effective resources: 
0.0
rack-4 Avail [ CPU 50.0% MEM 2.4390243902439024% Slots 20.0% ] effective 
resources: 0.024390243902439025
rack-1 Avail [ CPU 16.39344262295082% MEM 9.75609756097561% Slots 20.0% ] 
effective resources: 0.0975609756097561
rack-0 Avail [ CPU 32.78688524590164% MEM 19.51219512195122% Slots 20.0% ] 
effective resources: 0.1951219512195122
{code}

The effective resource of a rack, which is also the subordinate resource, is 
computed by: 
{code}
MIN(resource availability percentage of {CPU, Memory, # of free Slots}).
{code}
Then we order the racks by the effective resource.

Thus for our example:
{code}
Sorted rack: [rack-0, rack-1, rack-4, rack-3, rack-2]
{code}
Also to deal with the presence of failures, if a topology is partially 
scheduled, we find the rack with the most scheduled executors for the topology 
and we try to schedule on that rack first.

Thus, for the sorting of racks, we first sort by the number of executors 
already scheduled on the rack and then by the subordinate resource availability.
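
For illustration, here is a rough Java sketch of this ordering (the RackAvail 
class and its field names are invented for this example and are not the actual 
RAS types):

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative only: this class and its fields are made up for the sketch.
class RackAvail {
    final String id;
    final double cpu;   // available CPU
    final double mem;   // available memory
    final int slots;    // available worker slots
    RackAvail(String id, double cpu, double mem, int slots) {
        this.id = id; this.cpu = cpu; this.mem = mem; this.slots = slots;
    }
}

public class SubordinateResourceOrdering {
    // effective resource = MIN over {cpu, mem, slots} of (rack avail / cluster avail)
    static double effective(RackAvail r, double cpuTotal, double memTotal, int slotTotal) {
        double cpuPct  = cpuTotal  > 0 ? r.cpu / cpuTotal : 0.0;
        double memPct  = memTotal  > 0 ? r.mem / memTotal : 0.0;
        double slotPct = slotTotal > 0 ? (double) r.slots / slotTotal : 0.0;
        return Math.min(cpuPct, Math.min(memPct, slotPct));
    }

    public static void main(String[] args) {
        // The availabilities from the example above.
        List<RackAvail> racks = new ArrayList<>(Arrays.asList(
                new RackAvail("rack-3", 100.0, 20.0, 40),
                new RackAvail("rack-2", 0.0, 8.0, 40),
                new RackAvail("rack-4", 6100.0, 1.0, 40),
                new RackAvail("rack-1", 2000.0, 4.0, 40),
                new RackAvail("rack-0", 4000.0, 8.0, 40)));
        double cpuTotal = racks.stream().mapToDouble(r -> r.cpu).sum();
        double memTotal = racks.stream().mapToDouble(r -> r.mem).sum();
        int slotTotal = racks.stream().mapToInt(r -> r.slots).sum();
        // Sort by descending effective (subordinate) resource availability.
        racks.sort(Comparator.comparingDouble(
                (RackAvail r) -> effective(r, cpuTotal, memTotal, slotTotal)).reversed());
        // Prints rack-0, rack-1, rack-4, rack-3, rack-2 -- the order shown above.
        racks.forEach(r -> System.out.println(r.id));
    }
}
{code}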

  was:
Currently the getBestClustering algorithm for RAS finds the "Best" cluster/rack 
based on which rack has the most available resources this may be insufficient 
and may cause topologies not to be able to be scheduled successfully even 
though there are enough resources to schedule it in the cluster. We attempt to 
find the rack with the most resources by find the rack with the biggest sum of 
available memor

[jira] [Updated] (STORM-1766) A better algorithm server rack selection for RAS

2016-05-04 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-1766:
-
Description: 
Currently the getBestClustering algorithm for RAS finds the "Best" cluster/rack 
based on which rack has the most available resources. This may be insufficient 
and may cause topologies not to be schedulable even though there are enough 
resources in the cluster. We attempt to find the rack with the most resources 
by finding the rack with the biggest sum of available memory + available cpu. 
This method is not effective since it does not consider the number of slots 
available. It also fails to identify racks that are not schedulable due to the 
exhaustion of one of the resources: memory, cpu, or slots. The current 
implementation also tries the initial scheduling on one rack and does not try 
to schedule on all the racks before giving up, which may cause topologies to 
fail to be scheduled due to the above-mentioned shortcomings of the current 
method. The current method also does not consider worker failures. When 
executors of a topology get unassigned and need to be scheduled again, the 
current logic in getBestClustering may be inadequate if not completely wrong. 
When executors need to be rescheduled due to a fault, getBestClustering will 
likely return a cluster that is different from where the majority of the 
topology's executors were originally scheduled.

Thus, I propose a different strategy/algorithm to find the "best" cluster. I 
have come up with an ordering strategy I dub subordinate resource availability 
ordering (inspired by Dominant Resource Fairness) that sorts racks by the 
subordinate (not dominant) resource availability.

For example given 4 racks with the following resource availabilities
{code}
//generate some that has alot of memory but little of cpu
rack-3 Avail [ CPU 100.0 MEM 20.0 Slots 40 ] Total [ CPU 100.0 MEM 20.0 
Slots 40 ]
//generate some supervisors that are depleted of one resource
rack-2 Avail [ CPU 0.0 MEM 8.0 Slots 40 ] Total [ CPU 0.0 MEM 8.0 Slots 
40 ]
//generate some that has a lot of cpu but little of memory
rack-4 Avail [ CPU 6100.0 MEM 1.0 Slots 40 ] Total [ CPU 6100.0 MEM 1.0 
Slots 40 ]
//generate another rack of supervisors with less resources than rack-0
rack-1 Avail [ CPU 2000.0 MEM 4.0 Slots 40 ] Total [ CPU 2000.0 MEM 4.0 
Slots 40 ]
rack-0 Avail [ CPU 4000.0 MEM 8.0 Slots 40 ] Total [ CPU 4000.0 MEM 
8.0 Slots 40 ]
Cluster Overall Avail [ CPU 12200.0 MEM 41.0 Slots 200 ] Total [ CPU 
12200.0 MEM 41.0 Slots 200 ]
{code}

It is clear that rack-0 is the best cluster since it is the most balanced and can 
potentially schedule the most executors, while rack-2 is the worst rack since 
rack-2 is depleted of cpu resource thus rendering it unschedulable even though 
there are other resources available.

We first calculate the resource availability percentage of all the racks for 
each resource by computing: (resource available on rack) / (resource available 
in cluster)

We do this calculation to normalize the values otherwise the resource values 
would not be comparable.

So for our example:
{code}
rack-3 Avail [ CPU 0.819672131147541% MEM 48.78048780487805% Slots 20.0% ] 
effective resources: 0.00819672131147541
rack-2 Avail [ CPU 0.0% MEM 19.51219512195122% Slots 20.0% ] effective resources: 
0.0
rack-4 Avail [ CPU 50.0% MEM 2.4390243902439024% Slots 20.0% ] effective 
resources: 0.024390243902439025
rack-1 Avail [ CPU 16.39344262295082% MEM 9.75609756097561% Slots 20.0% ] 
effective resources: 0.0975609756097561
rack-0 Avail [ CPU 32.78688524590164% MEM 19.51219512195122% Slots 20.0% ] 
effective resources: 0.1951219512195122
{code}

The effective resource of a rack, which is also the subordinate resource, is 
computed by: 
{code}
MIN(resource availability percentage of {CPU, Memory, # of free Slots}).
{code}
Then we order the racks by the effective resource.

Thus for our example:
{code}
Sorted rack: [rack-0, rack-1, rack-4, rack-3, rack-2]
{code}
Also to deal with the presence of failures, if a topology is partially 
scheduled, we find the rack with the most scheduled executors for the topology 
and we try to schedule on that rack first.

Thus, for the sorting of racks, we first sort by the number of executors 
already scheduled on the rack and then by the subordinate resource availability.

  was:
Currently the getBestClustering algorithm for RAS finds the "Best" cluster/rack 
based on which rack has the most available resources this may be insufficient 
and may cause topologies not to be able to be scheduled successfully even 
though there are enough resources to schedule it in the cluster. We attempt to 
find the rack with the most resources by find the rack with the biggest sum of 
available memor

[jira] [Resolved] (STORM-143) Launching a process throws away standard out; can hang

2016-04-27 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers resolved STORM-143.
-
   Resolution: Fixed
Fix Version/s: 0.10.0

> Launching a process throws away standard out; can hang
> --
>
> Key: STORM-143
> URL: https://issues.apache.org/jira/browse/STORM-143
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Reporter: James Xu
>Priority: Minor
> Fix For: 0.10.0
>
>
> https://github.com/nathanmarz/storm/issues/489
> https://github.com/nathanmarz/storm/blob/master/src/clj/backtype/storm/util.clj#L349
> When we launch a process, standard out is written to a system buffer and does 
> not appear to be read. Also, nothing is redirected to standard in. This can 
> have the following effects:
> A worker can hang when initializing (e.g. UnsatisfiedLinkError looking for 
> jzmq), and it will be unable to communicate the error as standard out is 
> being swallowed.
> A process that writes too much to standard out will block if the buffer fills
> A process that tries to read form standard in for any reason will block.
> Perhaps we can redirect standard out to an .out file, and redirect /dev/null 
> to the standard in stream of the process?
> --
> nathanmarz: Storm redirects stdout to the logging system. It's worked fine 
> for us in our topologies.
> --
> d2r: We see in worker.clj, in mk-worker, where there is a call to 
> redirect-stdio-to-slf4j!. This would not seem to help in cases such as we are 
> seeing when there is a problem launching the worker itself.
> (defn -main [storm-id assignment-id port-str worker-id]
>   (let [conf1 (read-storm-config)
> login_conf_file (System/getProperty "java.security.auth.login.config")
> conf (if login_conf_file (merge conf1 
> {"java.security.auth.login.config" login_conf_file}) conf1)]
> (validate-distributed-mode! conf)
> (mk-worker conf nil (java.net.URLDecoder/decode storm-id) assignment-id 
> (Integer/parseInt port-str) worker-id)))
> If anything were to go wrong (CLASSPATH, jvm opts, misconfiguration...) 
> before -main or before mk-worker, then any output would be lost. The symptom 
> we saw was that the topology sat around apparently doing nothing, yet there 
> was no log indicating that the workers were failing to start.
> Is there other redirection to logs that I'm missing?
> --
> xiaokang: we use bash to launch worker process and redirect its stdout to 
> woker-port.out file. it heleped us find the zeromq jni problem that cause the 
> jvm crash without any log.
> --
> nathanmarz: @d2r Yea, that's all I was referring to. If we redirect stdout, 
> will the code that redirects stdout to the logging system still take effect? 
> This is important because we can control the size of the logfiles (via the 
> logback config) but not the size of the redirected stdout file.
> --
> d2r: My hunch is that it will work as it does now, except that any messages 
> that are getting thrown away before that point would go to a file instead. I 
> can play with it and find out. We wouldn't want to change the redirection, 
> just restore visibility to any output that might occur prior to the 
> redirection. There should be some safety valve to control the size of any new 
> .out in case something goes berserk.
> @xiaokang I see how that would work. We also need to make sure redirection 
> continues to work as it currently does for the above reason.
> --
> xiaokang: @d2r @nathanmarz In out cluster, storm's stdout redirection still 
> works for any System.out output while JNI errors goes to worker-port.out 
> file. I think it will be nice to use the same worker-port.log file for bash 
> stdout redirection since logback can control log file size. But it is a 
> little bit ugly to use bash to launch worker java process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
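
The redirection idea discussed in the STORM-143 description above (write the 
child's stdout/stderr to a .out file and feed /dev/null to its stdin so it can 
never block on either stream) can be sketched with ProcessBuilder; the command 
and file names below are placeholders, and the /dev/null redirect is 
Unix-specific:

{code}
import java.io.File;

public class LaunchWithRedirect {
    public static void main(String[] args) throws Exception {
        // Placeholder command; a real launcher would build the worker's java command line.
        ProcessBuilder pb = new ProcessBuilder("java", "-version");
        pb.redirectErrorStream(true);                        // merge stderr into stdout
        pb.redirectOutput(new File("worker-6700.out"));      // placeholder .out filename
        pb.redirectInput(new File("/dev/null"));             // child sees EOF, never blocks on stdin
        Process p = pb.start();
        System.out.println("child exited with " + p.waitFor());
    }
}
{code}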


[jira] [Commented] (STORM-143) Launching a process throws away standard out; can hang

2016-04-27 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261323#comment-15261323
 ] 

Erik Weathers commented on STORM-143:
-

Aha (hadn't clicked the {{...}} on the GitHub UI)!  I'll mark this ticket as 
closed then, thanks!

> Launching a process throws away standard out; can hang
> --
>
> Key: STORM-143
> URL: https://issues.apache.org/jira/browse/STORM-143
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Reporter: James Xu
>Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/489
> https://github.com/nathanmarz/storm/blob/master/src/clj/backtype/storm/util.clj#L349
> When we launch a process, standard out is written to a system buffer and does 
> not appear to be read. Also, nothing is redirected to standard in. This can 
> have the following effects:
> A worker can hang when initializing (e.g. UnsatisfiedLinkError looking for 
> jzmq), and it will be unable to communicate the error as standard out is 
> being swallowed.
> A process that writes too much to standard out will block if the buffer fills
> A process that tries to read form standard in for any reason will block.
> Perhaps we can redirect standard out to an .out file, and redirect /dev/null 
> to the standard in stream of the process?
> --
> nathanmarz: Storm redirects stdout to the logging system. It's worked fine 
> for us in our topologies.
> --
> d2r: We see in worker.clj, in mk-worker, where there is a call to 
> redirect-stdio-to-slf4j!. This would not seem to help in cases such as we are 
> seeing when there is a problem launching the worker itself.
> (defn -main [storm-id assignment-id port-str worker-id]
>   (let [conf1 (read-storm-config)
> login_conf_file (System/getProperty "java.security.auth.login.config")
> conf (if login_conf_file (merge conf1 
> {"java.security.auth.login.config" login_conf_file}) conf1)]
> (validate-distributed-mode! conf)
> (mk-worker conf nil (java.net.URLDecoder/decode storm-id) assignment-id 
> (Integer/parseInt port-str) worker-id)))
> If anything were to go wrong (CLASSPATH, jvm opts, misconfiguration...) 
> before -main or before mk-worker, then any output would be lost. The symptom 
> we saw was that the topology sat around apparently doing nothing, yet there 
> was no log indicating that the workers were failing to start.
> Is there other redirection to logs that I'm missing?
> --
> xiaokang: we use bash to launch worker process and redirect its stdout to 
> woker-port.out file. it heleped us find the zeromq jni problem that cause the 
> jvm crash without any log.
> --
> nathanmarz: @d2r Yea, that's all I was referring to. If we redirect stdout, 
> will the code that redirects stdout to the logging system still take effect? 
> This is important because we can control the size of the logfiles (via the 
> logback config) but not the size of the redirected stdout file.
> --
> d2r: My hunch is that it will work as it does now, except that any messages 
> that are getting thrown away before that point would go to a file instead. I 
> can play with it and find out. We wouldn't want to change the redirection, 
> just restore visibility to any output that might occur prior to the 
> redirection. There should be some safety valve to control the size of any new 
> .out in case something goes berserk.
> @xiaokang I see how that would work. We also need to make sure redirection 
> continues to work as it currently does for the above reason.
> --
> xiaokang: @d2r @nathanmarz In out cluster, storm's stdout redirection still 
> works for any System.out output while JNI errors goes to worker-port.out 
> file. I think it will be nice to use the same worker-port.log file for bash 
> stdout redirection since logback can control log file size. But it is a 
> little bit ugly to use bash to launch worker java process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1733) Logs from bin/storm are lost because stdout and stderr are not flushed

2016-04-27 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15261166#comment-15261166
 ] 

Erik Weathers commented on STORM-1733:
--

[gigantic auto-comment 
above|https://issues.apache.org/jira/browse/STORM-1733?focusedCommentId=15261156=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15261156]
 is an example of why I want to disable the automatic uploading of all GitHub 
stuff into JIRA.

> Logs from bin/storm are lost because stdout and stderr are not flushed
> --
>
> Key: STORM-1733
> URL: https://issues.apache.org/jira/browse/STORM-1733
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 0.9.3, 0.10.0, 0.9.4, 0.9.5, 0.9.6
>Reporter: Karthick Duraisamy Soundararaj
>Assignee: Karthick Duraisamy Soundararaj
>
> bin/storm.py emits the following crucial information that is lost because we 
> don't flush the stdout before exec.
> {code}
> 2016-04-25T08:23:43.17141 Running: java -server -Dstorm.options= 
> -Dstorm.home= -Xmx1024m -Dlogfile.name=nimbus.log 
> -Dlogback.configurationFile=logback/cluster.xml  backtype.storm.ui.core.nimbus
> {code}
> Observed Environment:
> {code}
> OS: CentOS release 6.5 
> Kernel: 2.6.32-431.el6.x86_64
> Python version: Python 2.7.2
> {code}
> For example, I using runit to start storm components like nimbus, ui, etc and 
> the problem is applicable to all the components and in all the cases, I am 
> not seeing logs that are emitted by bin/storm before {{os.execvp}} is called 
> to actually launch the component. 
> Please note that in cases where stdout and stderr is terminal, the stdout and 
> stderr are always flushed and the bug is not applicable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-143) Launching a process throws away standard out; can hang

2016-04-27 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15259806#comment-15259806
 ] 

Erik Weathers commented on STORM-143:
-

[~revans2] : seems this issue is fixed with the LogWriter that was introduced 
in storm-0.10.0.  I cannot find a ticket for that feature to link this against 
though.

> Launching a process throws away standard out; can hang
> --
>
> Key: STORM-143
> URL: https://issues.apache.org/jira/browse/STORM-143
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Reporter: James Xu
>Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/489
> https://github.com/nathanmarz/storm/blob/master/src/clj/backtype/storm/util.clj#L349
> When we launch a process, standard out is written to a system buffer and does 
> not appear to be read. Also, nothing is redirected to standard in. This can 
> have the following effects:
> A worker can hang when initializing (e.g. UnsatisfiedLinkError looking for 
> jzmq), and it will be unable to communicate the error as standard out is 
> being swallowed.
> A process that writes too much to standard out will block if the buffer fills
> A process that tries to read form standard in for any reason will block.
> Perhaps we can redirect standard out to an .out file, and redirect /dev/null 
> to the standard in stream of the process?
> --
> nathanmarz: Storm redirects stdout to the logging system. It's worked fine 
> for us in our topologies.
> --
> d2r: We see in worker.clj, in mk-worker, where there is a call to 
> redirect-stdio-to-slf4j!. This would not seem to help in cases such as we are 
> seeing when there is a problem launching the worker itself.
> (defn -main [storm-id assignment-id port-str worker-id]
>   (let [conf1 (read-storm-config)
> login_conf_file (System/getProperty "java.security.auth.login.config")
> conf (if login_conf_file (merge conf1 
> {"java.security.auth.login.config" login_conf_file}) conf1)]
> (validate-distributed-mode! conf)
> (mk-worker conf nil (java.net.URLDecoder/decode storm-id) assignment-id 
> (Integer/parseInt port-str) worker-id)))
> If anything were to go wrong (CLASSPATH, jvm opts, misconfiguration...) 
> before -main or before mk-worker, then any output would be lost. The symptom 
> we saw was that the topology sat around apparently doing nothing, yet there 
> was no log indicating that the workers were failing to start.
> Is there other redirection to logs that I'm missing?
> --
> xiaokang: we use bash to launch worker process and redirect its stdout to 
> woker-port.out file. it heleped us find the zeromq jni problem that cause the 
> jvm crash without any log.
> --
> nathanmarz: @d2r Yea, that's all I was referring to. If we redirect stdout, 
> will the code that redirects stdout to the logging system still take effect? 
> This is important because we can control the size of the logfiles (via the 
> logback config) but not the size of the redirected stdout file.
> --
> d2r: My hunch is that it will work as it does now, except that any messages 
> that are getting thrown away before that point would go to a file instead. I 
> can play with it and find out. We wouldn't want to change the redirection, 
> just restore visibility to any output that might occur prior to the 
> redirection. There should be some safety valve to control the size of any new 
> .out in case something goes berserk.
> @xiaokang I see how that would work. We also need to make sure redirection 
> continues to work as it currently does for the above reason.
> --
> xiaokang: @d2r @nathanmarz In out cluster, storm's stdout redirection still 
> works for any System.out output while JNI errors goes to worker-port.out 
> file. I think it will be nice to use the same worker-port.log file for bash 
> stdout redirection since logback can control log file size. But it is a 
> little bit ugly to use bash to launch worker java process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (STORM-954) Topology Event Inspector

2016-04-12 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-954:

Summary: Topology Event Inspector  (was: Toplogy Event Inspector)

> Topology Event Inspector
> 
>
> Key: STORM-954
> URL: https://issues.apache.org/jira/browse/STORM-954
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-core
>Reporter: Sriharsha Chintalapani
>Assignee: Arun Mahadevan
> Fix For: 1.0.0
>
>
> •Ability to view tuples flowing through the topology
> •Ability to turn on/off debug events without having to stop/restart topology
> •Default debug events is off
> •User should be able to select a specific Spout or Bolt and see incoming 
> events and outgoing events
> •We could put a configurable numbers of events to view (e.g. last 100 events 
> or last 1 minute)
> •Tuple stream to have following info
> •Message id, batch/transaction id, name/value pair, timestamp, acked (boolean)
> •All the above to be available from Storm UI



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1056) allow supervisor log filename to be configurable via ENV variable

2016-03-23 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209115#comment-15209115
 ] 

Erik Weathers commented on STORM-1056:
--

[~kabhwan]: ahh, seems that [the release notes for storm 
0.10.0|https://storm.apache.org/2015/11/05/storm0100-released.html] were just 
missing STORM-1056, but it's actually present in v0.10.0:
* https://github.com/apache/storm/blob/v0.10.0/bin/storm.py#L80

And in the binary release tarball:
{code}
(/tmp) % wget 
http://www.carfab.com/apachesoftware/storm/apache-storm-0.10.0/apache-storm-0.10.0.tar.gz
...
(/tmp) % tar -xf apache-storm-0.10.0.tar.gz
(/tmp/apache-storm-0.10.0) % grep SUPERVI bin/storm.py 
STORM_SUPERVISOR_LOG_FILE = os.getenv('STORM_SUPERVISOR_LOG_FILE', 
"supervisor.log")
"-Dlogfile.name=" + STORM_SUPERVISOR_LOG_FILE,
{code}

> allow supervisor log filename to be configurable via ENV variable
> -
>
> Key: STORM-1056
> URL: https://issues.apache.org/jira/browse/STORM-1056
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-core
>Reporter: Erik Weathers
>Assignee: Erik Weathers
>Priority: Minor
> Fix For: 0.9.6
>
>
> *Requested feature:*  allow configuring the supervisor's log filename when 
> launching it via an ENV variable.
> *Motivation:* The storm-on-mesos project (https://github.com/mesos/storm) 
> relies on multiple Storm Supervisor processes per worker host, where each 
> Supervisor is dedicated to a particular topology.  This is part of the 
> framework's functionality of separating topologies from each other.  i.e., 
> storm-on-mesos is a multi-tenant system.  But before the change requested in 
> this issue, the logs from all supervisors on a worker host will be written 
> into a supervisor log with a single name of supervisor.log.  If all logs are 
> written to a common location on the mesos host, then all logs go to the same 
> log file.  Instead it would be desirable to separate the supervisor logs 
> per-topology, so that each tenant/topology-owner can peruse the logs that are 
> related to their own topology.  Thus this ticket is requesting the ability to 
> configure the supervisor log via an environment variable whilst invoking 
> bin/storm.py (or bin/storm in pre-0.10 storm releases).
> When this ticket is fixed, we will include the topology ID into the 
> supervisor log filename for storm-on-mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (STORM-1631) storm CGroup bugs 1) when launching workers as the user that submitted the topology 2) when initial cleanup of cgroup fails

2016-03-22 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-1631:
-
Summary: storm CGroup bugs 1) when launching workers as the user that 
submitted the topology 2) when initial cleanup of cgroup fails  (was: torm 
CGroup bugs 1) when launching workers as the user that submitted the topology 
2) when initial cleanup of cgroup fails)

> storm CGroup bugs 1) when launching workers as the user that submitted the 
> topology 2) when initial cleanup of cgroup fails
> ---
>
> Key: STORM-1631
> URL: https://issues.apache.org/jira/browse/STORM-1631
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Boyang Jerry Peng
>Assignee: Boyang Jerry Peng
>
> In secure multitenant storm, topology workers are launched with permission of 
> the user that submitted the topology. This causes a problem with cgroups 
> since workers are launched with permissions of the topology user which does 
> not have permissions to modify cgroups storm is using
> Also, the clean up code is not trying to clean up cgroups of killed workers 
> if the initial attempt failed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: zookeeper util methods

2016-01-31 Thread Erik Weathers
hey Anirudh,

To decipher this code we need to first realize that the `zk` in that
code refers to CuratorFramework:

   -
   
https://github.com/apache/storm/blob/v0.10.0/storm-core/src/clj/backtype/storm/zookeeper.clj#L125

Next we need to figure out the funny (.. ) syntax.  I happen to have tribal
knowledge that this syntax is part of Clojure's java interop stuff, as
documented here:

http://clojure.org/reference/java_interop

*Clojure's docs on java interop:*

(.. instance-expr member+)
(.. Classname-symbol member+)

member ⇒ fieldName-symbol or (instanceMethodName-symbol args*)

Macro. Expands into a member access (.) of the first member on the first
argument, followed by the next member on the result, etc. For instance:

(.. System (getProperties) (get "os.name"))

expands to:

(. (. System (getProperties)) (get "os.name"))

but is easier to write, read, and understand. See also the -> macro which
can be used similarly:

(-> (System/getProperties) (.get "os.name"))


Hence:

   - (.. zk (getData) (forPath path))

Expands to:

   - (. (. zk (getData)) (forPath path))

Which is basically (in Java syntax):

   - zk.getData().forPath(path)

And the other line:

   -  (.. zk (getData) (watched) (forPath path))

Basically means:

   - zk.getData().watched().forPath(path)

Here are those docs for these Apache Curator methods:

   -
   
https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFramework.html#getData--
   -
   
https://curator.apache.org/apidocs/org/apache/curator/framework/api/Watchable.html#watched--
   -
   
https://curator.apache.org/apidocs/org/apache/curator/framework/api/Pathable.html#forPath-java.lang.String-

So when you call "watched" you are setting a "watcher" for the obtained
data.  As for what the "watcher" *is*, it seems that it comes from the
cluster.clj's wrapper code having created the CuratorFramework instance:

   -
   
https://github.com/apache/storm/blob/v0.10.0/storm-core/src/clj/backtype/storm/cluster.clj#L59-L70
   -
   
https://github.com/apache/storm/blob/v0.10.0/storm-core/src/clj/backtype/storm/zookeeper.clj#L48-L80
   -
   
https://github.com/apache/storm/blob/v0.10.0/storm-core/src/clj/backtype/storm/cluster.clj#L112-L114

Note that there are callbacks registered via that initialization code, so
the callbacks are invoked when a watch fires.
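
If it helps to see the same thing without the interop sugar, here's a small untested Java
sketch against the plain org.apache.curator client (the connect string and znode path are
placeholders; inside Storm the client and its watcher are actually built by the
zookeeper.clj / cluster.clj code linked above):

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.RetryNTimes;

public class WatchedGetDataExample {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; Storm wires up its own client + default watcher.
        CuratorFramework zk = CuratorFrameworkFactory.newClient("localhost:2181",
                new RetryNTimes(3, 1000));
        zk.start();

        String path = "/storm/some-znode";  // placeholder path

        // (.. zk (getData) (forPath path))  ==>  just read the znode's bytes.
        byte[] dataOnly = zk.getData().forPath(path);

        // (.. zk (getData) (watched) (forPath path))  ==>  read the bytes AND leave a
        // one-shot watch on the znode, so the next change/delete fires the watcher
        // that was registered when the CuratorFramework was constructed.
        byte[] dataWithWatch = zk.getData().watched().forPath(path);

        System.out.println(dataOnly.length + " / " + dataWithWatch.length);
        zk.close();
    }
}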

- Erik


On Sun, Jan 31, 2016 at 8:57 AM, Anirudh Jayakumar <
jayakumar.anir...@gmail.com> wrote:

> Hi,
>
> Could someone help me understand the difference between the below zk method
> invocations?
>
> a. (.. zk (getData) (watched) (forPath path))
> b. (.. zk (getData) (forPath path
>
> I want to understand the significance of "watched" method here.
>
> Thanks,
> Anirudh
>


Re: JStorm CGroup

2016-01-27 Thread Erik Weathers
Thanks for the detailed response Bobby.

Please include me in discussions about the pluggable interfaces when you
get to that step, as I can provide some insight from working on the
Storm-to-Mesos integration for awhile.  e.g., mesos client applications
("frameworks") don't "request external resources" like they do in YARN,
instead they wait for mesos to offer them resources.  So that behavior
difference has implications for the Nimbus scheduler (e.g., it shouldn't
assume all potential resources are present when getting the available
slots, as it might take some time for all available resources to percolate
from mesos).

That being said, as long as the "RAS" (resource aware scheduler) + cgroups
feature is allowing for dynamically partitioned per-topology-declared
resources across a cluster of hosts, then it sounds like a vast improvement
for multi-tenancy in native Storm.  (i.e., the "isolation scheduler" of
storm-0.8.2 and the "multi-tenant scheduler with resource limits" of
storm-0.10.0 are too static, isolating topologies at a host level instead
of allowing for individual topologies to declare the amount of CPU/memory
resources they need).

Thanks!

- Erik

On Thu, Jan 14, 2016 at 6:53 AM, Bobby Evans <ev...@yahoo-inc.com.invalid>
wrote:

> I would love to see true support for mesos, YARN, openstack, etc. added,
> but I also see stand alone mode offering a lot more flexibility, especially
> in the area of scheduling, than a two level scheduler can currently offer.
> It is on my roadmap to look into after the JStorm migration (just started),
> Resource Aware Scheduling (almost done needs testing and better isolation),
> and adding in automatic elasticity around topology specified SLAs (working
> with a few researchers around some prototypes in this area).
>
> To be able to support running on other cluster technologies in a proper
> way we need to provide plugability in a few different places.
> First we need a way for a scheduler/cluster to request topology specific
> dedicated resources, and for nimbus to provision, manage, monitor, and
> ideally resize (for elasticity) those resources.  With security and
> resource aware scheduling, we need these external requests to be on a per
> topology bases, not bolted on like they are now.  This would also
> necessitate the schedulers being updated so that they could take advantage
> of these new APIs requesting external resources either when a topology
> explicitly asks to be on a given external resource, or optionally when
> dedicated resources are no longer available and the topology has specified
> the proper configurations/credentials to allow it to run using those
> external resources.
>
> That handles scheduling, but there are some additional features that storm
> offers which other systems don't yet offer, and many never will.  For
> example the storm blob store API is similar to the dist cache in YARN, but
> it we can do in place replacement without relaunching.  We also favor fast
> fail and I don't think all of these types of clusters will nor should offer
> the process monitoring and re-spawning needed for it.  As such we would
> need some sort of a supervisor that would also run under YARN/mesos, etc to
> provide this extra functionality.  I have not totally thought about all of
> what it would need from a plugability standpoint to make that work.  There
> is also the logviewer which does more then just logs, so we would need some
> pluggable way to be able to point people to where their logs/artifacts are,
> and to monitor the resource usage of the logs (perhaps that part should
> move off to the supervisor). All of that seems like a lot more work
> compared to providing a pluggable interface in the supervisor that would
> allow for it to provision, manage, monitor, and again possibly resize,
> local workers.  In fact I see a lot of potential overlap between the two of
> them and the pluggability that would be needed in the supervisor for
> running on mesos, YARN, etc.
>
> - Bobby
>
> On Thursday, January 14, 2016 12:39 AM, Erik Weathers
> <eweath...@groupon.com.INVALID> wrote:
>
>
>  Perhaps rather than just bolting on "cgroup support", we could instead
> open
> a dialogue about having Mesos support be a core feature of Storm.
>
> The current integration is a bit unwieldy & hackish at the moment, arising
> from the conflicting natures of Mesos and Storm w.r.t. scheduling of
> resources.  i.e., Storm assumes you have existing "slots" for running
> workers on, whereas Mesos is more dynamic, requiring frameworks that run on
> top of it to tell Mesos just how many resources (CPUs, Memory, etc.) are
> needed by the framework's tasks.
>
> One example of an issue with Storm-on-Mesos:  the St

Re: DRPC server not working

2016-01-27 Thread Erik Weathers
Not sure I'll have time, but if I do then I'll try to get the DRPC starter
example working and then can probably tell you more explicitly how to make
your example work.

Can you please tell us which version of Storm you are using?

- Erik

On Wed, Jan 27, 2016 at 3:25 PM, researcher cs 
wrote:

> can you try this project ? i hope you can
>


Re: DRPC server not working

2016-01-27 Thread Erik Weathers
);
>>
>> fos.write(String.valueOf(System.currentTimeMillis()).getBytes());
>> byte[] newLine = "\n".getBytes();
>> int times = 0;
>> // emit tweets into topology
>> while ((tweetJson = br.readLine()) != null) {
>>
>> String result = drpc.execute(TOPOLOGY_NAME, tweetJson);
>>
>> Status s = null;
>> try {
>> s = DataObjectFactory.createStatus(tweetJson);
>> result = s.getId() + "\t" + s.getText() + "\t" +
>> result;
>> } catch (TwitterException e) {
>> LOG.error(e.toString());
>> }
>>
>> fos.write(result.getBytes());
>> fos.write(newLine);
>>
>> // times++;
>> // if (times == 1000)
>> // break;
>> }
>> fos.write(newLine);
>> fos.write("Finish: ".getBytes());
>>
>> fos.write(String.valueOf(System.currentTimeMillis()).getBytes());
>>
>> fos.flush();
>> fos.close();
>> br.close();
>>     drpc.shutdown();
>> cluster.shutdown();
>> } else {
>> // distributed mode
>> Config conf = createTopologyConfiguration(prop,true);
>> LocalDRPC drpc = null;
>> StormSubmitter.submitTopology(args[0], conf,
>> buildTopology(drpc));
>>
>> }
>>
>> }
>>
>> On Wed, Jan 27, 2016 at 11:14 PM, Erik Weathers <
>> eweath...@groupon.com.invalid> wrote:
>>
>>> Please put more effort into describing the issue.  "It doesn't work" is
>>> unfortunately not enough info for anyone to provide help.
>>> e.g., post links to some code you are trying to run, and the configs of
>>> the
>>> storm components that you are running.
>>>
>>> - Erik
>>>
>>> On Wed, Jan 27, 2016 at 4:59 AM, sam mohel <sammoh...@gmail.com> wrote:
>>>
>>> > I wrote the actual problem in My first message drpc server not working
>>> I
>>> > hot zeros in the columns of storm ui like emitted and transferred ,
>>> result
>>> > file is empty
>>> >
>>> > On Wednesday, January 27, 2016, Erik Weathers
>>> > <eweath...@groupon.com.invalid>
>>> > wrote:
>>> >
>>> > > Your mail client is wrapping the log lines prematurely, I have a
>>> really
>>> > > really hard time reading wrapped lines, I'd look into fixing that if
>>> I
>>> > were
>>> > > you.  Here they are unwrapped:
>>> > >
>>> > > 2016-01-27 01:41:00 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>> > > 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Initiating client
>>> connection,
>>> > > connectString=localhost:2181 sessionTimeout=2
>>> > > watcher=com.netflix.curator.ConnectionState@2fa423d2
>>> > > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Opening socket
>>> connection to
>>> > > server localhost/127.0.1.1:2181. Will not attempt to authenticate
>>> using
>>> > > SASL (unknown error)
>>> > > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Socket connection
>>> established
>>> > > to localhost/127.0.1.1:2181, initiating session
>>> > > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Session establishment
>>> > complete
>>> > > on server localhost/127.0.1.1:2181, sessionid = 0x152804f3a3a0002,
>>> > > negotiated timeout = 2
>>> > > 2016-01-27 01:41:00 b.s.zookeeper [INFO] Zookeeper state update:
>>> > > :connected:none
>>> > > 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Session: 0x152804f3a3a0002
>>> > > closed
>>> > > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] EventThread shut down
>>> > > 2016-01-27 01:41:00 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
>>> > > 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Initiating client
>>> connection,
>>> > > connectString=localhost:2181/storm sessionTimeout=2
>>> > > watcher=com.netflix.curator.ConnectionState@2ee8b0bf
>>> > > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Opening socket
>>> connection to
>>> > > server use

Re: DRPC server not working

2016-01-27 Thread Erik Weathers
I really don't think you can have any expectation of this doing anything
without somehow shoving work into it.  Look at all the things being done in
the "local cluster" block versus the "remote" block of your code.

Would be nice if someone who has ever used DRPC could respond.  The docs
seem to be lacking in this critical area of getting it to work with a
remote cluster.  i.e., the 3 steps definitely don't seem to be sufficient.
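
My best guess (untested; the hostname, port, and function name below are placeholders) is
that in remote mode some separate client program has to push requests through the DRPC
daemon, e.g. with the backtype.storm.utils.DRPCClient that ships with 0.9.x / 0.10.x:

import backtype.storm.utils.DRPCClient;

public class RemoteDrpcCaller {
    public static void main(String[] args) throws Exception {
        // Placeholders: the host running the DRPC daemon and the configured drpc.port.
        DRPCClient client = new DRPCClient("drpc-host.example.com", 3772);

        // "my-function" must match the function name the topology registered
        // (e.g. via DRPCSpout / LinearDRPCTopologyBuilder).
        String result = client.execute("my-function", "some-argument");
        System.out.println("DRPC result: " + result);
    }
}

i.e., submitting the topology only sets up the consumer side; nothing happens until
something actually calls execute() against the DRPC server.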

- Erik

On Wed, Jan 27, 2016 at 3:16 PM, researcher cs <prog.researc...@gmail.com>
wrote:

> thanks but Agian i imported this project from github that it's supposed
> work well , really i'm on this problem for months without fixing it !! i
> sent coder message but he didn't reply with this problem . don't know who
> can i ask but thanks for your time
>
> On Thu, Jan 28, 2016 at 1:12 AM, Erik Weathers <
> eweath...@groupon.com.invalid> wrote:
>
> > hey Sam,
> >
> > Again, I've never used the DRPC feature.  But it naively looks like you
> > aren't using it correctly.
> > Note that the working case for you is with a LocalCluster & LocalDRPC,
> and
> > you are explicitly invoking the execute():
> > String result = drpc.execute(TOPOLOGY_NAME, tweetJson);
> >
> > In the remote case you aren't doing anything except submitting the
> > topology.  I assume you need to invoke drpc.execute *somewhere*... Here's
> > what I think is the same question as you are posing:
> >
> >- http://stackoverflow.com/a/26440260/318428
> >
> > - Erik
> >
> > On Wed, Jan 27, 2016 at 3:04 PM, researcher cs <
> prog.researc...@gmail.com>
> > wrote:
> >
> > > also got zeros after open storm ui except executed column
> > >
> > > ​
> > > ​
> > >
> > > On Thu, Jan 28, 2016 at 12:43 AM, researcher cs <
> > prog.researc...@gmail.com
> > > > wrote:
> > >
> > >> i atteached what i got when i submitted topology and this happened
> also
> > >> when i submitted
> storm-starter-0.0.1-SNAPSHOT-jar-with-dependencies.jar
> > >> storm.starter.BasicDRPCTopology basic-drpc
> > >>
> > >> this storm.yaml
> > >>  storm.zookeeper.servers:
> > >>  - "localhost"
> > >> #- "server2"
> > >>  nimbus.host: "localhost"
> > >>
> > >>  storm.local.dir: "/var/storm"
> > >>  supervisor.slots.ports:
> > >>  - 7660
> > >>  - 7659
> > >>  - 7658
> > >>  - 7657
> > >>  supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
> > >>  nimbus.childopts: "-Djava.net.preferIPv4Stack=true"
> > >>  worker.childopts: "-Djava.net.preferIPv4Stack=true"
> > >> # topology.message.timeout.secs: 30
> > >> # topology.workers: 1
> > >>  topology.stats.sample.rate: 1.0
> > >>  topology.acker.executors: 1
> > >> # topology.executor.receive.buffer.size: 16384
> > >> # topology.executor.send.buffer.size: 16384
> > >> # topology.transfer.buffer.size: 32
> > >> # topology.receiver.buffer.size: 8
> > >> #
> > >> # # These may optionally be filled in:
> > >> #
> > >> ## List of custom serializations
> > >> # topology.kryo.register:
> > >> # - org.mycompany.MyType
> > >> # - org.mycompany.MyType2: org.mycompany.MyType2Serializer
> > >> #
> > >> ## List of custom kryo decorators
> > >> # topology.kryo.decorators:
> > >> # - org.mycompany.MyDecorator
> > >> #
> > >> ## Locations of the drpc servers
> > >> # drpc.servers:
> > >>   #  - "localhost"
> > >> # - "server2"
> > >>
> > >> ## Metrics Consumers
> > >> # topology.metrics.consumer.register:
> > >> #   - class: "backtype.storm.metrics.LoggingMetricsConsumer"
> > >> # parallelism.hint: 1
> > >> #   - class: "org.mycompany.MyMetricsConsumer"
> > >> # parallelism.hint: 1
> > >> # argument:
> > >> #   - endpoint: "metrics-collector.mycompany.org"
> > >>  storm.messaging.transport: "backtype.storm.messaging.netty.Context"
> > >>  storm.messaging.netty.server_worker_threads: 1
> > >>  storm.messaging.netty.client_worker_threads: 1
> > >>  storm.messaging.netty.buffer_size: 5242

Re: DRPC server not working

2016-01-27 Thread Erik Weathers
Please put more effort into describing the issue.  "It doesn't work" is
unfortunately not enough info for anyone to provide help.
e.g., post links to some code you are trying to run, and the configs of the
storm components that you are running.

- Erik

On Wed, Jan 27, 2016 at 4:59 AM, sam mohel <sammoh...@gmail.com> wrote:

> I wrote the actual problem in My first message drpc server not working I
> hot zeros in the columns of storm ui like emitted and transferred , result
> file is empty
>
> On Wednesday, January 27, 2016, Erik Weathers
> <eweath...@groupon.com.invalid>
> wrote:
>
> > Your mail client is wrapping the log lines prematurely, I have a really
> > really hard time reading wrapped lines, I'd look into fixing that if I
> were
> > you.  Here they are unwrapped:
> >
> > 2016-01-27 01:41:00 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> > 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Initiating client connection,
> > connectString=localhost:2181 sessionTimeout=2
> > watcher=com.netflix.curator.ConnectionState@2fa423d2
> > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Opening socket connection to
> > server localhost/127.0.1.1:2181. Will not attempt to authenticate using
> > SASL (unknown error)
> > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Socket connection established
> > to localhost/127.0.1.1:2181, initiating session
> > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Session establishment
> complete
> > on server localhost/127.0.1.1:2181, sessionid = 0x152804f3a3a0002,
> > negotiated timeout = 2
> > 2016-01-27 01:41:00 b.s.zookeeper [INFO] Zookeeper state update:
> > :connected:none
> > 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Session: 0x152804f3a3a0002
> > closed
> > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] EventThread shut down
> > 2016-01-27 01:41:00 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> > 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Initiating client connection,
> > connectString=localhost:2181/storm sessionTimeout=2
> > watcher=com.netflix.curator.ConnectionState@2ee8b0bf
> > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Opening socket connection to
> > server user-Lenovo-G50-70/127.0.0.1:2181. Will not attempt to
> authenticate
> > using SASL (unknown error)
> > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Socket connection established
> > to user-Lenovo-G50-70/127.0.0.1:2181, initiating session
> > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Session establishment
> complete
> > on server user-Lenovo-G50-70/127.0.0.1:2181, sessionid =
> > 0x152804f3a3a0003,
> > negotiated timeout = 2
> >
> > None of those indicate a problem, they look pretty standard to me.
> >
> > Please spend a bit more time zeroing in on what the actual problem is so
> > that the members of the list(s) can provide help.
> >
> > - Erik
> >
> > On Tue, Jan 26, 2016 at 10:15 PM, researcher cs <
> prog.researc...@gmail.com
> > <javascript:;>>
> > wrote:
> >
> > > yes i tried in local and worked well
> > > and about /etc/hosts . i'm feeling that this file has a mistake , i
> made
> > > alot of changes in this file and didn't remember what was default
> > >
> > > In nimbus log file when it connected zookeeper i got
> > > 2016-01-27 01:41:00 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> > > 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Initiating client
> connection,
> > > connectString=localhost:2181 sessionTimeout=2
> > > watcher=com.netflix.curator.ConnectionState@2fa423d2
> > > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Opening socket connection
> to
> > > server localhost/127.0.1.1:2181. Will not attempt to authenticate
> using
> > > SASL (unknown error)
> > > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Socket connection
> established
> > > to localhost/127.0.1.1:2181, initiating session
> > > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Session establishment
> > complete
> > > on server localhost/127.0.1.1:2181, sessionid = 0x152804f3a3a0002,
> > > negotiated timeout = 2
> > > 2016-01-27 01:41:00 b.s.zookeeper [INFO] Zookeeper state update:
> > > :connected:none
> > > 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Session: 0x152804f3a3a0002
> > > closed
> > > 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] EventThread shut down
> > > 2016-01-27 01:41:00 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> > > 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Initiating client
> connection,
> > > connec

Re: DRPC server not working

2016-01-26 Thread Erik Weathers
Your mail client is wrapping the log lines prematurely, I have a really
really hard time reading wrapped lines, I'd look into fixing that if I were
you.  Here they are unwrapped:

2016-01-27 01:41:00 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Initiating client connection,
connectString=localhost:2181 sessionTimeout=2
watcher=com.netflix.curator.ConnectionState@2fa423d2
2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Opening socket connection to
server localhost/127.0.1.1:2181. Will not attempt to authenticate using
SASL (unknown error)
2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Socket connection established
to localhost/127.0.1.1:2181, initiating session
2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Session establishment complete
on server localhost/127.0.1.1:2181, sessionid = 0x152804f3a3a0002,
negotiated timeout = 2
2016-01-27 01:41:00 b.s.zookeeper [INFO] Zookeeper state update:
:connected:none
2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Session: 0x152804f3a3a0002 closed
2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] EventThread shut down
2016-01-27 01:41:00 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Initiating client connection,
connectString=localhost:2181/storm sessionTimeout=2
watcher=com.netflix.curator.ConnectionState@2ee8b0bf
2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Opening socket connection to
server user-Lenovo-G50-70/127.0.0.1:2181. Will not attempt to authenticate
using SASL (unknown error)
2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Socket connection established
to user-Lenovo-G50-70/127.0.0.1:2181, initiating session
2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Session establishment complete
on server user-Lenovo-G50-70/127.0.0.1:2181, sessionid = 0x152804f3a3a0003,
negotiated timeout = 2

None of those indicate a problem, they look pretty standard to me.

Please spend a bit more time zeroing in on what the actual problem is so
that the members of the list(s) can provide help.

- Erik

On Tue, Jan 26, 2016 at 10:15 PM, researcher cs <prog.researc...@gmail.com>
wrote:

> yes i tried in local and worked well
> and about /etc/hosts . i'm feeling that this file has a mistake , i made
> alot of changes in this file and didn't remember what was default
>
> In nimbus log file when it connected zookeeper i got
> 2016-01-27 01:41:00 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Initiating client connection,
> connectString=localhost:2181 sessionTimeout=2
> watcher=com.netflix.curator.ConnectionState@2fa423d2
> 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Opening socket connection to
> server localhost/127.0.1.1:2181. Will not attempt to authenticate using
> SASL (unknown error)
> 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Socket connection established
> to localhost/127.0.1.1:2181, initiating session
> 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Session establishment complete
> on server localhost/127.0.1.1:2181, sessionid = 0x152804f3a3a0002,
> negotiated timeout = 2
> 2016-01-27 01:41:00 b.s.zookeeper [INFO] Zookeeper state update:
> :connected:none
> 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Session: 0x152804f3a3a0002
> closed
> 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] EventThread shut down
> 2016-01-27 01:41:00 c.n.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2016-01-27 01:41:00 o.a.z.ZooKeeper [INFO] Initiating client connection,
> connectString=localhost:2181/storm sessionTimeout=2
> watcher=com.netflix.curator.ConnectionState@2ee8b0bf
> 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Opening socket connection to
> server user-Lenovo-G50-70/127.0.0.1:2181. Will not attempt to authenticate
> using SASL (unknown error)
> 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Socket connection established
> to user-Lenovo-G50-70/127.0.0.1:2181, initiating session
> 2016-01-27 01:41:00 o.a.z.ClientCnxn [INFO] Session establishment complete
> on server user-Lenovo-G50-70/127.0.0.1:2181, sessionid =
> 0x152804f3a3a0003,
> negotiated timeout = 2
>
> and i set for drpc.server : localhost
> storm.zookeeper.server : localhost
> nimbus.host : localhost
>
> in my /etc/hosts
> 127.0.0.1  user-Lenovo-G50-70  localhost
> 127.0.1.1  localhost
>
>  is that right ?
>
>
> On Wed, Jan 27, 2016 at 6:12 AM, Erik Weathers <
> eweath...@groupon.com.invalid> wrote:
>
> > You said: "except the statement of drpc server trying to connect"
> >
> > Maybe you are confused about what "b.s.d.drpc [INFO] Starting Distributed
> > RPC servers..." implies?
> > That is just saying that the server is being started.   It's a server,
> not
> > a client, so it's basic operation is *not* to connect to some other
> thing.
> > It's up and waiting 

Re: DRPC server not working

2016-01-26 Thread Erik Weathers
You said: "except the statement of drpc server trying to connect"

Maybe you are confused about what "b.s.d.drpc [INFO] Starting Distributed
RPC servers..." implies?
That is just saying that the server is being started.   It's a server, not
a client, so its basic operation is *not* to connect to some other thing.
It's up and waiting for you to tell it to do stuff.

Have you gotten the Local-mode version of DRPC working?

   -
   https://storm.apache.org/documentation/Distributed-RPC.html#local-mode-drpc
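
For reference, a minimal local-mode sketch along the lines of storm-starter's
BasicDRPCTopology (untested as written; the "exclamation" function name and the bolt are
just examples) looks roughly like:

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.LocalDRPC;
import backtype.storm.drpc.LinearDRPCTopologyBuilder;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class LocalDrpcSketch {
    public static class ExclaimBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // DRPC tuples carry (request-id, argument); pass the id through unchanged.
            collector.emit(new Values(tuple.getValue(0), tuple.getString(1) + "!"));
        }
        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("id", "result"));
        }
    }

    public static void main(String[] args) throws Exception {
        LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("exclamation");
        builder.addBolt(new ExclaimBolt(), 1);

        LocalDRPC drpc = new LocalDRPC();
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("drpc-demo", new Config(), builder.createLocalTopology(drpc));

        // In local mode the LocalDRPC handle itself plays the role of the DRPC server.
        System.out.println(drpc.execute("exclamation", "hello"));

        cluster.shutdown();
        drpc.shutdown();
    }
}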

Not sure what you're asking with regards to DNS and /etc/hosts, those seem
unrelated to your basic issue.

- Erik

On Tue, Jan 26, 2016 at 6:58 PM, researcher cs <prog.researc...@gmail.com>
wrote:

> There is no error except the statement of drpc server trying to connect . I
> guess I have problem with Dns . if you have any idea about this pleaes help
>
>
> I want to submit topology with single machine
> Only on my laptop without any other devices as a first step so
>  What this file /etc/hosts should contains ?
> As I set in drpc.server : localhost
> Storm.zookeeper. server: localhost
> Nimbus.host:localhost
>
> As this file contains 127.0.1.1 and 127.0.0.1 and my IP address ?
>
> What should I use and what should I hash it to not using it ?
>
>
>
> On Wednesday, January 27, 2016, Erik Weathers
> <eweath...@groupon.com.invalid>
> wrote:
>
> > What does the client code that is supposed to make the DRPC connection
> > telling you?  i.e., you should see some exception or log about not
> > establishing the connection, right?
> >
> > Alternatively, perhaps the connections aren't persistent and there's no
> > actual problem?
> >
> > - Erik
> >
> > On Tue, Jan 26, 2016 at 4:55 PM, researcher cs <
> prog.researc...@gmail.com
> > <javascript:;>>
> > wrote:
> >
> > >  thanks for replying , i read the documentation before , i imported
> > project
> > > supposed to work well but not working with me
> > > i checked port by lsof -i gave me all ports i connected it for storm
> > > java  10675   root   20u  IPv4  98126  0t0  TCP *:52022
> (LISTEN)
> > > java  10675   root   26u  IPv4  98131  0t0  TCP *:2181 (LISTEN)
> > > java  10675   root   27u  IPv4 101944  0t0  TCP
> > > localhost:2181->user-Lenovo-G50-70:38150 (ESTABLISHED)
> > > java  10675   root   29u  IPv4  98974  0t0  TCP
> > > user-Lenovo-G50-70:2181->user-Lenovo-G50-70:50526 (ESTABLISHED)
> > > java  10675   root   30u  IPv4  99105  0t0  TCP
> > > localhost:2181->user-Lenovo-G50-70:38165 (ESTABLISHED)
> > > java  10715   root   90u  IPv4  98953  0t0  TCP
> > > user-Lenovo-G50-70:38150->localhost:2181 (ESTABLISHED)
> > > java  10715   root   91u  IPv4  98245  0t0  TCP *:6627 (LISTEN)
> > > java  10792   root   90u  IPv4  99973  0t0  TCP
> > > user-Lenovo-G50-70:50526->user-Lenovo-G50-70:2181 (ESTABLISHED)
> > > java  10864   root   82u  IPv4 102425  0t0  TCP *:3772 (LISTEN)
> > > java  10864   root   84u  IPv4 102429  0t0  TCP *:3773 (LISTEN)
> > > java  10864   root   92u  IPv4 102197  0t0  TCP
> > > user-Lenovo-G50-70:3773->user-Lenovo-G50-70:50825 (ESTABLISHED)
> > > java  10928   root   81u  IPv4 102070  0t0  TCP *:http-alt
> > (LISTEN)
> > > java  11087   root   81u  IPv4 100091  0t0  TCP
> > > user-Lenovo-G50-70:50825->user-Lenovo-G50-70:3773 (ESTABLISHED)
> > > java  11087   root   91u  IPv4 102196  0t0  TCP
> > > user-Lenovo-G50-70:38165->localhost:2181 (ESTABLISHED)
> > > java  11087   root   94u  IPv4 102561  0t0  TCP *:7660 (LISTEN)
> > >
> > >
> > > here you can see that 3772 not established
> > >
> > > On Wed, Jan 27, 2016 at 2:47 AM, Erik Weathers <
> > > eweath...@groupon.com.invalid> wrote:
> > >
> > > > hey,
> > > >
> > > > The DRPC server is up and listening on port 3772.   Why do you expect
> > > > established connections?
> > > >
> > > > I'm not familiar with using Storm's DRPC feature, but I'm sure you
> need
> > > to
> > > > write code that interacts with the DRPC server, and you've made no
> > > mention
> > > > of doing so in your email.  I'd start here:
> > > >
> > > >- https://storm.apache.org/documentation/Distributed-RPC.html
> > > >
> > > > - Erik
> > > >
> > > > On Tue, Jan 26, 2016 at 4:29 PM,

[jira] [Commented] (STORM-1342) support multiple logviewers per host for container-isolated worker logs

2016-01-26 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118650#comment-15118650
 ] 

Erik Weathers commented on STORM-1342:
--

STORM-1494 is adding support for the supervisor logs to be linked from the 
Nimbus UI.  So this will likely be another area to adjust when (if!?) this is 
fixed.

> support multiple logviewers per host for container-isolated worker logs
> ---
>
> Key: STORM-1342
> URL: https://issues.apache.org/jira/browse/STORM-1342
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-core
>Reporter: Erik Weathers
>Priority: Minor
>
> h3. Storm-on-Mesos Worker Logs are in varying directories
> When using [storm-on-mesos|https://github.com/mesos/storm] with cgroups, each 
> topology's workers are isolated into separate containers.  By default the 
> worker logs will be saved into container-specific sandbox directories.  These 
> directories are also topology-specific by definition, because, as just 
> stated, the containers are specific to each topology.
> h3. Problem: Storm supports 1-and-only-1 Logviewer per Worker Host
> A challenge with this different way of running Storm is that the [Storm 
> logviewer|https://github.com/apache/storm/blob/768a85926373355c15cc139fd86268916abc6850/docs/_posts/2013-12-08-storm090-released.md#log-viewer-ui]
>  runs as a single instance on each worker host.   This doesn't play well with 
> having the topology worker logs in separate per-topology containers.  The one 
> logviewer doesn't know about the various sandbox directories that the Storm 
> Workers are writing to.  And if we just spawned new logviewers for each 
> container, the problem is that the Storm UI only knows about one global 
> logviewer port, so you cannot simply direct users to the correct per-topology logviewer.
> These problems are documented (or linked to) from [Issue #6 in the 
> storm-on-mesos project|https://github.com/mesos/storm/issues/6]
> h3. Possible Solutions I can envision
> # configure the Storm workers to write to log directories that exist on the 
> raw host outside of the container sandbox, and run a single logviewer on a 
> host, which serves up the contents of that directory.
> #* violates one of the basic reasons for using containers: isolation.
> #* also prevents a standard use case for Mesos: running more than 1 
> instance of a Mesos Framework (e.g., "Storm Cluster") at once on the same Mesos 
> Cluster. e.g., for Blue-Green deployments.
> #* a variation on this proposal is to somehow expose the sandbox dirs of all 
> storm containers to this singleton logviewer process (still has above 
> problems)
> # launch a separate logviewer in each container, and somehow register those 
> logviewers with Storm such that Storm knows for a given host which logviewer 
> port is assigned to a given topology.
> #* this is the proposed solution
> h3. Storm Changes for the Proposed Solution
> Nimbus or ZooKeeper could serve as a registrar, recording the association 
> between a slot (host + worker port) and the logviewer port that is serving 
> the worker's logs. And the Storm-on-Mesos framework could update this registry 
> when launching a new worker.  (This proposal definitely calls for thorough 
> vetting and thinking.)
> h3. Storm-on-Mesos Framework Changes for the Proposed Solution
> Along with the interaction with the "registrar" proposed above, the 
> storm-on-mesos framework can be enhanced to launch multiple logviewers on a 
> given worker host, where each logviewer is dedicated to serving the worker 
> logs from a specific topology's container/sandbox directory.  This would be 
> done by launching a logviewer process within the topology's container, and 
> assigning it an arbitrary listening port that has been determined dynamically 
> through mesos (which treats ports as one of the schedulable resource 
> primitives of a worker host).  [Code implementing this 
> logviewer-port-allocation logic already 
> exists|https://github.com/mesos/storm/commit/af8c49beac04b530c33c1401c829caaa8e368a35],
>  but [that specific portion of the code was 
> reverted|https://github.com/mesos/storm/commit/dc3eee0f0e9c06f6da7b2fe697a8e4fc05b5227e]
>  because of the issues that inspired this ticket.
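
To make the "registrar" idea above a bit more concrete, here is a purely illustrative,
untested sketch of a launcher registering a slot-to-logviewer-port mapping in ZooKeeper
via Curator; the znode layout, hostnames, and ports are invented for this example and are
not part of any existing Storm API:
{code}
import java.nio.charset.StandardCharsets;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;
import org.apache.zookeeper.CreateMode;

public class LogviewerRegistrarSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string; a real launcher would reuse the cluster's ZK config.
        CuratorFramework zk = CuratorFrameworkFactory.newClient("zk-host:2181",
                new ExponentialBackoffRetry(1000, 3));
        zk.start();

        // Hypothetical layout: /logviewers/<worker-host>:<worker-port> -> logviewer port.
        String slot = "worker-host-1:31000";   // mesos-assigned worker port
        int logviewerPort = 31001;             // mesos-assigned logviewer port

        zk.create()
          .creatingParentsIfNeeded()
          .withMode(CreateMode.EPHEMERAL)      // mapping disappears if the container dies
          .forPath("/logviewers/" + slot,
                   String.valueOf(logviewerPort).getBytes(StandardCharsets.UTF_8));

        // The Storm UI (or Nimbus) would read this mapping back to build per-topology
        // logviewer links instead of assuming one global logviewer port per host.
        // In a real launcher the session would stay open for the container's lifetime;
        // closing it here would delete the ephemeral node again.
        Thread.sleep(Long.MAX_VALUE);
    }
}
{code}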



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: DRPC server not working

2016-01-26 Thread Erik Weathers
hey,

The DRPC server is up and listening on port 3772.   Why do you expect
established connections?

I'm not familiar with using Storm's DRPC feature, but I'm sure you need to
write code that interacts with the DRPC server, and you've made no mention
of doing so in your email.  I'd start here:

   - https://storm.apache.org/documentation/Distributed-RPC.html

- Erik

On Tue, Jan 26, 2016 at 4:29 PM, researcher cs 
wrote:

> I set in the code
> conf.put(Config.DRPC_SERVERS, dprcServers);
> conf.put(Config.DRPC_PORT, 3772);
> but when i submit topolgoy i found at the end of the file
>  b.s.d.drpc [INFO] Starting Distributed RPC servers...
>
> i checked port 3772 by
>
> sudo netstat -ap | grep 3772
>
> i got
>
> tcp 00 *:3772*:* LISTEN  10864/java
> unix  3  [ ] STREAM CONNECTED 13772
> 587/dbus-daemon /var/run/dbus/system_bus_socket
>
>
> why it's not established  ?
>
> can i find help ?
>


Re: DRPC server not working

2016-01-26 Thread Erik Weathers
What is the client code that is supposed to make the DRPC connection
telling you?  i.e., you should see some exception or log about not
establishing the connection, right?

Alternatively, perhaps the connections aren't persistent and there's no
actual problem?

- Erik

On Tue, Jan 26, 2016 at 4:55 PM, researcher cs <prog.researc...@gmail.com>
wrote:

>  thanks for replying , i read the documentation before , i imported project
> supposed to work well but not working with me
> i checked port by lsof -i gave me all ports i connected it for storm
> java  10675   root   20u  IPv4  98126  0t0  TCP *:52022 (LISTEN)
> java  10675   root   26u  IPv4  98131  0t0  TCP *:2181 (LISTEN)
> java  10675   root   27u  IPv4 101944  0t0  TCP
> localhost:2181->user-Lenovo-G50-70:38150 (ESTABLISHED)
> java  10675   root   29u  IPv4  98974  0t0  TCP
> user-Lenovo-G50-70:2181->user-Lenovo-G50-70:50526 (ESTABLISHED)
> java  10675   root   30u  IPv4  99105  0t0  TCP
> localhost:2181->user-Lenovo-G50-70:38165 (ESTABLISHED)
> java  10715   root   90u  IPv4  98953  0t0  TCP
> user-Lenovo-G50-70:38150->localhost:2181 (ESTABLISHED)
> java  10715   root   91u  IPv4  98245  0t0  TCP *:6627 (LISTEN)
> java  10792   root   90u  IPv4  99973  0t0  TCP
> user-Lenovo-G50-70:50526->user-Lenovo-G50-70:2181 (ESTABLISHED)
> java  10864   root   82u  IPv4 102425  0t0  TCP *:3772 (LISTEN)
> java  10864   root   84u  IPv4 102429  0t0  TCP *:3773 (LISTEN)
> java  10864   root   92u  IPv4 102197  0t0  TCP
> user-Lenovo-G50-70:3773->user-Lenovo-G50-70:50825 (ESTABLISHED)
> java  10928   root   81u  IPv4 102070  0t0  TCP *:http-alt (LISTEN)
> java  11087   root   81u  IPv4 100091  0t0  TCP
> user-Lenovo-G50-70:50825->user-Lenovo-G50-70:3773 (ESTABLISHED)
> java  11087   root   91u  IPv4 102196  0t0  TCP
> user-Lenovo-G50-70:38165->localhost:2181 (ESTABLISHED)
> java  11087   root   94u  IPv4 102561  0t0  TCP *:7660 (LISTEN)
>
>
> here you can see that 3772 not established
>
> On Wed, Jan 27, 2016 at 2:47 AM, Erik Weathers <
> eweath...@groupon.com.invalid> wrote:
>
> > hey,
> >
> > The DRPC server is up and listening on port 3772.   Why do you expect
> > established connections?
> >
> > I'm not familiar with using Storm's DRPC feature, but I'm sure you need
> to
> > write code that interacts with the DRPC server, and you've made no
> mention
> > of doing so in your email.  I'd start here:
> >
> >- https://storm.apache.org/documentation/Distributed-RPC.html
> >
> > - Erik
> >
> > On Tue, Jan 26, 2016 at 4:29 PM, researcher cs <
> prog.researc...@gmail.com>
> > wrote:
> >
> > > I set in the code
> > > conf.put(Config.DRPC_SERVERS, dprcServers);
> > > conf.put(Config.DRPC_PORT, 3772);
> > > but when i submit topolgoy i found at the end of the file
> > >  b.s.d.drpc [INFO] Starting Distributed RPC servers...
> > >
> > > i checked port 3772 by
> > >
> > > sudo netstat -ap | grep 3772
> > >
> > > i got
> > >
> > > tcp 00 *:3772*:* LISTEN  10864/java
> > > unix  3  [ ] STREAM CONNECTED 13772
> > > 587/dbus-daemon /var/run/dbus/system_bus_socket
> > >
> > >
> > > why it's not established  ?
> > >
> > > can i find help ?
> > >
> >
>


[jira] [Commented] (STORM-1141) Maven Central does not have 0.10.0 libraries

2016-01-21 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15111524#comment-15111524
 ] 

Erik Weathers commented on STORM-1141:
--

[~cburch]:  0.10.x and 0.10.0 don't have {{ClusterSummary.nimbuses}}:
* 
https://github.com/apache/storm/blob/v0.10.0/storm-core/src/jvm/backtype/storm/generated/ClusterSummary.java#L68-L70
* 
https://github.com/apache/storm/blob/0.10.x-branch/storm-core/src/jvm/backtype/storm/generated/ClusterSummary.java#L68-L70

That field [landed into 
master|https://github.com/apache/storm/commit/4502bffbe3f9b4cd3674a56afbda1bb115cec239]
 and wasn't put into 0.10.0.  I believe it's part of the HA Nimbus support that 
is in 0.11.x.

> Maven Central does not have 0.10.0 libraries
> 
>
> Key: STORM-1141
> URL: https://issues.apache.org/jira/browse/STORM-1141
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: caleb burch
>Assignee: P. Taylor Goetz
>Priority: Blocker
> Fix For: 0.10.0
>
>
> HDP has moved to 2.3 that features Storm 0.10.0.  The current storm-core jars 
> on maven central are back at 0.9.5 and the beta 0.10.0 drivers aren't up to 
> date.  (They lack the list of nimbus nodes so fail with a 
> "nimbus.uptime.secs" not set error when attempting to get ClusterInfo via the 
> java client).
> Any chance the latest 0.10.x build can be pushed to maven, or a timeframe of 
> when you expect to do it?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-822) As a storm developer I’d like to use the new kafka consumer API (0.8.3) to reduce dependencies and use long term supported kafka apis

2016-01-19 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15107726#comment-15107726
 ] 

Erik Weathers commented on STORM-822:
-

[~DeepNekro]:  can you please comment on whether your work directly overlaps 
with STORM-1015?

> As a storm developer I’d like to use the new kafka consumer API (0.8.3) to 
> reduce dependencies and use long term supported kafka apis 
> --
>
> Key: STORM-822
> URL: https://issues.apache.org/jira/browse/STORM-822
> Project: Apache Storm
>  Issue Type: Story
>  Components: storm-kafka
>Reporter: Thomas Becker
>Assignee: Hugo Louro
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: JStorm CGroup

2016-01-13 Thread Erik Weathers
Perhaps rather than just bolting on "cgroup support", we could instead open
a dialogue about having Mesos support be a core feature of Storm.

The current integration is a bit unwieldy & hackish at the moment, arising
from the conflicting natures of Mesos and Storm w.r.t. scheduling of
resources.  i.e., Storm assumes you have existing "slots" for running
workers on, whereas Mesos is more dynamic, requiring frameworks that run on
top of it to tell Mesos just how many resources (CPUs, Memory, etc.) are
needed by the framework's tasks.

One example of an issue with Storm-on-Mesos:  the Storm logviewer is
completely busted when you are using Mesos, I filed a ticket with a
description of the issue and proposed modifications to allow it to function:

   - https://issues.apache.org/jira/browse/STORM-1342

Furthermore, there are fundamental behaviors in Storm that don't mesh well
with Mesos:

   - the interfaces of INimbus (allSlotsAvailableForScheduling(),
   assignSlots(), getForcedScheduler(), etc.) make it difficult to create an
   ideal Mesos integration framework, since they don't allow the Mesos
   integration code to *really* know what's going on from the Nimbus's
   perspective. e.g.,
  - knowing which topologies & how many workers need to be scheduled at
  any given moment.
  - since the integration code cannot know what is actually needed to
  be run when it receives offers from Mesos, it just hoards those offers,
  leading to resource starvation in the Mesos cluster.
   - the "fallback" behavior of allowing the topology to settle for having
   less worker processes than requested should be disable-able.  For carefully
   tuned topologies it is quite bad to run on less than the expected number of
   worker processes.
  - also, this behavior endangers the idea of having the Mesos
  integration code *only* hoard Mesos offers after a successful round-trip
  through the allSlotsAvailableForScheduling() polling calls (i.e., only
  hoard when we know there are pending topologies).  It's dangerous because
  while we wait for another call to allSlotsAvailableForScheduling(), the
  Nimbus may have decided that it's okie dokie to use less than
the requested
  number of worker processes.

I'm sure there are other issues that I can conjure up, but those are the
major ones that came to mind instantly.  I'm happy to explain more about
this, since I realize the above bulleted info may lack context.

I wish I knew something about how Twitter's new Heron project addresses the
concerns above since it comes with Mesos support out-of-the-box, but it's
unclear at this point what they're doing until they open source it.

Thanks!

- Erik

On Wed, Jan 13, 2016 at 6:27 PM, 刘键(Basti Liu) 
wrote:

> Hi Bobby & Jerry,
>
> Yes, JStorm implements generic cgroup support. But just only cpu control
> is enable when starting worker.
>
> Regards
> Basti
>
> -Original Message-
> From: Bobby Evans [mailto:ev...@yahoo-inc.com.INVALID]
> Sent: Wednesday, January 13, 2016 11:14 PM
> To: dev@storm.apache.org
> Subject: Re: JStorm CGroup
>
> Jerry,
> I think most of the code you are going to want to look at is here
> https://github.com/apache/storm/blob/jstorm-import/jstorm-core/src/main/java/com/alibaba/jstorm/daemon/supervisor/CgroupManager.java
> The back end for most of it seems to come from
>
>
> https://github.com/apache/storm/tree/jstorm-import/jstorm-core/src/main/java/com/alibaba/jstorm/container
>
> Which looks like it implements a somewhat generic cgroup support.
>  - Bobby
>
> On Wednesday, January 13, 2016 1:34 AM, 刘键(Basti Liu) <
> basti...@alibaba-inc.com> wrote:
>
>
>  Hi Jerry,
>
> Currently, JStorm supports to control the upper limit of cpu time for a
> worker by cpu.cfs_period_us & cpu.cfs_quota_us in cgroup.
> e.g. cpu.cfs_period_us= 10, cpu.cfs_quota_us=3*10. Cgroup will
> limit the corresponding process to occupy at most 300% cpu (3 cores).
>
> Regards
> Basti
>
> -Original Message-
> From: Jerry Peng [mailto:jerry.boyang.p...@gmail.com]
> Sent: Wednesday, January 13, 2016 1:57 PM
> To: dev@storm.apache.org
> Subject: JStorm CGroup
>
> Hello everyone,
>
> This question is directed more towards the people that worked on JStorm.
> If I recall correctly JStorm offers some sort of resource isolation through
> CGroups.  What kind of support does JStorm offer for resource isolation?
> Can someone elaborate on this feature in JStorm.
>
> Best,
>
> Jerry
>
>
>
>
>


Re: HDFS Bolts -- partitioning output

2016-01-11 Thread Erik Weathers
Awesome Aaron, I can send you what we have done offline!

- Erik

On Thu, Jan 7, 2016 at 11:12 AM, Aaron.Dossett <aaron.doss...@target.com>
wrote:

> Thanks, Erik.  Your “Partitioner” is exactly what I had in mind and even
> what I named my stubbed out interface :-)  Since Target has decided against
> this approach for other reasons, it will have to be a side project for me
> for now.
>
> Best, Aaron
>
> From: Erik Weathers <eweath...@groupon.com>
> Reply-To: "u...@storm.apache.org" <u...@storm.apache.org>
> Date: Wednesday, January 6, 2016 at 5:48 PM
> To: "u...@storm.apache.org" <u...@storm.apache.org>
> Cc: "dev@storm.apache.org" <dev@storm.apache.org>
> Subject: Re: HDFS Bolts -- partitioning output
>
> hey Aaron,
>
> We've also written a similar bolt at Groupon, we aren't super satisfied
> with the implementation though. :)  We are begrudgingly using it because
> there is no partitioning support in the OSS storm-hdfs bolt.
>
> Though one thing I do like about our implementation is having the ability
> to define your own "Partitioner" in each topology to do various types of
> partitioning (date-based, message ID-based, topic-based, whatever).  It
> would be great if your implementation had such logic too.  e.g., when
> deciding the HDFS path for a tuple's data, the Partitioner is called to
> determine the HDFS path.  For example, it can take the Tuple object and an
> opaque key/value Configuration hash that can pass items like a kafka topic
> name to be included into the HDFS path.
>
> - Erik
>
> On Tue, Dec 29, 2015 at 7:12 AM, Aaron.Dossett <aaron.doss...@target.com>
> wrote:
>
>> Hi,
>>
>> My team was exploring changes to the HDFS bolts that would allow for
>> partitioning the output, for example into directories corresponding to
>> day.  This is different that the existing functionality to rotate files
>> based on a set length of time.  For unrelated reasons, we are probably not
>> going to pursue this further.  However, I have some code changes that
>> implement most of this functionality for at least some partitioning use
>> cases.  If there is interest from the user or developer community for this
>> feature, I could get in shape for a PR to get feedback about our
>> implementation approach.
>>
>> Any feedback on this idea is welcome.  Thanks! -Aaron
>>
>
>


Re: HDFS Bolts -- partitioning output

2016-01-06 Thread Erik Weathers
hey Aaron,

We've also written a similar bolt at Groupon, we aren't super satisfied
with the implementation though. :)  We are begrudgingly using it because
there is no partitioning support in the OSS storm-hdfs bolt.

Though one thing I do like about our implementation is having the ability
to define your own "Partitioner" in each topology to do various types of
partitioning (date-based, message ID-based, topic-based, whatever).  It
would be great if your implementation had such logic too.  e.g., when
deciding the HDFS path for a tuple's data, the Partitioner is called to
determine the HDFS path.  For example, it can take the Tuple object and an
opaque key/value Configuration hash that can pass items like a kafka topic
name to be included into the HDFS path.
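
Roughly the shape I mean (illustrative only and untested; the type names, config key, and
the "ts" tuple field below are invented for this sketch and are not part of the OSS
storm-hdfs API):

import java.io.Serializable;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;

import backtype.storm.tuple.Tuple;

// Illustrative only -- not part of storm-hdfs.
public interface Partitioner extends Serializable {

    // Called once with an opaque, topology-supplied config (e.g. a kafka topic name).
    void prepare(Map<String, Object> partitionerConf);

    // Return the sub-path (relative to the bolt's base HDFS path) that this
    // tuple's data should be written under, e.g. "pages/dt=2016-01-06".
    String getPartitionPath(Tuple tuple);
}

// Example date-based implementation; assumes the tuple carries an epoch-millis "ts" field.
class DatePartitioner implements Partitioner {
    private String topic;

    @Override
    public void prepare(Map<String, Object> conf) {
        this.topic = String.valueOf(conf.get("kafka.topic"));
    }

    @Override
    public String getPartitionPath(Tuple tuple) {
        String day = new SimpleDateFormat("yyyy-MM-dd")
                .format(new Date(tuple.getLongByField("ts")));
        return topic + "/dt=" + day;
    }
}

The bolt would then resolve each tuple's write path as basePath + "/" +
partitioner.getPartitionPath(tuple) before writing, instead of a single fixed directory.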

- Erik

On Tue, Dec 29, 2015 at 7:12 AM, Aaron.Dossett 
wrote:

> Hi,
>
> My team was exploring changes to the HDFS bolts that would allow for
> partitioning the output, for example into directories corresponding to
> day.  This is different that the existing functionality to rotate files
> based on a set length of time.  For unrelated reasons, we are probably not
> going to pursue this further.  However, I have some code changes that
> implement most of this functionality for at least some partitioning use
> cases.  If there is interest from the user or developer community for this
> feature, I could get in shape for a PR to get feedback about our
> implementation approach.
>
> Any feedback on this idea is welcome.  Thanks! -Aaron
>


Re: Problem with storm since 4 months

2015-12-10 Thread Erik Weathers
responses inline.

On Thu, Dec 10, 2015 at 4:48 AM, sam mohel <sammoh...@gmail.com> wrote:

> Thanks for your replying and for the link
> the local port range of my machine default is 32768 61000 and i changed
> it to 1024 65535 is that right or not ?
>

Ah, so, you shouldn't have touched the "local port range" setting since you
don't fully understand ephemeral ports and TCP yet! :-)

I'll give a brief synopsis:

Say you're making a connection from a client to a server that listens on
port 80. The client needs to have a port of its own to receive the response
packets from the server. The port that is allocated for it by the TCP stack
is a "random" port, which is called an "ephemeral" port in this context.
So with your original default config (32768 61000) the port would be
chosen from an available port on the machine that lies within that range of
32768->61000.  That's good, it wouldn't conflict with the default TCP ports
being listened to by the Storm Worker processes (67xx).  So 1. isn't your
problem.  But now you've made it possible for it to become a problem, since
now the range of ports that can be given as an ephemeral port overlaps with
the default Storm Worker ports (1024->65535 includes 67xx).  So you should
revert that config change.


>
> how can i extend from 16 to 64 ? i searched to know how but didn't get it
>

Sorry, I don't understand what you are asking. What thing is "16" that you
are trying to extend?


> Now i was running storm and submitted topology but electricity felled so
> how can i killed the topology ? am i connect storm again then kill or what
> shoud i do ?
>

You can kill topologies from the Nimbus UI (web page).  Or with the
bin/storm command.


>
> should i try your commands after submitting or before it also ?
>

The commands are *solely* intended to figure out what is conflicting.  Your
logs claim that there is something holding onto 67xx which prevents the
Storm Worker from launching.   So if that is happening you should
*immediately* try to figure out what is actually holding onto the port and
preventing your Storm Worker from launching.


>
> i feel that problem is really with local port but can't catch it
>

Not sure what you mean here.

Please note that Storm often suffers from "cascading" failures, where there
are a lot of exceptions and errors that aren't actually the root cause of
the problem.  Often you need to spend time and effort looking at lots of
logs and tracing back to the real root cause.

- Erik


>
> Really , Thanks for your time
>
> On Thu, Dec 10, 2015 at 6:22 AM, Erik Weathers <
> eweath...@groupon.com.invalid> wrote:
>
> > Regarding Basti's suggestion (1.) that your host's configured ephemeral
> > ports might be conflicting with the storm worker ports, here's how you
> can
> > check your "local port range" setting:
> >
> >
> >
> https://serverfault.com/questions/261663/on-linux-how-can-i-tell-how-many-ephemeral-ports-are-left-available
> >
> > % cat /proc/sys/net/ipv4/ip_local_port_range
> >
> >
> > It's possible that there is a zombie worker process holding onto port
> 6703.
> > I would try to identify the process like so:
> >
> > % sudo netstat -ap --numeric-ports --extend | grep -w LISTEN | grep -w
> 6703
> >
> > Alternatively you can try a global lsof search:
> >
> > % sudo lsof | grep TCP | grep -w LISTEN | grep -w 6703
> >
> > - Erik
> >
> >
> >
> > On Wed, Dec 9, 2015 at 7:37 PM, 刘键(Basti Liu) <basti...@alibaba-inc.com>
> > wrote:
> >
> > > Hi Sam,
> > >
> > > You can try to find which process has bound this port by "netstat -anp"
> > > first.
> > >
> > > Generally, there are following cases for the binding error.
> > > 1. "local port range" is not set to exclude the port range used in
> Storm.
> > > 2. The previous worker was not killed correctly.
> > > 3. There is bug of assignment in some scenarios. Same port was assigned
> > to
> > > two workers.
> > >
> > > Regards
> > > Basti
> > >
> > > -Original Message-
> > > From: sam mohel [mailto:sammoh...@gmail.com]
> > > Sent: Thursday, December 10, 2015 7:16 AM
> > > To: dev@storm.apache.org
> > > Subject: Re: Problem with storm since 4 months
> > >
> > > i tried to use storm-0.9.5 but problem changed with
> > >
> > > cannot bind port 6703 i think it's same problem
> > >
> > > On Wed, Dec 9, 2015 at 8:42 PM, Harsha <st...@harsha.io> wrote:
> > >
> > > > Sam,
> > > > 

Re: Problem with storm since 4 months

2015-12-09 Thread Erik Weathers
Regarding Basti's suggestion (1.) that your host's configured ephemeral
ports might be conflicting with the storm worker ports, here's how you can
check your "local port range" setting:

https://serverfault.com/questions/261663/on-linux-how-can-i-tell-how-many-ephemeral-ports-are-left-available

% cat /proc/sys/net/ipv4/ip_local_port_range


It's possible that there is a zombie worker process holding onto port 6703.
I would try to identify the process like so:

% sudo netstat -ap --numeric-ports --extend | grep -w LISTEN | grep -w 6703

Alternatively you can try a global lsof search:

% sudo lsof | grep TCP | grep -w LISTEN | grep -w 6703

- Erik



On Wed, Dec 9, 2015 at 7:37 PM, 刘键(Basti Liu) 
wrote:

> Hi Sam,
>
> You can try to find which process has bound this port by "netstat -anp"
> first.
>
> Generally, there are following cases for the binding error.
> 1. "local port range" is not set to exclude the port range used in Storm.
> 2. The previous worker was not killed correctly.
> 3. There is bug of assignment in some scenarios. Same port was assigned to
> two workers.
>
> Regards
> Basti
>
> -Original Message-
> From: sam mohel [mailto:sammoh...@gmail.com]
> Sent: Thursday, December 10, 2015 7:16 AM
> To: dev@storm.apache.org
> Subject: Re: Problem with storm since 4 months
>
> i tried to use storm-0.9.5 but problem changed with
>
> cannot bind port 6703 i think it's same problem
>
> On Wed, Dec 9, 2015 at 8:42 PM, Harsha  wrote:
>
> > Sam,
> >   you might be using very old version of storm since its showing
> >   ZeroMQ. Can you try using newer version storm without zero mq.
> > -Harsha
> >
> > On Wed, Dec 9, 2015, at 10:19 AM, sam mohel wrote:
> > > I have this problem since 4months when I submitted topology I got
> > > this in the worker log file [ERROR] Async loop died!
> org.zeromq.ZMQException:
> > > Address already in use(0x62)
> > > at org.zeromq.ZMQ$Socket.bind(Native Method) at
> > > zilch.mq$bind.invoke(mq.clj:69) at
> > > backtype.storm.messaging.zmq.ZMQContext.bind(zmq.clj:57)at
> > >
> > backtype.storm.messaging.loader$launch_receive_thread_BANG_$fn__1629.i
> > nvoke(loader.clj:26)
> > > at backtype.storm.util$async_loop$fn__465.invoke(util.clj:375)
> > > at clojure.lang.AFn.run(AFn.java:24) at java.lang.Thread.run(Unknown
> > > Source)
> > >
> > > when i tried to connect port 6703 and 6702
> > >
> > > And supervisor log file hadn't still start
> > >
> > >
> > > I searched everywhere but cannot find any solution I hope you can
> >
>
>


[jira] [Updated] (STORM-1342) support multiple logviewers per host for container-isolated worker logs

2015-11-23 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-1342:
-
Description: 
h3. Storm-on-Mesos Worker Logs are in varying directories
When using [storm-on-mesos|https://github.com/mesos/storm] with cgroups, each 
topology's workers are isolated into separate containers.  By default the 
worker logs will be saved into container-specific sandbox directories.  These 
directories are also topology-specific by definition, because, as just stated, 
the containers are specific to each topology.

h3. Problem: Storm supports 1-and-only-1 Logviewer per Worker Host
A challenge with this different way of running Storm is that the [Storm 
logviewer|https://github.com/apache/storm/blob/768a85926373355c15cc139fd86268916abc6850/docs/_posts/2013-12-08-storm090-released.md#log-viewer-ui]
 runs as a single instance on each worker host.   This doesn't play well with 
having the topology worker logs in separate per-topology containers.  The one 
logviewer doesn't know about the various sandbox directories that the Storm 
Workers are writing to.  And if we just spawned new logviewers for each 
container, the problem is that the Storm UI only knows about one global 
logviewer port, so you cannot simply direct users to the correct per-topology logviewer.

These problems are documented (or linked to) from [Issue #6 in the 
storm-on-mesos project|https://github.com/mesos/storm/issues/6]

h3. Possible Solutions I can envision
# configure the Storm workers to write to log directories that exist on the raw 
host outside of the container sandbox, and run a single logviewer on a host, 
which serves up the contents of that directory.
#* violates one of the basic reasons for using containers: isolation.
#* also prevents a standard use case for Mesos: running more than 1 
instance of a Mesos Framework (e.g., "Storm Cluster") at once on same Mesos 
Cluster. e.g., for Blue-Green deployments.
#* a variation on this proposal is to somehow expose the sandbox dirs of all 
storm containers to this singleton logviewer process (still has above problems)
# launch a separate logviewer in each container, and somehow register those 
logviewers with Storm such that Storm knows for a given host which logviewer 
port is assigned to a given topology.
#* this is the proposed solution

h3. Storm Changes for the Proposed Solution

Nimbus or ZooKeeper could serve as a registrar, recording the association 
between a slot (host + worker port) and the logviewer port that is serving the 
workers logs. And the Storm-on-Mesos framework could update this registry when 
launching a new worker.  (This proposal definitely calls for thorough vetting 
and thinking.)
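
To make the idea concrete, here is a minimal sketch of such a registrar, assuming ZooKeeper 
is the backing store (accessed via the kazoo client) and assuming a hypothetical znode 
layout of /logviewers/<host>:<worker-port>; none of these names are final:

{code}
# Minimal sketch only: assumes kazoo and a hypothetical /logviewers znode layout.
from kazoo.client import KazooClient

def register_logviewer(zk_hosts, host, worker_port, logviewer_port):
    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    zk.ensure_path("/logviewers")
    path = "/logviewers/%s:%d" % (host, worker_port)
    value = str(logviewer_port).encode()
    if zk.exists(path):
        zk.set(path, value)      # re-registration, e.g. after a supervisor restart
    else:
        zk.create(path, value)   # first registration for this slot
    zk.stop()

def lookup_logviewer(zk_hosts, host, worker_port):
    # The Storm UI (or a proxy in front of it) could use this to build log links.
    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    data, _stat = zk.get("/logviewers/%s:%d" % (host, worker_port))
    zk.stop()
    return int(data.decode())
{code}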

h3. Storm-on-Mesos Framework Changes for the Proposed Solution

Along with the interaction with the "registrar" proposed above, the 
storm-on-mesos framework can be enhanced to launch multiple logviewers on a 
given worker host, where each logviewer is dedicated to serving the worker logs 
from a specific topology's container/sandbox directory.  This would be done by 
launching a logviewer process within the topology's container, and assigning it 
an arbitrary listening port that has been determined dynamically through mesos 
(which treats ports as one of the schedulable resource primitives of a worker 
host).  [Code implementing this logviewer-port-allocation logic already 
exists|https://github.com/mesos/storm/commit/af8c49beac04b530c33c1401c829caaa8e368a35],
 but [that specific portion of the code was 
reverted|https://github.com/mesos/storm/commit/dc3eee0f0e9c06f6da7b2fe697a8e4fc05b5227e]
 because of the issues that inspired this ticket.

  was:
h3. Storm-on-Mesos Worker Logs are in varying directories
When using [storm-on-mesos|https://github.com/mesos/storm] with cgroups, each 
topology's workers are isolated into separate containers.  By default the 
worker logs will be saved into container-specific sandbox directories.  These 
directories are also topology-specific by definition, because, as just stated, 
the containers are specific to each topology.

h3. Problem: Storm supports 1-and-only-1 Logviewer per Worker Host
A challenge with this different way of running Storm is that the [Storm 
logviewer|https://github.com/apache/storm/blob/768a85926373355c15cc139fd86268916abc6850/docs/_posts/2013-12-08-storm090-released.md#log-viewer-ui]
 runs as a single instance on each worker host.   This doesn't play well with 
having the topology worker logs in separate per-topology containers.  The one 
logviewer doesn't know about the various sandbox directories that the Storm 
Workers are writing to.  And if we just spawned new logviewers for each 
container, the problem is that the Storm UI only knows about one global logviewer 
port per host, so you cannot simply direct users to the correct per-topology logviewer.

h3. Possible Solutions I can envision
# configure the Storm workers to write to log directories that exist on the raw 
host outs

[jira] [Created] (STORM-1342) support multiple logviewers per host for container-isolated worker logs

2015-11-23 Thread Erik Weathers (JIRA)
Erik Weathers created STORM-1342:


 Summary: support multiple logviewers per host for 
container-isolated worker logs
 Key: STORM-1342
 URL: https://issues.apache.org/jira/browse/STORM-1342
 Project: Apache Storm
  Issue Type: Improvement
  Components: storm-core
Reporter: Erik Weathers
Priority: Minor


h3. Storm-on-Mesos Worker Logs are in varying directories
When using [storm-on-mesos|https://github.com/mesos/storm] with cgroups, each 
topology's workers are isolated into separate containers.  By default the 
worker logs will be saved into container-specific sandbox directories.  These 
directories are also topology-specific by definition, because, as just stated, 
the containers are specific to each topology.

h3. Problem: Storm supports 1-and-only-1 Logviewer per Worker Host
A challenge with this different way of running Storm is that the [Storm 
logviewer|https://github.com/apache/storm/blob/768a85926373355c15cc139fd86268916abc6850/docs/_posts/2013-12-08-storm090-released.md#log-viewer-ui]
 runs as a single instance on each worker host.   This doesn't play well with 
having the topology worker logs in separate per-topology containers.  The one 
logviewer doesn't know about the various sandbox directories that the Storm 
Workers are writing to.  And if we just spawned new logviewers for each 
container, the problem is that the Storm UI only knows about one global logviewer 
port per host, so you cannot simply direct users to the correct per-topology logviewer.

h3. Possible Solutions I can envision
# configure the Storm workers to write to log directories that exist on the raw 
host outside of the container sandbox, and run a single logviewer on a host, 
which serves up the contents of that directory.
#* violates one of the basic reasons for using containers: isolation.
#* also prevents a standard use case for Mesos: running more than 1 
instance of a Mesos Framework (e.g., "Storm Cluster") at once on same Mesos 
Cluster. e.g., for Blue-Green deployments.
#* a variation on this proposal is to somehow expose the sandbox dirs of all 
storm containers to this singleton logviewer process (still has above problems)
# launch a separate logviewer in each container, and somehow register those 
logviewers with Storm such that Storm knows for a given host which logviewer 
port is assigned to a given topology.
#* this is the proposed solution

h3. Storm Changes for the Proposed Solution

Nimbus or ZooKeeper could serve as a registrar, recording the association 
between a slot (host + worker port) and the logviewer port that is serving the 
workers logs. And the Storm-on-Mesos framework could update this registry when 
launching a new worker.  (This proposal definitely calls for thorough vetting 
and thhinking.)

h3. Storm-on-Mesos Framework Changes for the Proposed Solution

Along with the interaction with the "registrar" proposed above, the 
storm-on-mesos framework can be enhanced to launch multiple logviewers on a 
given worker host, where each logviewer is dedicated to serving the worker logs 
from a specific topology's container/sandbox directory.  This would be done by 
launching a logviewer process within the topology's container, and assigning it 
an arbitrary listening port that has been determined dynamically through mesos 
(which treats ports as one of the schedulable resource primitives of a worker 
host).  [Code implementing this logviewer-port-allocation logic already 
exists|https://github.com/mesos/storm/commit/af8c49beac04b530c33c1401c829caaa8e368a35],
 but [that specific portion of the code was 
reverted|https://github.com/mesos/storm/commit/dc3eee0f0e9c06f6da7b2fe697a8e4fc05b5227e]
 because of the issues that inspired this ticket.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (STORM-1342) support multiple logviewers per host for container-isolated worker logs

2015-11-23 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-1342:
-
Description: 
h3. Storm-on-Mesos Worker Logs are in varying directories
When using [storm-on-mesos|https://github.com/mesos/storm] with cgroups, each 
topology's workers are isolated into separate containers.  By default the 
worker logs will be saved into container-specific sandbox directories.  These 
directories are also topology-specific by definition, because, as just stated, 
the containers are specific to each topology.

h3. Problem: Storm supports 1-and-only-1 Logviewer per Worker Host
A challenge with this different way of running Storm is that the [Storm 
logviewer|https://github.com/apache/storm/blob/768a85926373355c15cc139fd86268916abc6850/docs/_posts/2013-12-08-storm090-released.md#log-viewer-ui]
 runs as a single instance on each worker host.   This doesn't play well with 
having the topology worker logs in separate per-topology containers.  The one 
logviewer doesn't know about the various sandbox directories that the Storm 
Workers are writing to.  And if we just spawned new logviewers for each 
container, the problem is that the Storm UI only knows about one global logviewer 
port per host, so you cannot simply direct users to the correct per-topology logviewer.

h3. Possible Solutions I can envision
# configure the Storm workers to write to log directories that exist on the raw 
host outside of the container sandbox, and run a single logviewer on a host, 
which serves up the contents of that directory.
#* violates one of the basic reasons for using containers: isolation.
#* also prevents a standard use case for Mesos: running more than 1 
instance of a Mesos Framework (e.g., "Storm Cluster") at once on same Mesos 
Cluster. e.g., for Blue-Green deployments.
#* a variation on this proposal is to somehow expose the sandbox dirs of all 
storm containers to this singleton logviewer process (still has above problems)
# launch a separate logviewer in each container, and somehow register those 
logviewers with Storm such that Storm knows for a given host which logviewer 
port is assigned to a given topology.
#* this is the proposed solution

h3. Storm Changes for the Proposed Solution

Nimbus or ZooKeeper could serve as a registrar, recording the association 
between a slot (host + worker port) and the logviewer port that is serving the 
workers logs. And the Storm-on-Mesos framework could update this registry when 
launching a new worker.  (This proposal definitely calls for thorough vetting 
and thinking.)

h3. Storm-on-Mesos Framework Changes for the Proposed Solution

Along with the interaction with the "registrar" proposed above, the 
storm-on-mesos framework can be enhanced to launch multiple logviewers on a 
given worker host, where each logviewer is dedicated to serving the worker logs 
from a specific topology's container/sandbox directory.  This would be done by 
launching a logviewer process within the topology's container, and assigning it 
an arbitrary listening port that has been determined dynamically through mesos 
(which treats ports as one of the schedulable resource primitives of a worker 
host).  [Code implementing this logviewer-port-allocation logic already 
exists|https://github.com/mesos/storm/commit/af8c49beac04b530c33c1401c829caaa8e368a35],
 but [that specific portion of the code was 
reverted|https://github.com/mesos/storm/commit/dc3eee0f0e9c06f6da7b2fe697a8e4fc05b5227e]
 because of the issues that inspired this ticket.

  was:
h3. Storm-on-Mesos Worker Logs are in varying directories
When using [storm-on-mesos|https://github.com/mesos/storm] with cgroups, each 
topology's workers are isolated into separate containers.  By default the 
worker logs will be saved into container-specific sandbox directories.  These 
directories are also topology-specific by definition, because, as just stated, 
the containers are specific to each topology.

h3. Problem: Storm supports 1-and-only-1 Logviewer per Worker Host
A challenge with this different way of running Storm is that the [Storm 
logviewer|https://github.com/apache/storm/blob/768a85926373355c15cc139fd86268916abc6850/docs/_posts/2013-12-08-storm090-released.md#log-viewer-ui]
 runs as a single instance on each worker host.   This doesn't play well with 
having the topology worker logs in separate per-topology containers.  The one 
logviewer doesn't know about the various sandbox directories that the Storm 
Workers are writing to.  And if we just spawned new logviewers for each 
container, the problem is that the Storm UI only knows about one global logviewer 
port per host, so you cannot simply direct users to the correct per-topology logviewer.

h3. Possible Solutions I can envision
# configure the Storm workers to write to log directories that exist on the raw 
host outside of the container sandbox, and run a single logviewer on a host, 
which serves up the contents of that directory.
#* violates one 

[jira] [Created] (STORM-1216) button to kill all topologies in Storm UI

2015-11-17 Thread Erik Weathers (JIRA)
Erik Weathers created STORM-1216:


 Summary: button to kill all topologies in Storm UI
 Key: STORM-1216
 URL: https://issues.apache.org/jira/browse/STORM-1216
 Project: Apache Storm
  Issue Type: Wish
  Components: storm-core
Affects Versions: 0.11.0
Reporter: Erik Weathers
Priority: Minor


In the Storm-on-Mesos project we had a [request to have an ability to "shut 
down the storm cluster" via a UI 
button|https://github.com/mesos/storm/issues/46].   That could be accomplished 
via a button in the Storm UI to kill all of the topologies.

I understand if this is viewed as an undesirable feature, but I just wanted to 
document the request.
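
For reference, the effect can already be approximated by scripting against the Storm UI 
REST API; a rough sketch (the UI address is a placeholder, and the endpoint paths are 
assumed from the UI REST API docs):

{code}
# Rough sketch: kill every running topology via the Storm UI REST API.
import requests

UI = "http://storm-ui.example.com:8080"   # placeholder UI host:port

def kill_all_topologies(wait_secs=30):
    summary = requests.get(UI + "/api/v1/topology/summary").json()
    for topo in summary.get("topologies", []):
        requests.post(UI + "/api/v1/topology/%s/kill/%d" % (topo["id"], wait_secs))
{code}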



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (STORM-1027) Topology may hang because metric-tick function is a blocking call from spout

2015-11-09 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-1027:
-
Affects Version/s: 0.9.5
Fix Version/s: 0.9.6

> Topology may hang because metric-tick function is a blocking call from spout
> 
>
> Key: STORM-1027
> URL: https://issues.apache.org/jira/browse/STORM-1027
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 0.10.0, 0.9.5
>Reporter: Abhishek Agarwal
>Assignee: Abhishek Agarwal
>Priority: Critical
> Fix For: 0.10.0, 0.9.6
>
>
> Nathan had fixed the dining philosophers problem by putting an overflow buffer 
> in the spout so that the spout is not blocking. However, the overflow buffer is not 
> used when emitting metrics, and that could result in a deadlock. I 
> modified the executor to use the overflow buffer for emitting metrics, and 
> afterwards the topology didn't hang. 
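
A minimal sketch of the overflow-buffer pattern described above (illustration only; the 
actual executor code is Clojure and structured differently):

{code}
# Minimal sketch of the overflow-buffer idea; not Storm's executor implementation.
from collections import deque
from queue import Full, Queue

transfer_queue = Queue(maxsize=1024)   # bounded queue, analogous to the transfer buffer
overflow = deque()                     # unbounded overflow buffer

def emit_nonblocking(item):
    # Never block the emitting thread: spill to the overflow buffer instead.
    if overflow:
        overflow.append(item)          # preserve ordering once we have spilled
        return
    try:
        transfer_queue.put_nowait(item)
    except Full:
        overflow.append(item)

def drain_overflow():
    # Called periodically; moves spilled items back onto the bounded queue.
    while overflow:
        try:
            transfer_queue.put_nowait(overflow[0])
        except Full:
            break
        overflow.popleft()
{code}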



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-1056) allow supervisor log filename to be configurable via ENV variable

2015-11-06 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14994381#comment-14994381
 ] 

Erik Weathers commented on STORM-1056:
--

[~kabhwan]: seems this didn't get put into the 0.10.0 release as I had 
expected. :-(

Can you please ensure it's in the train for 0.10.1?

> allow supervisor log filename to be configurable via ENV variable
> -
>
> Key: STORM-1056
> URL: https://issues.apache.org/jira/browse/STORM-1056
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-core
>Reporter: Erik Weathers
>    Assignee: Erik Weathers
>Priority: Minor
> Fix For: 0.9.6
>
>
> *Requested feature:*  allow configuring the supervisor's log filename when 
> launching it via an ENV variable.
> *Motivation:* The storm-on-mesos project (https://github.com/mesos/storm) 
> relies on multiple Storm Supervisor processes per worker host, where each 
> Supervisor is dedicated to a particular topology.  This is part of the 
> framework's functionality of separating topologies from each other.  i.e., 
> storm-on-mesos is a multi-tenant system.  But before the change requested in 
> this issue, the logs from all supervisors on a worker host will be written 
> into a supervisor log with a single name of supervisor.log.  If all logs are 
> written to a common location on the mesos host, then all logs go to the same 
> log file.  Instead it would be desirable to separate the supervisor logs 
> per-topology, so that each tenant/topology-owner can peruse the logs that are 
> related to their own topology.  Thus this ticket is requesting the ability to 
> configure the supervisor log via an environment variable whilst invoking 
> bin/storm.py (or bin/storm in pre-0.10 storm releases).
> When this ticket is fixed, we will include the topology ID into the 
> supervisor log filename for storm-on-mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Java 8 status with Storm 0.9.5

2015-10-01 Thread Erik Weathers
We've been using Storm 0.9.5 on JRE 8 for a few months without issue.
Last night I sent a long expository response on how you can figure out the
targeted version for Storm's class files.

List: u...@storm.apache.org
Subject: Storm dependency on JVM version
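
The short version: every .class file records its target JVM version in its header,
which you can read directly, e.g.:

    import struct

    def class_file_major_version(path):
        # .class header: magic (0xCAFEBABE), minor version, major version.
        with open(path, "rb") as f:
            magic, minor, major = struct.unpack(">IHH", f.read(8))
        assert magic == 0xCAFEBABE, "not a Java class file"
        return major  # 50 = Java 6, 51 = Java 7, 52 = Java 8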

- Erik

On Thu, Oct 1, 2015 at 8:22 AM, Bobby Evans 
wrote:

> We have been running 0.9.2 in production on java 8 for several months.  I
> don't know about 0.9.5 though.
>  - Bobby
>
>
>  On Thursday, October 1, 2015 8:58 AM, Mike Thomsen <
> mikerthom...@gmail.com> wrote:
>
>
>  How safe is it to use Java 8 with Storm 0.9.5?
>
> Thanks,
>
> Mike
>
>
>
>


[jira] [Updated] (STORM-763) nimbus reassigned worker A to another machine, but other worker's netty client can't connect to the new worker A

2015-09-22 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-763:

Description: 
Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
java version "1.7.0_03"
storm 0.9.4
cluster 50+ machines

my topology has 50+ workers; it can't emit 5 thousand tuples in ten 
minutes.
sometimes one worker is reassigned to another machine by nimbus because of task 
heartbeat timeout:
{code}
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
my_topology-22-1428243953:[440 440] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
my_topology-22-1428243953:[90 90] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
my_topology-22-1428243953:[510 510] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
my_topology-22-1428243953:[160 160] not alive
{code}

I can see the reassigned worker has already started in the Storm UI, but other 
workers write error logs all the time:
{code}
2015-04-08T16:56:43.091+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] connection to 
Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:45.715+0800 b.s.m.n.Client [ERROR] connection to 
Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:45.716+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.277+0800 b.s.m.n.Client [ERROR] connection to 
Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.278+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] connection to 
Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] connection to 
Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.835+0800 b.s.m.n.Client [ERROR] connection to 
Netty-Client-host_19/192.168.163.19:5700 is unavailable
{code}

The worker on the destination host has already started, and I can telnet to 
192.168.163.19 5700.
However, why can't the netty client connect to that ip:port?

  was:
Debian 3.16.3-2~bpo70+1 (2014-09-21) x86_64 GNU/Linux
java version "1.7.0_03"
storm 0.9.4
cluster 50+ machines

my topology has 50+ workers; it can't emit 5 thousand tuples in ten 
minutes.
sometimes one worker is reassigned to another machine by nimbus because of task 
heartbeat timeout:
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
my_topology-22-1428243953:[440 440] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
my_topology-22-1428243953:[90 90] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
my_topology-22-1428243953:[510 510] not alive
2015-04-08T16:51:23.026+0800 b.s.d.nimbus [INFO] Executor 
my_topology-22-1428243953:[160 160] not alive

I can see the reassigned worker has already started in the Storm UI, but other 
workers write error logs all the time:
2015-04-08T16:56:43.091+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] connection to 
Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:45.660+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:45.715+0800 b.s.m.n.Client [ERROR] connection to 
Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:45.716+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.277+0800 b.s.m.n.Client [ERROR] connection to 
Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.278+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] connection to 
Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.306+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] connection to 
Netty-Client-host_19/192.168.163.19:5700 is unavailable
2015-04-08T16:56:46.586+0800 b.s.m.n.Client [ERROR] dropping 1 message(s) 
destined for Netty-Client-host_19/192.168.163.19:5700
2015-04-08T16:56:46.835

Re: does anyone else hate the verbose logging of all PR comments in the Storm JIRAs?

2015-09-21 Thread Erik Weathers
Sure, STORM-*.  ;-)

Here's a good example:

   - https://issues.apache.org/jira/browse/STORM-329

Compare that to this one:

   - https://issues.apache.org/jira/browse/STORM-404

STORM-404 has a bunch of human-created comments, but it's readable since it
has no github-generated comments.  STORM-329 however intermixes the human
comments with the github ones.  It's really hard to read through.

To be clear, it's not that it's *confusing* per se -- it's that the
behavior is *cluttering* the comments, making it harder to see any
human-created comments since any JIRA issue with a PR will usually end up
with many automated comments.

BTW, I totally agree that linking from the JIRA issue to the github PR is
important!  Would be even nicer if the github PRs also directly linked back
to the JIRA issue with a clickable link.

- Erik

On Mon, Sep 21, 2015 at 6:03 PM, 임정택 <kabh...@gmail.com> wrote:

> Hi Erik,
>
> I think verbose logging of PR comments could be OK. I didn't experience any
> confusion.
> Maybe referring to sample JIRA issues could help us understand.
>
> But I'm also open to change because other projects have already been doing this.
> (for example, https://issues.apache.org/jira/browse/SPARK-10474)
>
> In addition to what SPARK has been doing, I'd still like to leave some events from
> the github PR on the JIRA issue, too.
>
> Btw, the thing that really annoys me is the multiple mail notifications on each
> github comment.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> 2015-09-22 9:15 GMT+09:00 Erik Weathers <eweath...@groupon.com>:
>
> > I find that these comments majorly distract from any discussion that may
> > occur in the JIRA issues themselves.   What value are these providing?  I
> > guess just insurance against GitHub being unavailable or going away?  But
> > that doesn't seem worth the distraction cost.  Is there any possibility
> of
> > removing this spamminess, or somehow putting them into attachments within
> > the JIRA issues so that they aren't directly in the comments?
> >
> > - Erik
> >
>
>
>
> --
> Name : 임 정택
> Blog : http://www.heartsavior.net / http://dev.heartsavior.net
> Twitter : http://twitter.com/heartsavior
> LinkedIn : http://www.linkedin.com/in/heartsavior
>


backporting minor feature for storm-on-mesos logging to 0.9.x train?

2015-09-21 Thread Erik Weathers
hi dev list,

We recently merged in a change which allows for better logging in
storm-on-mesos (https://github.com/mesos/storm):

   - https://github.com/apache/storm/pull/733

This fix should go out in release 0.10.0.

However I'd like to get it backported to the 0.9.x release train as well.
I realize that this runs counter to the convention that only bug fixes
should go into the 0.9.x train.  My argument is that the change is
particularly minor and has marked benefit for storm-on-mesos.  So I'm
lobbying to get this change backported into 0.9.x.  HeartSaVioR has kindly
reviewed this proposal and is open to making this one time exception.
However he wants buy-in from at least one other PMC member before allowing
it to happen:

   - https://github.com/apache/storm/pull/733#issuecomment-142127363

So this email is an attempt to solicit opinion(s) about making this
exception.

Thanks!

- Erik


does anyone else hate the verbose logging of all PR comments in the Storm JIRAs?

2015-09-21 Thread Erik Weathers
I find that these comments majorly distract from any discussion that may
occur in the JIRA issues themselves.   What value are these providing?  I
guess just insurance against GitHub being unavailable or going away?  But
that doesn't seem worth the distraction cost.  Is there any possibility of
removing this spamminess, or somehow putting them into attachments within
the JIRA issues so that they aren't directly in the comments?

- Erik


[jira] [Updated] (STORM-107) Add better ways to construct topologies

2015-09-21 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-107:

Description: 
https://github.com/nathanmarz/storm/issues/649

AFAIK the only way to construct a topology is to manually wire them together, 
e.g.

{code}
  (topology
   {"firehose" (spout-spec firehose-spout)}
   {"our-bolt-1" (bolt-spec {"firehose" :shuffle}
some-bolt
:p 5)
"our-bolt-2" (bolt-spec {"our-bolt-1" ["word"]}
 some-other-bolt
 :p 6)})
{code}

This sort of manual specification of edges seems a bit too 1990's for me. I 
would like a modular way to express topologies, so that you can compose 
sub-topologies together. Another benefit of an alternative to this graph setup 
is that ensuring that the topology is correct does not mean tracing every edge 
in the graph to make sure the graph is right.

I am thinking maybe some sort of LINQ-style query that simply desugars to the 
arguments we pass into topology.

For example, the following could desugar into the two map arguments we're 
passing to topology:

{code}
(def firehose (mk-spout "firehose" firehose-spout))
(def bolt1 (mk-bolt "our-bolt-1" some-bolt :p 5))
(def bolt2 (mk-bolt "our-bolt-1" some-other-bolt :p 6))

(from-in thing (compose firehose
bolt1
bolt2)
  (select thing))
{code}

Here from-in is pulling thing out of the result of compose'ing the firehose and 
the bolts, forming the topology we saw before. mk-spout would register a named 
spout spec, and the from macro would return the two dictionaries passed into 
topology.

The specification needs a lot of work, but I'm willing to write the patch 
myself once it's nailed down. The question is, do you want me to write it and 
send it off to you, or am I going to have to build a storm-tools repo to 
distribute it?


--
mrflip: We have an internal tool for describing topologies at a high level, and 
though it hasn't reached production we have found:
1. it definitely makes sense to have one set of objects that describe 
topologies, and a different set of objects that express them. 
2. it probably makes sense to have those classes generate a static manifest: a 
lifeless JSON representation of a topology.

To the first point, initially we did it like storm: the FooEacher class would 
know how to wire itself into a topology(), and also know how to Foo each record 
that it received. We later refactored to separate topology construction from 
data handling: there is an EacherStage that represents anything that obeys the 
Each contract, so you'd say flow do source(:kafka_trident_spout) > 
eacher(:foo_eacher) > so_on() > and_so_forth(). The code became simpler and 
more powerful.
() Actually in storm stages are wired into the topology, but the issue is that 
they're around at run-time in both cases, requiring serialization and so forth.

More importantly, it's worth considering a static manifest.

The virtue of a manifest is that it is universal and static. If it's a JSON 
file, anything can generate it and anything can consume it; that would meet the 
needs of external programs which want to orchestrate Storm/Trident, as well as 
the repeated requests to visualize a topology in the UI. Also since it's 
static, the worker logic can simplify as it will know the whole graph in 
advance. From my experience, apart from the transactional code, the topology 
instantiation logic is the most complicated in the joint. That feels 
justifiable for the transaction logic but not for the topology instantiation.
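
For concreteness, a static manifest for the earlier example might look something like this 
(purely illustrative, not a proposed format):

{code}
# Purely illustrative: one possible shape for a static, "lifeless" topology manifest.
import json

manifest = {
    "spouts": {"firehose": {"class": "FirehoseSpout", "parallelism": 1}},
    "bolts": {
        "our-bolt-1": {"class": "SomeBolt", "parallelism": 5,
                       "inputs": [{"from": "firehose", "grouping": "shuffle"}]},
        "our-bolt-2": {"class": "SomeOtherBolt", "parallelism": 6,
                       "inputs": [{"from": "our-bolt-1", "grouping": "fields",
                                   "fields": ["word"]}]},
    },
}
print(json.dumps(manifest, indent=2))
{code}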

The danger of a manifest is also that it is static -- you could find yourself 
on the primrose path to maven-style XML hell, where you wake up one day and 
find you've attached layers of ponderous machinery to make a static config file 
Turing-complete. I think the problem comes when you try to make the file 
human-editable. The manifest should expressly be the porcelain result of a DSL, 
with all decisions baked in -- it must not be a DSL.

In general, we find that absolute separation of orchestration (what things 
should be wired together) and action (actually doing things) seems painful at 
design time but ends up making things simpler and more powerful.

  was:
https://github.com/nathanmarz/storm/issues/649

AFAIK the only way to construct a topology is to manually wire them together, 
e.g.

  (topology
   {"firehose" (spout-spec firehose-spout)}
   {"our-bolt-1" (bolt-spec {"firehose" :shuffle}
some-bolt
:p 5)
"our-bolt-2" (bolt-spec {"our-bolt-1" ["word"]}
 some-other-bolt
   

[jira] [Created] (STORM-1056) allow supervisor log filename to be configurable via ENV variable

2015-09-20 Thread Erik Weathers (JIRA)
Erik Weathers created STORM-1056:


 Summary: allow supervisor log filename to be configurable via ENV 
variable
 Key: STORM-1056
 URL: https://issues.apache.org/jira/browse/STORM-1056
 Project: Apache Storm
  Issue Type: Task
Reporter: Erik Weathers
Priority: Minor
 Fix For: 0.10.0, 0.11.0, 0.9.6


Requested feature:  allow configuring the supervisor's log filename when 
launching it via an ENV variable.

Motivation: The storm-on-mesos project (https://github.com/mesos/storm) relies 
on multiple Storm Supervisor processes per worker host, where each Supervisor 
is dedicated to a particular topology.  This is part of the framework's 
functionality of separating topologies from each other.  i.e., storm-on-mesos 
is a multi-tenant system.  But before the change requested in this issue, the 
logs from all supervisors on a worker host will be written into a supervisor 
log with a single name of supervisor.log.  If all logs are written to a common 
location on the mesos host, then all logs go to the same log file.  Instead it 
would be desirable to separate the supervisor logs per-topology, so that each 
tenant/topology-owner can peruse the logs that are related to their own 
topology.  Thus this ticket is requesting the ability to configure the 
supervisor log via an environment variable whilst invoking bin/storm.py (or 
bin/storm in pre-0.10 storm releases).

When this ticket is fixed, we will include the topology ID into the supervisor 
log filename for storm-on-mesos.
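
A rough sketch of how the requested behavior could look inside bin/storm.py (the variable 
name STORM_SUPERVISOR_LOG_FILE is hypothetical, for illustration only):

{code}
# Sketch only: read an optional ENV variable, falling back to today's behavior.
import os

def supervisor_log_filename(default="supervisor.log"):
    # STORM_SUPERVISOR_LOG_FILE is a hypothetical name used here for illustration.
    return os.environ.get("STORM_SUPERVISOR_LOG_FILE", default)

# bin/storm.py could then pass "-Dlogfile.name=%s" % supervisor_log_filename()
# to the supervisor JVM instead of always passing the hard-coded supervisor.log.
{code}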



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (STORM-1056) allow supervisor log filename to be configurable via ENV variable

2015-09-20 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-1056:
-
Description: 
*Requested feature:*  allow configuring the supervisor's log filename when 
launching it via an ENV variable.

*Motivation:* The storm-on-mesos project (https://github.com/mesos/storm) 
relies on multiple Storm Supervisor processes per worker host, where each 
Supervisor is dedicated to a particular topology.  This is part of the 
framework's functionality of separating topologies from each other.  i.e., 
storm-on-mesos is a multi-tenant system.  But before the change requested in 
this issue, the logs from all supervisors on a worker host will be written into 
a supervisor log with a single name of supervisor.log.  If all logs are written 
to a common location on the mesos host, then all logs go to the same log file.  
Instead it would be desirable to separate the supervisor logs per-topology, so 
that each tenant/topology-owner can peruse the logs that are related to their 
own topology.  Thus this ticket is requesting the ability to configure the 
supervisor log via an environment variable whilst invoking bin/storm.py (or 
bin/storm in pre-0.10 storm releases).

When this ticket is fixed, we will include the topology ID into the supervisor 
log filename for storm-on-mesos.

  was:
Requested feature:  allow configuring the supervisor's log filename when 
launching it via an ENV variable.

Motivation: The storm-on-mesos project (https://github.com/mesos/storm) relies 
on multiple Storm Supervisor processes per worker host, where each Supervisor 
is dedicated to a particular topology.  This is part of the framework's 
functionality of separating topologies from each other.  i.e., storm-on-mesos 
is a multi-tenant system.  But before the change requested in this issue, the 
logs from all supervisors on a worker host will be written into a supervisor 
log with a single name of supervisor.log.  If all logs are written to a common 
location on the mesos host, then all logs go to the same log file.  Instead it 
would be desirable to separate the supervisor logs per-topology, so that each 
tenant/topology-owner can peruse the logs that are related to their own 
topology.  Thus this ticket is requesting the ability to configure the 
supervisor log via an environment variable whilst invoking bin/storm.py (or 
bin/storm in pre-0.10 storm releases).

When this ticket is fixed, we will include the topology ID into the supervisor 
log filename for storm-on-mesos.


> allow supervisor log filename to be configurable via ENV variable
> -
>
> Key: STORM-1056
> URL: https://issues.apache.org/jira/browse/STORM-1056
> Project: Apache Storm
>  Issue Type: Task
>    Reporter: Erik Weathers
>Priority: Minor
> Fix For: 0.10.0, 0.11.0, 0.9.6
>
>
> *Requested feature:*  allow configuring the supervisor's log filename when 
> launching it via an ENV variable.
> *Motivation:* The storm-on-mesos project (https://github.com/mesos/storm) 
> relies on multiple Storm Supervisor processes per worker host, where each 
> Supervisor is dedicated to a particular topology.  This is part of the 
> framework's functionality of separating topologies from each other.  i.e., 
> storm-on-mesos is a multi-tenant system.  But before the change requested in 
> this issue, the logs from all supervisors on a worker host will be written 
> into a supervisor log with a single name of supervisor.log.  If all logs are 
> written to a common location on the mesos host, then all logs go to the same 
> log file.  Instead it would be desirable to separate the supervisor logs 
> per-topology, so that each tenant/topology-owner can peruse the logs that are 
> related to their own topology.  Thus this ticket is requesting the ability to 
> configure the supervisor log via an environment variable whilst invoking 
> bin/storm.py (or bin/storm in pre-0.10 storm releases).
> When this ticket is fixed, we will include the topology ID into the 
> supervisor log filename for storm-on-mesos.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (STORM-1047) document internals of bin/storm.py

2015-09-15 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-1047:
-
Description: 
The `python` script `bin/storm.py` is completely undocumented regarding its 
internals. Function comments only include a command line interface often 
omitting an explanation of arguments and their default values (e.g. it should 
be clear why the default value of `klass` of `nimbus` is 
`"backtype.storm.daemon.nimbus"` because that doesn't make sense to someone 
unfamiliar with the storm-core implementation).

Also explanations like "Launches the nimbus daemon. [...]" (again `nimbus` 
function) is good for a command line API doc, but insufficient for a function 
documentation (should mention that it starts a `java` process and passes 
`klass` as class name to it).

How does the script use `lib/`, `extlib/` and `extlib-daemon`? It's too complex 
to squeeze this info out of the source code.
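
As a starting point for such documentation, the essence of what the script does when 
launching a daemon is roughly the following (simplified sketch; the helper names are 
illustrative, not the script's real functions):

{code}
# Simplified sketch of what bin/storm.py essentially does to launch a daemon class.
import os

def build_classpath(storm_dir, dirs=("lib", "extlib", "extlib-daemon")):
    jars = []
    for d in dirs:
        full = os.path.join(storm_dir, d)
        if os.path.isdir(full):
            jars += [os.path.join(full, j) for j in os.listdir(full) if j.endswith(".jar")]
    return os.pathsep.join(jars)

def launch_class(klass, storm_dir="/opt/storm", jvm_opts=()):
    # e.g. klass = "backtype.storm.daemon.nimbus" for the `storm nimbus` command.
    cmd = ["java", "-cp", build_classpath(storm_dir)] + list(jvm_opts) + [klass]
    os.execvp(cmd[0], cmd)
{code}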

  was:
The `python` script `bin/storm.py` is completely undocumented regarding it's 
internals. Function comments only include a command line interface often 
omitting an explanation of arguments and their default values (e.g. it should 
be clear why the default value of `klass` of `nimbus` is 
`"backtype.storm.daemon.nimbus"` because that doesn't make sense).

Also explanations like "Launches the nimbus daemon. [...]" (again `nimbus` 
function) is good for a command line API doc, but insufficient for a function 
documentation (should mention that it starts a `java` process and passes 
`klass` as class name to it).

How does the script use `lib/`, `extlib/` and `extlib-daemon`? It's too complex 
to squeeze this info out of the source code.


> document internals of bin/storm.py
> --
>
> Key: STORM-1047
> URL: https://issues.apache.org/jira/browse/STORM-1047
> Project: Apache Storm
>  Issue Type: Documentation
>Affects Versions: 0.10.0
>Reporter: Karl Richter
>  Labels: documentation
>
> The `python` script `bin/storm.py` is completely undocumented regarding its 
> internals. Function comments only include a command line interface often 
> omitting an explanation of arguments and their default values (e.g. it should 
> be clear why the default value of `klass` of `nimbus` is 
> `"backtype.storm.daemon.nimbus"` because that doesn't make sense to someone 
> unfamiliar with the storm-core implementation).
> Also explanations like "Launches the nimbus daemon. [...]" (again `nimbus` 
> function) is good for a command line API doc, but insufficient for a function 
> documentation (should mention that it starts a `java` process and passes 
> `klass` as class name to it).
> How does the script use `lib/`, `extlib/` and `extlib-daemon`? It's too 
> complex to squeeze this info out of the source code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: worker-launcher - there is now native code included in Storm?

2015-09-15 Thread Erik Weathers
Some of the docs were hard for me to unravel (e.g., the section on
supervisor needing a headless user).  But I tried my best!

https://github.com/apache/storm/pull/735

- Erik

On Mon, Sep 14, 2015 at 11:21 PM, Erik Weathers <eweath...@groupon.com>
wrote:

> Thanks for the response Harsha!
>
> Here's the direct link to the section you mentioned:
>
>
> https://github.com/apache/storm/blob/master/SECURITY.md#run-worker-processes-as-user-who-submitted-the-topology
>
> There are a bunch of typos in this SECURITY.md file, I'll see if I can
> clean some of them up.
>
> - Erik
>
> On Mon, Sep 14, 2015 at 10:03 PM, Harsha <st...@harsha.io> wrote:
>
>> Hi Erik,
>>   yes, native code is included as part of the distro, but it is only
>>   used if you enable security and set
>>   supervisor.run.worker.as.users; otherwise everything else
>>   remains the same as before. For more info you can take a
>>   look at this doc
>> https://github.com/apache/storm/blob/master/SECURITY.md
>>
>> -Harsha
>>
>> On Mon, Sep 14, 2015, at 03:10 PM, Erik Weathers wrote:
>> > It seems that in storm 0.10 we have native code (C) that is being used
>> > for
>> > launching workers.  I haven't found any reference to how this might
>> > affect
>> > deployment, testing, etc.  Is there any documentation about this new
>> > requirement?  I know it's related to the "launch storm under different
>> > usernames" feature.  But is the native stuff optional?
>> >
>> > - Erik
>>
>
>


Re: Storm multi-tenancy in 0.10 - is there any topology resource isolation?

2015-09-15 Thread Erik Weathers
From reading the SECURITY.md file in the storm code, it seems the answer is
that the "resource limits" feature is not what I pictured.  The Storm
0.10 "resource
limiting" functionality provided by the MultitenantScheduler
<https://github.com/apache/storm/blob/master/SECURITY.md#multi-tenant-scheduler>
is
similar to the IsolationScheduler of Storm 0.8.2
<https://storm.apache.org/2013/01/11/storm082-released.html>, which allowed
you to isolate the set of hosts that a topology would run on. In Storm 0.10
we will be able to use the MultitenantScheduler to control "the maximum
number of nodes a user is guaranteed to be able to use for their
topologies" (though I'm not 100% sure what that actually means).

On a related note, there is also a feature in Storm 0.10 to limit the
maximum number of worker & executors that *any* topology can have
<https://github.com/apache/storm/blob/master/SECURITY.md#limits>.

Neither of these features provide the process-level resource isolation that
storm-on-mesos <https://github.com/mesos/storm> provides, nor what JStorm
seems to have implemented
<https://github.com/alibaba/jstorm/wiki/Resource-isolation>.   As a
maintainer of storm-on-mesos I was curious if the functionality of that
framework had been subsumed into Storm proper -- not yet is the answer!

- Erik

On Mon, Sep 14, 2015 at 3:04 PM, Erik Weathers <eweath...@groupon.com>
wrote:

> hi Storm Devs,
>
> After reading the release notes for 0.10-beta, I was unsure if the
> multi-tenancy feature will isolate topologies from one another in terms of
> their process resources (CPU, memory):
>
>- https://storm.apache.org/2015/06/15/storm0100-beta-released.html
>- Note: there is a reference to: "configurable resource limits."
>
> To understand how this is implemented, I tried searching through the code
> (both 0.10.x-branch and master), but couldn't find any reference to
> "cgroup".  So how is the resource isolation being done?
>
> Maybe this release note is referencing the "R-storm" work?  That still
> isn't isolating/limiting resources as far as I have been able to discern,
> it's instead focused on scheduling.  Also it doesn't seem to be fixed in
> 0.10 already, so I'm doubtful of it being the answer to my question.
>
>- http://web.engr.illinois.edu/~bpeng/files/r-storm.pdf
>- https://issues.apache.org/jira/browse/STORM-893
>
> Thanks!
>
> - Erik
>


Re: worker-launcher - there is now native code included in Storm?

2015-09-15 Thread Erik Weathers
Thanks for the response Harsha!

Here's the direct link to the section you mentioned:

https://github.com/apache/storm/blob/master/SECURITY.md#run-worker-processes-as-user-who-submitted-the-topology

There are a bunch of typos in this SECURITY.md file, I'll see if I can
clean some of them up.

- Erik

On Mon, Sep 14, 2015 at 10:03 PM, Harsha <st...@harsha.io> wrote:

> Hi Erik,
>   yes, native code is included as part of the distro, but it is only
>   used if you enable security and set
>   supervisor.run.worker.as.users; otherwise everything else
>   remains the same as before. For more info you can take a
>   look at this doc
> https://github.com/apache/storm/blob/master/SECURITY.md
>
> -Harsha
>
> On Mon, Sep 14, 2015, at 03:10 PM, Erik Weathers wrote:
> > It seems that in storm 0.10 we have native code (C) that is being used
> > for
> > launching workers.  I haven't found any reference to how this might
> > affect
> > deployment, testing, etc.  Is there any documentation about this new
> > requirement?  I know it's related to the "launch storm under different
> > usernames" feature.  But is the native stuff optional?
> >
> > - Erik
>


[jira] [Reopened] (STORM-1043) Concurrent access to state on local FS by multiple supervisors

2015-09-14 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers reopened STORM-1043:
--
  Assignee: Erik Weathers

> Concurrent access to state on local FS by multiple supervisors
> --
>
> Key: STORM-1043
> URL: https://issues.apache.org/jira/browse/STORM-1043
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 0.9.5
>Reporter: Ernestas Vaiciukevičius
>    Assignee: Erik Weathers
>  Labels: mesosphere
>
> Hi,
> we are running a storm-mesos cluster and occasionally workers die or are "lost" 
> in mesos. When this happens it often coincides with errors in logs related to 
> supervisors local state.
> By looking at the storm code it seems this might be caused by the way how 
> multiple supervisor processes access the local state in the same directory 
> via VersionedStore.
> For example: 
> https://github.com/apache/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/supervisor.clj#L434
> Here every supervisor does this concurrently:
> 1. reads latest state from FS
> 2. possibly updates the state
> 3. writes the new version of the state
> Some updates could be lost if there are 2+ supervisors and they execute above 
> steps concurrently - then only the updates from last supervisor would remain 
> on the last state version on the disk.
> We observed local state changes quite often (seconds), so the likelihood of 
> this concurrency issue occurring is high.
> Some examples of exceptions:
> --
> java.lang.RuntimeException: Version already exists or data already exists
> at backtype.storm.utils.VersionedStore.createVersion(VersionedStore.java:85) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.VersionedStore.createVersion(VersionedStore.java:79) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.persist(LocalState.java:101) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.put(LocalState.java:82) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.put(LocalState.java:76) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at 
> backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this7400.invoke(supervisor.clj:382)
>  ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.event$event_manager$fn2625.invoke(event.clj:40) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> ---
> java.io.FileNotFoundException: File 
> '/var/lib/storm/supervisor/localstate/1441034838231' does not exist
> at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) 
> ~[commons-io-2.4.jar:2.4]
> at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) 
> ~[commons-io-2.4.jar:2.4]
> at 
> backtype.storm.utils.LocalState.deserializeLatestVersion(LocalState.java:61) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.snapshot(LocalState.java:47) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.get(LocalState.java:72) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:234) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
> at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
> at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
> at clojure.core$partial$fn4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
> at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
> at backtype.storm.event$event_manager$fn2625.invoke(event.clj:40) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> -
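
To make the read-modify-write race described above concrete, a minimal sketch (plain 
Python, not Storm's VersionedStore):

{code}
# Minimal sketch of the lost-update / "Version already exists" race; not Storm code.
import os

def latest_version(state_dir):
    versions = [int(f) for f in os.listdir(state_dir) if f.isdigit()]
    return max(versions) if versions else 0

def write_new_version(state_dir, data):
    v = latest_version(state_dir) + 1
    path = os.path.join(state_dir, str(v))
    # Two supervisors that read the same latest_version() both try to create the
    # same path: with an exclusive create one of them fails (the analogue of
    # VersionedStore's "Version already exists" RuntimeException); without it,
    # one supervisor silently clobbers the other's update.
    with open(path, "x") as f:   # "x" = exclusive create, raises FileExistsError
        f.write(data)
{code}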



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (STORM-1043) Concurrent access to state on local FS by multiple supervisors

2015-09-14 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743763#comment-14743763
 ] 

Erik Weathers edited comment on STORM-1043 at 9/14/15 4:18 PM:
---

As I replied in the issue that [~ernisv] raised 
(https://github.com/mesos/storm/issues/60), the solution is to leverage mesos's 
ability to put framework's data into separate sandboxes.  Just *don't* set 
{{storm.local.dir}} and the cwd of the Mesos Executor will be used for the 
Supervisor, which will be in the supervisor-specific sandbox.

FYI [~revans2], the ports are taken care of automatically by Mesos's 
scheduler/offer system, as they are considered part of the resources that each 
topology is claiming on the Mesos worker nodes ("mesos-slave" has now been 
renamed as "mesos-agent").


was (Author: erikdw):
As I replied in the issue that [~ernisv] raised (), the solution is to leverage 
mesos's ability to put framework's data into separate sandboxes.  Just *don't* 
set {storm.local.dir} and the cwd of the Mesos Executor will be used for the 
Supervisor, which will be in the supervisor-specific sandbox.

FYI [~revans2], the ports are taken care of automatically by Mesos's 
scheduler/offer system, as they are considered part of the resources that each 
topology is claiming on the Mesos worker nodes ("mesos-slave" has now been 
renamed as "mesos-agent").

> Concurrent access to state on local FS by multiple supervisors
> --
>
> Key: STORM-1043
> URL: https://issues.apache.org/jira/browse/STORM-1043
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 0.9.5
>    Reporter: Ernestas Vaiciukevičius
>Assignee: Erik Weathers
>  Labels: mesosphere
>
> Hi,
> we are running a storm-mesos cluster and occasionally workers die or are "lost" 
> in mesos. When this happens it often coincides with errors in logs related to 
> supervisors local state.
> By looking at the storm code it seems this might be caused by the way how 
> multiple supervisor processes access the local state in the same directory 
> via VersionedStore.
> For example: 
> https://github.com/apache/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/supervisor.clj#L434
> Here every supervisor does this concurrently:
> 1. reads latest state from FS
> 2. possibly updates the state
> 3. writes the new version of the state
> Some updates could be lost if there are 2+ supervisors and they execute above 
> steps concurrently - then only the updates from last supervisor would remain 
> on the last state version on the disk.
> We observed local state changes quite often (seconds), so the likelihood of 
> this concurrency issue occurring is high.
> Some examples of exceptions:
> --
> java.lang.RuntimeException: Version already exists or data already exists
> at backtype.storm.utils.VersionedStore.createVersion(VersionedStore.java:85) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.VersionedStore.createVersion(VersionedStore.java:79) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.persist(LocalState.java:101) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.put(LocalState.java:82) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.put(LocalState.java:76) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at 
> backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this7400.invoke(supervisor.clj:382)
>  ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.event$event_manager$fn2625.invoke(event.clj:40) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> ---
> java.io.FileNotFoundException: File 
> '/var/lib/storm/supervisor/localstate/1441034838231' does not exist
> at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) 
> ~[commons-io-2.4.jar:2.4]
> at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) 
> ~[commons-io-2.4.jar:2.4]
> at 
> backtype.storm.utils.LocalState.deserializeLatestVersion(LocalState.java:61) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.snapshot(LocalState.java:47) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.get(LocalState.java:72) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:234) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.applyToHelper(A

[jira] [Closed] (STORM-1043) Concurrent access to state on local FS by multiple supervisors

2015-09-14 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers closed STORM-1043.

Resolution: Invalid

> Concurrent access to state on local FS by multiple supervisors
> --
>
> Key: STORM-1043
> URL: https://issues.apache.org/jira/browse/STORM-1043
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 0.9.5
>Reporter: Ernestas Vaiciukevičius
>    Assignee: Erik Weathers
>  Labels: mesosphere
>
> Hi,
> we are running a storm-mesos cluster and occasionally workers die or are "lost" 
> in mesos. When this happens it often coincides with errors in logs related to 
> supervisors local state.
> By looking at the storm code it seems this might be caused by the way how 
> multiple supervisor processes access the local state in the same directory 
> via VersionedStore.
> For example: 
> https://github.com/apache/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/supervisor.clj#L434
> Here every supervisor does this concurrently:
> 1. reads latest state from FS
> 2. possibly updates the state
> 3. writes the new version of the state
> Some updates could be lost if there are 2+ supervisors and they execute above 
> steps concurrently - then only the updates from last supervisor would remain 
> on the last state version on the disk.
> We observed local state changes quite often (seconds), so the likelihood of 
> this concurrency issue occurring is high.
> Some examples of exceptions:
> --
> java.lang.RuntimeException: Version already exists or data already exists
> at backtype.storm.utils.VersionedStore.createVersion(VersionedStore.java:85) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.VersionedStore.createVersion(VersionedStore.java:79) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.persist(LocalState.java:101) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.put(LocalState.java:82) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.put(LocalState.java:76) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at 
> backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this7400.invoke(supervisor.clj:382)
>  ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.event$event_manager$fn2625.invoke(event.clj:40) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> ---
> java.io.FileNotFoundException: File 
> '/var/lib/storm/supervisor/localstate/1441034838231' does not exist
> at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) 
> ~[commons-io-2.4.jar:2.4]
> at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) 
> ~[commons-io-2.4.jar:2.4]
> at 
> backtype.storm.utils.LocalState.deserializeLatestVersion(LocalState.java:61) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.snapshot(LocalState.java:47) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.get(LocalState.java:72) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:234) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
> at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
> at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
> at clojure.core$partial$fn4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
> at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
> at backtype.storm.event$event_manager$fn2625.invoke(event.clj:40) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> -



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
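
To make the read-modify-write race described above concrete, here is a minimal,
self-contained Java sketch (hypothetical code, not Storm's actual VersionedStore):
two "supervisors" each read the latest version of some shared state, modify their
copy, and write a new version; whichever writes last silently drops the other's
update.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Hypothetical illustration of the lost-update race: the individual read and
    // write operations are safe, but the read-modify-write sequence is not atomic.
    public class LostUpdateSketch {
        // Stand-in for a directory of versioned state files.
        static final List<Map<String, String>> versions = new ArrayList<>();

        static synchronized Map<String, String> readLatest() {
            return versions.isEmpty()
                    ? new HashMap<>()
                    : new HashMap<>(versions.get(versions.size() - 1));
        }

        static synchronized void writeNewVersion(Map<String, String> state) {
            versions.add(state);
        }

        static Thread supervisor(String id) {
            return new Thread(() -> {
                Map<String, String> state = readLatest();      // 1. read latest state
                try { Thread.sleep(50); } catch (InterruptedException ignored) { }
                state.put(id, "assignment-for-" + id);         // 2. possibly update it
                writeNewVersion(state);                        // 3. write a new version
            });
        }

        public static void main(String[] args) throws InterruptedException {
            Thread a = supervisor("supervisor-a");
            Thread b = supervisor("supervisor-b");
            a.start(); b.start();
            a.join(); b.join();
            // Typically prints a map with only one entry: the other update was lost.
            System.out.println("latest version = " + readLatest());
        }
    }

Running this usually prints a map containing only one of the two entries, which is
exactly the lost-update symptom the reporter describes.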


[jira] [Commented] (STORM-1043) Concurrent access to state on local FS by multiple supervisors

2015-09-14 Thread Erik Weathers (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14743763#comment-14743763
 ] 

Erik Weathers commented on STORM-1043:
--

As I replied in the issue that [~ernisv] raised (), the solution is to leverage 
Mesos's ability to put each framework's data into a separate sandbox.  Just *don't* 
set {storm.local.dir}, and the cwd of the Mesos Executor will be used for the 
Supervisor, which will be inside the supervisor-specific sandbox.

FYI [~revans2], the ports are taken care of automatically by Mesos's 
scheduler/offer system, as they are considered part of the resources that each 
topology claims on the Mesos worker nodes ("mesos-slave" has since been 
renamed to "mesos-agent").

> Concurrent access to state on local FS by multiple supervisors
> --
>
> Key: STORM-1043
> URL: https://issues.apache.org/jira/browse/STORM-1043
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 0.9.5
>Reporter: Ernestas Vaiciukevičius
>  Labels: mesosphere
>
> Hi,
> we are running a storm-mesos cluster and occasionally workers die or are "lost" 
> in Mesos. When this happens it often coincides with errors in the logs related to 
> the supervisors' local state.
> Looking at the Storm code, this might be caused by the way multiple 
> supervisor processes access the local state in the same directory 
> via VersionedStore.
> For example: 
> https://github.com/apache/storm/blob/master/storm-core/src/clj/backtype/storm/daemon/supervisor.clj#L434
> Here every supervisor does this concurrently:
> 1. reads latest state from FS
> 2. possibly updates the state
> 3. writes the new version of the state
> Some updates could be lost if there are 2+ supervisors and they execute the above 
> steps concurrently - then only the updates from the last supervisor would remain 
> in the latest state version on disk.
> We observed the local state changing quite often (on the order of seconds), so the 
> likelihood of this concurrency issue occurring is high.
> Some examples of exceptions:
> --
> java.lang.RuntimeException: Version already exists or data already exists
> at backtype.storm.utils.VersionedStore.createVersion(VersionedStore.java:85) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.VersionedStore.createVersion(VersionedStore.java:79) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.persist(LocalState.java:101) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.put(LocalState.java:82) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.put(LocalState.java:76) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at 
> backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this7400.invoke(supervisor.clj:382)
>  ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.event$event_manager$fn2625.invoke(event.clj:40) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> ---
> java.io.FileNotFoundException: File 
> '/var/lib/storm/supervisor/localstate/1441034838231' does not exist
> at org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:299) 
> ~[commons-io-2.4.jar:2.4]
> at org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1763) 
> ~[commons-io-2.4.jar:2.4]
> at 
> backtype.storm.utils.LocalState.deserializeLatestVersion(LocalState.java:61) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.snapshot(LocalState.java:47) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.utils.LocalState.get(LocalState.java:72) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.daemon.supervisor$sync_processes.invoke(supervisor.clj:234) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.applyToHelper(AFn.java:161) [clojure-1.5.1.jar:na]
> at clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]
> at clojure.core$apply.invoke(core.clj:619) ~[clojure-1.5.1.jar:na]
> at clojure.core$partial$fn4190.doInvoke(core.clj:2396) ~[clojure-1.5.1.jar:na]
> at clojure.lang.RestFn.invoke(RestFn.java:397) ~[clojure-1.5.1.jar:na]
> at backtype.storm.event$event_manager$fn2625.invoke(event.clj:40) 
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> -



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


worker-launcher - there is now native code included in Storm?

2015-09-14 Thread Erik Weathers
It seems that Storm 0.10 now includes native code (C) that is used for
launching workers.  I haven't found any reference to how this might affect
deployment, testing, etc.  Is there any documentation about this new
requirement?  I know it's related to the "launch storm under different
usernames" feature.  But is the native code optional?

- Erik


Storm multi-tenancy in 0.10 - is there any topology resource isolation?

2015-09-14 Thread Erik Weathers
hi Storm Devs,

After reading the release notes for 0.10-beta, I was unsure if the
multi-tenancy feature will isolate topologies from one another in terms of
their process resources (CPU, memory):

   - https://storm.apache.org/2015/06/15/storm0100-beta-released.html
   - Note: there is a reference to: "configurable resource limits."

To understand how this is implemented, I tried searching through the code
(both 0.10.x-branch and master), but couldn't find any reference to
"cgroup".  So how is the resource isolation being done?

Maybe this release note is referencing the "R-storm" work?  That still
isn't isolating/limiting resources as far as I have been able to discern;
it's instead focused on scheduling.  Also, it doesn't appear to have landed
in 0.10 yet, so I'm doubtful it is the answer to my question.

   - http://web.engr.illinois.edu/~bpeng/files/r-storm.pdf
   - https://issues.apache.org/jira/browse/STORM-893

Thanks!

- Erik


Re: Error when build Storm source code with Maven

2015-03-31 Thread Erik Weathers
hi Sig.  Can you please clarify which version of storm you're building?
I.e., where did you clone the repo from, and which git tree-ish did you build
from (commit SHA, tag, tree/branch)?

- Erik

On Tue, Mar 31, 2015 at 6:47 PM, Sigmund Lee wua...@gmail.com wrote:

 Hi all,

 I encountered the following error when building the storm source code with maven
 (mvn clean install -DskipTests -e):

 [INFO] --- maven-resources-plugin:2.5:testResources (default-testResources)
  @ storm-hbase ---
 
  [debug] execute contextualize
 
  [INFO] Using 'UTF-8' encoding to copy filtered resources.
 
  [INFO] skip non existing resourceDirectory
  /home/lee/Opensource/storm/external/storm-hbase/src/test/resources
 
  [INFO] Copying 3 resources
 
  [INFO]
 
  [INFO] --- maven-compiler-plugin:3.1:testCompile (default-testCompile) @
  storm-hbase ---
 
  [INFO] Changes detected - recompiling the module!
 
  [INFO] Compiling 9 source files to
  /home/lee/Opensource/storm/external/storm-hbase/target/test-classes
 
  [WARNING]
 
 /home/lee/Opensource/storm/external/storm-hbase/src/test/java/org/apache/storm/hbase/trident/WordCountTrident.java:
 
 /home/lee/Opensource/storm/external/storm-hbase/src/test/java/org/apache/storm/hbase/trident/WordCountTrident.java
  uses unchecked or unsafe operations.
 
  [WARNING]
 
 /home/lee/Opensource/storm/external/storm-hbase/src/test/java/org/apache/storm/hbase/trident/WordCountTrident.java:
  Recompile with -Xlint:unchecked for details.
 
  [INFO]
 
  [INFO] --- maven-surefire-plugin:2.9:test (default-test) @ storm-hbase
 ---
 
  [INFO] Tests are skipped.
 
  [INFO]
 
  [INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ storm-hbase ---
 
  [INFO] Building jar:
 
 /home/lee/Opensource/storm/external/storm-hbase/target/storm-hbase-0.11.0-SNAPSHOT.jar
 
  [INFO]
 
  [INFO] --- maven-site-plugin:3.3:attach-descriptor (attach-descriptor) @
  storm-hbase ---
 
  [INFO]
 
 
  [INFO]
  
 
  [INFO] Building storm-hive 0.11.0-SNAPSHOT
 
  [INFO]
  
 
  [WARNING] The POM for
  org.apache.calcite:calcite-core:jar:0.9.2-incubating-SNAPSHOT is
 missing,
  no dependency information available
 
  [WARNING] The POM for
  org.apache.calcite:calcite-avatica:jar:0.9.2-incubating-SNAPSHOT is
  missing, no dependency information available
 
  Downloading:
 
 http://localhost:8081/nexus/content/groups/public/eigenbase/eigenbase-properties/1.1.4/eigenbase-properties-1.1.4.pom
 
  [WARNING] The POM for eigenbase:eigenbase-properties:jar:1.1.4 is
 missing,
  no dependency information available
 
  Downloading:
 
 http://localhost:8081/nexus/content/groups/public/net/hydromatic/linq4j/0.4/linq4j-0.4.pom
 
  [WARNING] The POM for net.hydromatic:linq4j:jar:0.4 is missing, no
  dependency information available
 
  Downloading:
 
 http://localhost:8081/nexus/content/groups/public/net/hydromatic/quidem/0.1.1/quidem-0.1.1.pom
 
  [WARNING] The POM for net.hydromatic:quidem:jar:0.1.1 is missing, no
  dependency information available
 
  Downloading:
 
 http://localhost:8081/nexus/content/groups/public/org/pentaho/pentaho-aggdesigner-algorithm/5.1.3-jhyde/pentaho-aggdesigner-algorithm-5.1.3-jhyde.pom
 
  [WARNING] The POM for
  org.pentaho:pentaho-aggdesigner-algorithm:jar:5.1.3-jhyde is missing, no
  dependency information available
 
  Downloading:
 
 http://localhost:8081/nexus/content/groups/public/org/pentaho/pentaho-aggdesigner-algorithm/5.1.3-jhyde/pentaho-aggdesigner-algorithm-5.1.3-jhyde.jar
 
  Downloading:
 
 http://localhost:8081/nexus/content/groups/public/net/hydromatic/linq4j/0.4/linq4j-0.4.jar
 
  Downloading:
 
 http://localhost:8081/nexus/content/groups/public/eigenbase/eigenbase-properties/1.1.4/eigenbase-properties-1.1.4.jar
 
  Downloading:
 
 http://localhost:8081/nexus/content/groups/public/net/hydromatic/quidem/0.1.1/quidem-0.1.1.jar
 
  [INFO]
  
 
  [INFO] Reactor Summary:
 
  [INFO]
 
  [INFO] Storm .. SUCCESS [
   2.090 s]
 
  [INFO] maven-shade-clojure-transformer  SUCCESS [
   2.411 s]
 
  [INFO] storm-maven-plugins  SUCCESS [
   3.067 s]
 
  [INFO] Storm Core . SUCCESS
 [03:02
  min]
 
  [INFO] storm-starter .. SUCCESS [
   8.200 s]
 
  [INFO] storm-kafka  SUCCESS [
   0.841 s]
 
  [INFO] storm-hdfs . SUCCESS [
   3.001 s]
 
  [INFO] storm-hbase  SUCCESS [
   4.347 s]
 
  [INFO] storm-hive . FAILURE [
   1.300 s]
 
  [INFO] storm-jdbc . SKIPPED
 
  [INFO] storm-redis  SKIPPED
 
  [INFO]
  

[jira] [Updated] (STORM-188) Allow user to specify full configuration path when running storm command

2015-03-31 Thread Erik Weathers (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Weathers updated STORM-188:

Summary: Allow user to specify full configuration path when running storm 
command  (was: Allow user to specifiy full configuration path when running 
storm command)

 Allow user to specify full configuration path when running storm command
 

 Key: STORM-188
 URL: https://issues.apache.org/jira/browse/STORM-188
 Project: Apache Storm
  Issue Type: Bug
Reporter: Sean Zhong
Priority: Minor
 Attachments: search_local_path_for_config.patch, storm-188.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Currently, storm will only look up the configuration path on the java classpath. We 
 should also allow the user to specify a full configuration path. This is very 
 important for a shared cluster environment, like YARN. Multiple storm clusters 
 may run with different configurations but share the same binary folder.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
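
As a rough illustration of the lookup this ticket asks for (hypothetical Java, not
Storm's implementation), a loader could first treat the supplied value as a
filesystem path and only fall back to the classpath when no such file exists:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Hypothetical sketch: accept either a full filesystem path to a config file
    // or a resource name that is resolved against the java classpath.
    public class ConfigLookupSketch {
        static InputStream openConfig(String nameOrPath) throws IOException {
            if (Files.isRegularFile(Paths.get(nameOrPath))) {
                return new FileInputStream(nameOrPath);  // full path wins if the file exists
            }
            InputStream fromClasspath =
                    ConfigLookupSketch.class.getClassLoader().getResourceAsStream(nameOrPath);
            if (fromClasspath == null) {
                throw new IOException("config not found on disk or classpath: " + nameOrPath);
            }
            return fromClasspath;
        }

        public static void main(String[] args) {
            String name = args.length > 0 ? args[0] : "storm.yaml";
            try (InputStream in = openConfig(name)) {
                System.out.println("opened config: " + name);
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        }
    }

This keeps the classpath behavior for existing deployments while letting clusters
that share one binary folder (e.g. under YARN) point at per-cluster config files.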


Re: begging for a bit of help reading storm-core code

2015-03-25 Thread Erik Weathers
Thanks for the response Bobby, on both fronts.

Can you please confirm my new understanding of this potentially misleading
comment above *mk-storm-cluster-state*?

https://github.com/apache/storm/blob/v0.9.3/storm-core/src/clj/backtype/storm/cluster.clj#L233

;; Watches should be used for optimization. When ZK is reconnecting,
they're not guaranteed to be called.

This comment sounds like a TODO, but based on your response I assume the
code is already using ZooKeeper watches.  So now I believe that this
comment is instead warning potential callers that these watch-based
callbacks aren't reliable.  Is that correct?

Thanks again!

- Erik

On Tue, Mar 24, 2015 at 6:49 AM, Bobby Evans ev...@yahoo-inc.com.invalid
wrote:

 Two things. First, storm uses zookeeper to store state and offers a watch
 that can inform a client when a particular znode in zookeeper changes.  The
 callbacks provide a way for a client to be informed that the particular
 part of the state they care about has changed (or you can pass nil if you
 don't want one).  The ZK functionality is not always perfect at delivering
 the callbacks, so it is always a good idea to back up the callback with
 periodic polling.  Also, at least for people I know working on DRUID, the
 callbacks on certain versions of ZK can arrive out of order, so if there
 are a lot of changes happening to a single ZNode you may need to be
 careful; luckily this is not an issue for how storm currently uses ZK.

 Now for the second thing.  JStorm is a fork of Apache Storm where the
 clojure code was translated into java code, so if you are having trouble
 reading the clojure code you can look at what JStorm is doing for a hint.
 Be aware that the two projects have diverged somewhat, so they are not going
 to be identical in their functionality.
 That being said, both projects have been talking with one another about
 combining.  https://issues.apache.org/jira/browse/STORM-717 is a JIRA to
 work through making that happen.  If you or anyone else has an opinion on
 this, please feel free to discuss it on that JIRA or here on the dev list;
 I'll send out another e-mail to provide a better place than piggybacking it
 here.  The goal would be to maintain binary compatibility, except possibly
 in the case of dependencies.  It would be great to have both groups working
 together instead of duplicating effort.  We also felt that having more
 of the code base in java would possibly make it more accessible to a wider
 range of developers.  That being said, this project is a community effort,
 so anyone who wants to help or has opinions on this, please help out and let
 us know what you think.   - Bobby
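
The "watch as an optimization, polling as the safety net" pattern described above
can be sketched in plain Java (no ZooKeeper client involved; the names are
hypothetical and the code is purely illustrative): a watch callback triggers a
resync whenever it happens to fire, and a scheduled poll re-reads the state anyway
in case a notification was missed.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Illustrative pattern: react to (unreliable) change notifications, but also
    // poll periodically so a missed notification cannot leave us out of date.
    public class WatchPlusPollingSketch {
        interface StateSource { String read(); }

        private final StateSource source;
        private final ScheduledExecutorService poller =
                Executors.newSingleThreadScheduledExecutor();
        private volatile String lastSeen = null;

        WatchPlusPollingSketch(StateSource source) {
            this.source = source;
            // Safety net: re-read the state every 10 seconds no matter what.
            poller.scheduleAtFixedRate(this::refresh, 10, 10, TimeUnit.SECONDS);
        }

        // Called when a watch notification does arrive.
        void onWatchFired() {
            refresh();
        }

        private synchronized void refresh() {
            String current = source.read();
            if (current != null && !current.equals(lastSeen)) {
                lastSeen = current;
                System.out.println("state changed, resynchronizing: " + current);
            }
        }

        public static void main(String[] args) throws InterruptedException {
            WatchPlusPollingSketch sync = new WatchPlusPollingSketch(
                    () -> "assignment-v" + (System.currentTimeMillis() / 15000));
            sync.onWatchFired();      // a notification that happened to arrive
            Thread.sleep(35000);      // the poll still catches later changes on its own
            sync.poller.shutdownNow();
        }
    }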



  On Tuesday, March 24, 2015 12:51 AM, Erik Weathers 
 eweath...@groupon.com wrote:


  hi Longda, thanks for the response.  Interesting project there.  I would
 appreciate seeing the architecture/work-flow file if you could please share
 it.
 However, we are using standard Storm, and I need to continue supporting our
 existing system -- which requires me to dig into the Clojure-based storm
 code.

 Any other kind passersby that can help me?

 - Erik

 On Monday, March 23, 2015, 封仲淹(纪君祥) zhongyan.f...@alibaba-inc.com wrote:

 
  You can read the source code of JStorm; it is Storm in Java, and it is easy
  to read: https://github.com/alibaba/jstorm
  I can share a FreeMind file with you that demonstrates the architecture and
  work flow.
 
 
 
  Best Regards
  Longda
 
 
  -----Original Message-----
  From: Erik Weathers [mailto:eweath...@groupon.com]
  Sent: March 24, 2015 8:58
  To: dev@storm.apache.org
  Subject: begging for a bit of help reading storm-core code
 
  hi Storm Devs!
 
  I've been trying to discern what the purpose is for the ubiquitous
  callback variables in cluster.clj and supervisor.clj code.  It occurred
  to me that some of you fine folks on the dev list must have already
 braved
  this test and could help dispel my ignorance.  Specifically, I'm trying
 to
  determine what the mk-storm-cluster-state function is actually doing, and
  understanding the callbacks seems critical to that endeavor.
 
  I had hoped that commit log messages might help me infer the point of
  these callbacks, but this particular pattern hasn't changed since the
  initial commit by Nathan Marz.
 
  Thanks for whatever help you can provide!
 
  - Erik
 
  P.S., I have many other similar questions about the storm-core Clojure
   code, so I wonder if someone might have a suggestion of a different route
   for requesting such code-reading help?
 
 





Re: begging for a bit of help reading storm-core code

2015-03-23 Thread Erik Weathers
hi Longda, thanks for the response.  Interesting project there.  I would
appreciate seeing the architecture/work-flow file if you could please share
it.
However, we are using standard Storm, and I need to continue supporting our
existing system -- which requires me to dig into the Clojure-based storm
code.

Any other kind passersby that can help me?

- Erik

On Monday, March 23, 2015, 封仲淹(纪君祥) zhongyan.f...@alibaba-inc.com wrote:


 You can read the source code of JStorm; it is Storm in Java, and it is easy
 to read: https://github.com/alibaba/jstorm
 I can share a FreeMind file with you that demonstrates the architecture and
 work flow.



 Best Regards
 Longda


 -----Original Message-----
 From: Erik Weathers [mailto:eweath...@groupon.com]
 Sent: March 24, 2015 8:58
 To: dev@storm.apache.org
 Subject: begging for a bit of help reading storm-core code

 hi Storm Devs!

 I've been trying to discern what the purpose is for the ubiquitous
 callback variables in cluster.clj and supervisor.clj code.  It occurred
 to me that some of you fine folks on the dev list must have already braved
 this test and could help dispel my ignorance.  Specifically, I'm trying to
 determine what the mk-storm-cluster-state function is actually doing, and
 understanding the callbacks seems critical to that endeavor.

 I had hoped that commit log messages might help me infer the point of
 these callbacks, but this particular pattern hasn't changed since the
 initial commit by Nathan Marz.

 Thanks for whatever help you can provide!

 - Erik

 P.S., I have many other similar questions about the storm-core Clojure
  code, so I wonder if someone might have a suggestion of a different route
  for requesting such code-reading help?




begging for a bit of help reading storm-core code

2015-03-23 Thread Erik Weathers
hi Storm Devs!

I've been trying to discern what the purpose is for the ubiquitous
callback variables in cluster.clj and supervisor.clj code.  It occurred
to me that some of you fine folks on the dev list must have already braved
this test and could help dispel my ignorance.  Specifically, I'm trying to
determine what the mk-storm-cluster-state function is actually doing, and
understanding the callbacks seems critical to that endeavor.

I had hoped that commit log messages might help me infer the point of these
callbacks, but this particular pattern hasn't changed since the initial
commit by Nathan Marz.

Thanks for whatever help you can provide!

- Erik

P.S., I have many other similar questions about the storm-core Clojure
code, so I wonder if someone might have a suggestion of a different route
for requesting such code-reading help?