Re: New Committer/PMC Member: Stig Rohde Døssing

2017-07-15 Thread Matthias J. Sax

Congrats!

On 7/14/17 6:17 PM, Xin Wang wrote:
> Congrats Stig!
> 
> - Xin
> 
> 2017-07-15 9:13 GMT+08:00 Satish Duggana:
> 
>> Congrats Stig!
>> 
>> ~Satish.
>> 
>> On Sat, Jul 15, 2017 at 4:18 AM, Jungtaek Lim wrote:
>> 
>>> Welcome Stig! Well deserved.
>>> 
>>> - Jungtaek Lim (HeartSaVioR)
>>> 
>>> On Sat, 15 Jul 2017 at 4:25 AM Hugo Da Cruz Louro <hlo...@hortonworks.com>
>>> wrote:
>>> 
 Welcome Stig. Looking forward to collaborating with you.
 
 Hugo
 
> On Jul 14, 2017, at 11:44 AM, P. Taylor Goetz wrote:
> 
> Please join me in congratulating the latest Committer and
> PMC Member, Stig Rohde Døssing.
> 
> Stig has been very active contributing patches to Storm’s
> Kafka integration, participating in code reviews, and
> answering questions on the mailing lists.
> 
> Welcome Stig!
> 
> -Taylor
> 
 
 
>>> 
>> 
> 


Re: New Committer/PMC Member: Hugo Louro

2017-03-23 Thread Matthias J. Sax

Congrats Hugo!

On 3/23/17 9:32 AM, P. Taylor Goetz wrote:
> Activity in another email thread reminded me of something I forgot 
> to do…
> 
> The Apache Storm PMC has voted to add Hugo Louro as a Committer
> and PMC Member.
> 
> Please join me in congratulating Hugo for his new role!
> 
> (And my apologies for not sending out the announcement sooner. ;) 
> )
> 
> -Taylor
> 


Re: [ANNOUNCE] New Storm Commiter/PMC Member: Xin Wang

2016-11-30 Thread Matthias J. Sax

Congrats!

On 11/30/16 8:43 PM, Jungtaek Lim wrote:
> Please join me in welcoming Xin Wang as a new Apache Storm
> Committer and PMC member.
> 
> Xin Wang has shown a strong commitment to the Apache Storm community
> via code contributions over a long time, spanning both core and
> integration components. He has also shown active participation on
> the mailing lists.
> 
> Congratulations and welcome Xin!
> 
> - Jungtaek Lim (HeartSaVioR)
> 


Re: Too many machine mails

2016-08-12 Thread Matthias J. Sax
 I think we can talk to INFRA about adopting/supporting those
>>> changes. - Bobby
>>>
>>> On Thursday, August 11, 2016 8:41 AM, Aditya Desai <adity...@usc.edu>
>>> wrote:
>>>
>>>
>>>   Please reduce the number of emails. I am getting many, many emails in
>>> recent days and they spam my inbox.
>>>
>>> On Thu, Aug 11, 2016 at 2:41 AM, Erik Weathers
>>> <eweath...@groupon.com.invalid> wrote:
>>>
>>>> I will state again (as I've done on prior email threads) that I find no
>>>> value in spamming the JIRA issues like this, and that I strongly believe
>>>> that this behavior is in fact detrimental since it obscures the actual
>>>> comments on the issue itself.  The proposed solution of just moving the
>>>> destination of the JIRA emails to a different list than
>>>> dev@storm.apache.org
>>>> doesn't solve that root problem.
>>>>
>>>> I want to be able to read a JIRA issue without having to skim over
>>>> dozens and dozens of auto-appended code review messages.  I truly
>>>> cannot understand why this isn't an annoyance for others.  I could be
>>>> really snarky and reformat this email to have a bunch of random stuff
>>>> in between every sentence to make my point, but I hope this sentence
>>>> suffices to prove it?
>>>>
>>>> Though I must acknowledge your point, Jungtaek, that there is some
>>>> Apache policy that all code review comments need to be archived into
>>>> some Apache system.  Maybe we can use the attachment functionality of
>>>> JIRA instead of making these separate comments on the JIRA issue?  I'm
>>>> not sure how the integration is set up right now, but that seems feasible.
>>>>
>>>> - Erik
>>>>
>>>> On Thu, Aug 11, 2016 at 2:08 AM, Matthias J. Sax <mj...@apache.org>
>>>> wrote:
>>>>
>>>>> I like the idea of having one more mailing list to reduce load on
>>>>> dev-list.
>>>>>
>>>>> -Matthias
>>>>>
>>>>> On 08/11/2016 11:07 AM, Jungtaek Lim wrote:
>>>>>> I remember that Taylor stated that all github comments should be copied
>>>>>> to somewhere Apache infra, and it's Apache JIRA for us.
>>>>>>
>>>>>> It seems to make sense, but I'm curious whether other projects respect
>>>>>> this rule. I also subscribed to the dev lists of Kafka, Zeppelin, Flink,
>>>>>> HBase, Spark (although I barely see them), but no project is sending a
>>>>>> mail per comment. Some of them copy github comments to the JIRA issue
>>>>>> but without notification, and others don't even copy comments to the
>>>>>> JIRA issue.
>>>>>> (You can check this with dev mailing list archive, too.)
>>>>>>
>>>>>> I'm in favor of reducing simple notification mails. Personally I read
>>>>>> most of the Storm dev mails, so I'm fine with keeping the mailing as it
>>>>>> is (with some annoying 'empty' notifications), but it can also be done
>>>>>> by watching the Github project.
>>>>>>
>>>>>> This is not the first time this has been raised, and I would like to
>>>>>> discuss it seriously and see the changes.
>>>>>>
>>>>>> Thanks,
>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>
>>>>>> On Thu, Aug 11, 2016 at 2:22 PM, Kyle Nusbaum
>>>>>> <knusb...@yahoo-inc.com.invalid> wrote:
>>>>>>
>>>>>>> There seems to be a surplus of automatically-generated emails on the
>>>>>>> dev mailing list.
>>>>>>> Github and Apache's Jira constantly send mails to the dev list.
>>>>>>>
>>>>>>> I'm not sure that anyone finds these useful. Even if they do, I wonder
>>>>>>> if it's better to move them to a separate list. It's possible that
>>>>>>> everyone has email filters employed to sort this out, but if every
>>>>>>> subscriber has the same filters employed, it might indicate the need
>>>>>>> for a separate list.
>>>>>>>
>>>>>>> --
>>>>>>> Kyle
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Aditya Ramachandra Desai
>>> MS Computer Science Graduate Student
>>> USC Viterbi School of Engineering
>>> Los Angeles, CA 90007
>>> M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai
>>>
>>>
>>
> 





Re: Too many machine mails

2016-08-12 Thread Matthias J. Sax
+1

On 08/12/2016 04:58 AM, Satish Duggana wrote:
> +1 on that proposal.  IMHO, that should be sufficient to address the pain 
> points. 
> 
> Thanks,
> Satish.
> 
> On 8/12/16, 6:26 AM, "P. Taylor Goetz" <ptgo...@gmail.com> wrote:
> 
> The idea behind github comments going to the JIRA work log is that it 
> would not trigger an email notification in the way that comments do. To view 
> github activity in a JIRA, you would have to click on the work log tab.
> 
> You would only get github emails according to your github watch settings. So
> JIRA would NOT notify you of github activity.
> 
> The second part (point #2 in my earlier email) is that all JIRA 
> notifications would go to issues@ instead of dev@. That way dev@ would be 
> strictly reserved for humans, and issues@ would be for the machines.
> 
> That's my understanding of how it would work. And that approach would
> seemingly address the pain points we've pointed out.
> 
> I'm +1 for at least trying it out. If it's worse than what we have now we 
> can always revert or try something new.
>     
> -Taylor
> 
> 
> 
> > On Aug 11, 2016, at 8:01 PM, Matthias J. Sax <mj...@apache.org> wrote:
> > 
> > One more thing to add: it seems that for Kafka, Github PR comments are
> > not mirrored to Jira. Only opening and closing a PR adds a JIRA comment.
> > 
> > This can be configured by Infra team AFAIK. I guess it would help to
> > reduce duplicate mails.
> > 
> > 
> > -Matthias
> > 
> >> On 08/12/2016 01:19 AM, Jungtaek Lim wrote:
> >> Forgot one thing: if a separate list can keep active discussions and
> >> votes on the top page, it would be valuable for me.
> >> 
> >> On Fri, Aug 12, 2016 at 8:16 AM, Jungtaek Lim <kabh...@gmail.com> wrote:
> >> 
> >>> IMHO, all committers/PMCs need to be notified of JIRA and Github
> >>> activities.
> >>> (Sometimes pull requests are submitted without an associated JIRA
> >>> issue, as I did yesterday. Actually I don't strictly think
> >>> committers/PMCs need to be notified of all pull request comments, but
> >>> at least they need to be notified of open/close pull request activities.)
> >>> 
> >>> So a separate list is only for contributors who would like to hear
> >>> about community activities but don't want to see the details (issue
> >>> level), and reducing duplication should be handled even if we move
> >>> JIRA / Github activities out of the dev@ list.
> >>> 
> >>> I assume that adding a JIRA work log entry doesn't send a notification.
> >>> If that is the case, that's enough for me. A separate list is optional.
> >>> 
> >>> - Jungtaek Lim (HeartSaVioR)
> >>> 
> >>> 
> >>> 
> >>> On Fri, Aug 12, 2016 at 6:20 AM, Kyle Nusbaum
> >>> <knusb...@yahoo-inc.com.invalid> wrote:
> >>> 
> >>>> If this other mailing list gets notified of all github activity (all
> >>>> comments, etc.), is that sufficient for being "archived" on ASF 
> hardware?
> >>>> I'm assuming the ASF is hosting their own mail servers.
> >>>> I'd much rather have all github activity go to a mailing list for 
> archive
> >>>> than go to jira and end up in the mailing list 4 times anyway.
> >>>> -- Kyle
> >>>> 
> >>>>On Thursday, August 11, 2016 3:54 PM, P. Taylor Goetz <
> >>>> ptgo...@gmail.com> wrote:
> >>>> 
> >>>> 
> >>>> We don’t need a formal vote if we have a general consensus.
> >>>> This is an issue I’d like to see fixed since it drives me nuts and is
> >>>> amplified by the number of mailing lists I’m subscribed to.
> >>>> The requirement to link github pull request comments, etc. to JIRA
> >>>> originates from the ASF policy that all artifacts of the 
> decision-making
> >>>> process (email, issues, etc.) be archived on ASF-controlled 
> hardware. The
> >>>> linking of github activity to JIRA partly addressed that, but it’s 
> not
> >>>> optimal (e.g. What happens when a PR isn’t linked to a JIRA?).
> >>>> Personally, I want to get no

Re: Too many machine mails

2016-08-11 Thread Matthias J. Sax
omething to actually get this
>>> moving? I'm not sure what the procedure is for setting up mailing lists.
>>>  -- Kyle
>>>
>>> On Thursday, August 11, 2016 9:18 AM, Jungtaek Lim <kabh...@gmail.com>
>>> wrote:
>>>
>>>
>>>  First of all, we need to define which things are annoying. Below are some
>>> that were mentioned by one or more people:
>>>
>>> 1. Duplicated notifications per comment (you can receive 2 mails from dev@
>>> + 1 mail from github depending on conditions (you're an author, you're
>>> watching, you're mentioned, etc.) + occasionally 1 empty change mail from
>>> dev -> up to 4 mails)
>>> 2. Copied comments from JIRA issue (with or without notification)
>>>
>>> and we also need to define which things should be notified:
>>>
>>> a. open and close pull request
>>> b. only open pull request (linking the github pull request and notifying
>>> by changing the status of the issue - we could have a 'patch available'
>>> status for that)
>>> c. no, we should receive all comments (we just need to reduce duplicated
>>> things)
>>>
>>> - Jungtaek Lim (HeartSaVioR)
>>>
>>>
>>> On Thu, Aug 11, 2016 at 10:52 PM, Bobby Evans
>>> <ev...@yahoo-inc.com.invalid> wrote:
>>>
>>>
>>> Yes, let's have a separate firehose/commit/whatever mailing list so that
>>> if people really want all of that data they can see it all.  That way it
>>> is archived in ASF infra.  I do see value in having JIRA and GITHUB
>>> linked; I'm not sure if there is a better way to link the two right now
>>> though.  If someone does have experience with this type of thing and can
>>> make a better solution, I think we can talk to INFRA about
>>> adopting/supporting those changes. - Bobby
>>> On Thursday, August 11, 2016 8:41 AM, Aditya Desai <adity...@usc.edu>
>>> wrote:
>>>
>>>
>>>   Please reduce the number of emails. I am getting many, many emails in
>>> recent days and they spam my inbox.
>>>
>>> On Thu, Aug 11, 2016 at 2:41 AM, Erik Weathers <
>>> eweath...@groupon.com.invalid> wrote:
>>>
>>>
>>> I will state again (as I've done on prior email threads) that I find no
>>> value in spamming the JIRA issues like this, and that I strongly believe
>>> that this behavior is in fact detrimental since it obscures the actual
>>> comments on the issue itself.  The proposed solution of just moving the
>>> destination of the JIRA emails to a different list than
>>> dev@storm.apache.org
>>> doesn't solve that root problem.
>>>
>>> I want to be able to read a JIRA issue without having to skim over dozens
>>> and dozens of auto-appended code review messages.  I truly cannot
>>> understand why this isn't an annoyance for others.  I could be really
>>> snarky and reformat this email to have a bunch of random stuff in between
>>> every sentence to make my point, but I hope this sentence suffices to
>>> prove it?
>>>
>>> Though I must acknowledge your point, Jungtaek, that there is some Apache
>>> policy that all code review comments need to be archived into some Apache
>>> system.  Maybe we can use the attachment functionality of JIRA instead of
>>> making these separate comments on the JIRA issue?  I'm not sure how the
>>> integration is set up right now, but that seems feasible.
>>>
>>> - Erik
>>>
>>> On Thu, Aug 11, 2016 at 2:08 AM, Matthias J. Sax <mj...@apache.org>
>>> wrote:
>>>
>>> I like the idea of having one more mailing list to reduce load on
>>> dev-list.
>>>
>>> -Matthias
>>>
>>> On 08/11/2016 11:07 AM, Jungtaek Lim wrote:
>>>
>>> I remember that Taylor stated that all github comments should be copied
>>> to somewhere Apache infra, and it's Apache JIRA for us.
>>>
>>> It seems to make sense but I'm curious whether other projects respect
>>> this rule. I also subscribed to the dev lists of Kafka, Zeppelin, Flink,
>>> HBase, Spark (although I barely see t

Re: Too many machine mails

2016-08-11 Thread Matthias J. Sax
I like the idea of having one more mailing list to reduce load on dev-list.

-Matthias

On 08/11/2016 11:07 AM, Jungtaek Lim wrote:
> I remember that Taylor stated that all github comments should be copied to
> somewhere Apache infra, and it's Apache JIRA for us.
> 
> It seems to make sense, but I'm curious whether other projects respect this
> rule. I also subscribed to the dev lists of Kafka, Zeppelin, Flink, HBase,
> Spark (although I barely see them), but no project is sending a mail per
> comment. Some of them copy github comments to the JIRA issue but without
> notification, and others don't even copy comments to the JIRA issue.
> (You can check this with dev mailing list archive, too.)
> 
> I'm in favor of reducing simple notification mails. Personally I read most
> of the Storm dev mails, so I'm fine with keeping the mailing as it is (with
> some annoying 'empty' notifications), but it can also be done by watching
> the Github project.
> 
> This is not the first time this has been raised, and I would like to
> discuss it seriously and see the changes.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> On Thu, Aug 11, 2016 at 2:22 PM, Kyle Nusbaum wrote:
> 
>> There seems to be a surplus of automatically-generated emails on the dev
>> mailing list.
>> Github and Apache's Jira constantly send mails to the dev list.
>>
>> I'm not sure that anyone finds these useful. Even if they do, I wonder if
>> it's better to move them to a separate list. It's possible that everyone has
>> email filters employed to sort this out, but if every subscriber has the
>> same filters employed, it might indicate the need for a separate list.
>>
>> --
>> Kyle
> 





Re: New Storm Commiter/PMC Member: Satish Duggana

2016-08-09 Thread Matthias J. Sax
Congrats Satish!

-Matthias

On 08/09/2016 10:21 PM, P. Taylor Goetz wrote:
> Please join me in welcoming Satish Duggana as a new Apache Storm Committer 
> and PMC member.
> 
> Satish has demonstrated a strong commitment to the Apache Storm community 
> through active participation and mentoring on the Storm mailing lists. 
> Furthermore, he has authored many enhancements and bug fixes spanning both
> Storm’s core codebase and numerous integration components.
> 
> Congratulations and welcome Satish!
> 
> -Taylor
> 





Re: Regarding Storm - Baby steps Info

2016-08-09 Thread Matthias J. Sax
Hi Aditya,

welcome to the Storm community :)

If you are a complete newbie, I would recommend first writing a couple
of Storm topologies using the low-level API as well as Trident. Also get
familiar with cluster setup etc.

Read the documentation on the website and some blog posts. I can
recommend http://www.michael-noll.com/ (even if some posts are a little
older, they are still valid -- maybe not in every detail though). There
are also a couple of video recordings of Storm talks available online:
https://storm.apache.org/talksAndVideos.html


About contributing to the project: There are many ways to contribute. I
personally only provided a few trivial patches. But I try to answer
questions on the user and dev lists, as well as on Stackoverflow.

For development, right now Clojure is important for the core of Storm --
even if there is a major ongoing effort to replace Clojure with Java.
For "higher level" parts, there is a lot of Java code -- so it depends
on which components you want to work on whether you should get started
with Clojure or not.

To get started with contributing code, I would recommend picking a newbie
JIRA issue you find interesting:

See
https://issues.apache.org/jira/browse/STORM-1621?jql=project%20%3D%20STORM%20AND%20resolution%20%3D%20Unresolved%20AND%20labels%20in%20%28newbie%2C%20%22newbie%2B%2B%22%2C%20newbiee%2C%20newdev%2C%20beginner%2C%20beginners%29%20ORDER%20BY%20priority%20DESC%2C%20key%20DESC


As far as I know, there is no documentation about the project structure
(I also do not know of any blog posts on this) -- you will need to dig
through it manually. However, if you are familiar with the main concepts,
this is not too hard.

I use IntelliJ -- it has better Clojure support than Eclipse (but this
is of course largely opinion based).


Storm's documentation is unfortunately not perfect, even though the
community constantly tries to improve it. As a newbie, this might
actually be a good starting point, too: as you pick up the details, just
start writing them down in parallel and share them :)

Hope this helps! :)


-Matthias



On 08/09/2016 02:20 AM, Aditya Desai wrote:
> Hello Everyone
> 
> I am a newbie to Apache Storm and would like to learn Storm from basics to
> an advanced level. I am studying at the University of Southern California
> and want to be a contributor in the coming days. I am good at Java and use
> Eclipse for all Java development.
> 
> I have a few questions; can someone answer them / help me out?
> 1. Can anyone help me find the starting point, i.e., where to begin? Is it
> mandatory to know Clojure to contribute to Storm?
> 2. Is Eclipse a good IDE for Storm development?
> 3. Are there any good blogs that throw some light on important code
> snippets of Storm (like important classes of a good Java code base)?
> 4. Any good examples of Storm development?
> 
> Thanks in advance.
> 
> 
> Regards
> 





[jira] [Commented] (STORM-1936) Support default value for WindowedBolt

2016-06-29 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15355572#comment-15355572
 ] 

Matthias J. Sax commented on STORM-1936:


[~darion] Can you elaborate a little more on what this is about? Was there a 
discussion on the mailing list that I missed?

> Support default value for WindowedBolt
> --
>
> Key: STORM-1936
> URL: https://issues.apache.org/jira/browse/STORM-1936
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-core
>Reporter: darion yaphet
>Assignee: darion yaphet
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Broken links on WebPage

2016-04-26 Thread Matthias J. Sax
Hi,

I just opened a PR to improve the web page:
https://github.com/apache/storm/pull/1363

The link to "Daemon Metrics/Monitoring" does not work for 2.0-SNAPSHOT.

Furthermore, "Daemon Metrics/Monitoring" and "Window users guide" do
not work for 1.0.0. I guess we should apply this PR to 1.0.0, too, and
also backport the fix for the broken "Windows guide" (which works in
2.0-SNAPSHOT) to 1.0.0.

Please give feedback.

-Matthias





Re: how can i increase size of heap

2016-04-16 Thread Matthias J. Sax
1) Yes, this is correct (just as you do it on the command line, i.e., multiple
flags separated by blanks):

worker.childopts: "-Xms4g -Djava.net.preferIPv4Stack=true"

2) Still not sure what you mean... if you get "GC overhead limit exceeded",
it means that the GC tried to clean up objects to free memory, but no
objects could be deleted (and the GC tried multiple times without success).
This is a Java thing and not directly related to Storm. You should be
able to tackle it by increasing the JVM memory, as you already do in (1) :)

-Matthias
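Putting the two answers together, the storm.yaml entry would look like this (a sketch; the heap size and extra flag are just the values from this thread, and the per-topology override shown in the comment is Storm's topology.worker.childopts setting, as far as I recall):

```yaml
# storm.yaml -- all worker JVM flags go into ONE quoted string,
# separated by blanks, exactly as on the command line:
worker.childopts: "-Xms4g -Djava.net.preferIPv4Stack=true"

# Per-topology override (appended for that topology's workers only):
# topology.worker.childopts: "-Xms4g"
```

Note that changes to storm.yaml only take effect for workers launched after the supervisors pick up the new config.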


On 04/16/2016 01:29 PM, sam mohel wrote:
> Thanks, but what if I also want to write this in worker.childopts:
> "-Djava.net.preferIPv4Stack=true"
> 
> Will it be like
> 
> worker.childopts: "-Xms4g -Djava.net.preferIPv4Stack=true" ?
> 
> For the second question, I got the statement "GC overhead limit exceeded"
> by searching for my problem, but didn't know how to deal with it.
> 
> 
> On Sat, Apr 16, 2016 at 1:22 PM, Matthias J. Sax <mj...@apache.org> wrote:
> 
>> use parameter worker.childopts
>>
>> worker.childopts: "-Xms4g"
>>
>> Not sure what you mean by your second question...
>>
>> -Matthias
>>
>> On 04/16/2016 06:34 AM, sam mohel wrote:
>>> I want to increase the heap size for workers to -Xms4g. How can I write
>>> it in storm.yaml?
>>>
>>> And how can I set the number of CPU-consuming bolts to the number of
>>> cores?
>>>
>>> Thanks for any help
>>>
>>
>>
> 





Re: BYLAWS link incorrent in DEVELOPER.md

2016-04-16 Thread Matthias J. Sax
Thanks! I just fixed it.

-Matthias

On 04/16/2016 10:35 AM, Manu Zhang wrote:
> Hi committers,
> 
> FYI, the BYLAWS link in the DEVELOPER.md document is incorrect and
> returns 404.
> 
> Thanks,
> Manu Zhang
> 





Re: how can i increase size of heap

2016-04-16 Thread Matthias J. Sax
use parameter worker.childopts

worker.childopts: "-Xms4g"

Not sure what you mean by your second question...

-Matthias

On 04/16/2016 06:34 AM, sam mohel wrote:
> I want to increase the heap size for workers to -Xms4g. How can I write it
> in storm.yaml?
> 
> And how can I set the number of CPU-consuming bolts to the number of cores?
> 
> Thanks for any help
> 





Re: AW: AW: Use only latest values

2016-04-11 Thread Matthias J. Sax
Of course with advancing the ts for the next table scan...

> void nextTuple() {
>   long time = System.currentTimeMillis();
>   if(ts < time) {
>  // do table scan and emit all tuples
>  ts = ts == Long.MIN_VALUE ? time + 60_000 : ts + 60_000; // one minute
>   }
>   // else do nothing
> }
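Stripped of Storm's interfaces, the scheduling guard above can be exercised as plain Java (a sketch assuming one minute = 60,000 ms; PeriodicScanner and shouldScan are invented names for illustration — in a real topology this logic lives inside nextTuple(), which must return immediately when no scan is due):

```java
// Sketch of the periodic-scan guard from the snippet above, kept free of
// Storm dependencies so the scheduling logic can be exercised directly.
class PeriodicScanner {
    private static final long INTERVAL_MS = 60_000L; // one minute between scans
    private long ts = Long.MIN_VALUE;                // deadline of the next scan

    /** Returns true if a table scan is due at wall-clock time 'now'. */
    boolean shouldScan(long now) {
        if (ts < now) {
            // First call anchors to 'now'; later calls step forward by a
            // fixed interval so the scan schedule does not drift.
            ts = (ts == Long.MIN_VALUE) ? now + INTERVAL_MS : ts + INTERVAL_MS;
            return true;  // caller would scan the table and emit tuples here
        }
        return false;     // nothing due; nextTuple() must return immediately
    }

    public static void main(String[] args) {
        PeriodicScanner s = new PeriodicScanner();
        long t0 = 1_000L;
        System.out.println(s.shouldScan(t0));          // true: first call scans
        System.out.println(s.shouldScan(t0 + 1_000));  // false: next scan not due
        System.out.println(s.shouldScan(t0 + 61_000)); // true: a minute has passed
    }
}
```

The important property, as the email stresses, is that the method never blocks while waiting for the next scan.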


On 04/11/2016 02:08 PM, Matthias J. Sax wrote:
> If I understand you correctly (I never used Redis), you put the data
> into a table. In Spout.next() we do a table scan, remember the current
> time stamp, and do the next table scan a minute later.
> 
> Just make sure to return and not block in Spout.next() while waiting for
> the next scan!
> 
> Something like:
> 
> private long ts = Long.MIN_VALUE;
> 
> void nextTuple() {
>   long time = System.currentTimeMillis();
>   if(ts < time) {
>  // do table scan and emit all tuples
>  ts = time;
>   }
>   // else do nothing
> }
> 
> 
> 
> 
> On 04/11/2016 02:04 PM, Daniela Stoiber wrote:
>> Hi Matthias
>>
>> Thank you very much for your reply.
>>
>> How can I ensure that the spout fetches the values every minute?
>>
>> Thank you in advance.
>>
>> Regards,
>> Daniela
>>
>>
>> 2016-04-11 11:10 GMT+02:00 Matthias J. Sax <mj...@apache.org>:
>>
>>> Hi,
>>>
>>> @Arun: thanks for correcting me (it's hard to keep up to date with the
>>> latest changes these days... :))
>>>
>>> @Daniela:
>>>  - processing time means that the windows are aligned to the wall-clock
>>> time of your machine when you process your data; this implies some
>>> non-determinism and non-repeatability if you process historical data
>>>  - event time means that the windows are aligned to timestamps that are
>>> encoded in your tuples (eg, as an attribute); this allows for
>>> deterministic processing, as the result of a computation is independent
>>> of the time you perform the computation
>>>
>>> For your Redis idea:
>>>
>>> You can certainly do this. A Spout fetches all data from Redis each
>>> minute and forwards it to the window bolt. Ie, you do not fetch directly
>>> within your agg-bolt.
>>>
>>> However, if you use a custom aggregate function, you might be able to do
>>> this without Redis in between and de-duplicate in your aggregate function
>>> directly. You store the value for each device in a hash-map (key:
>>> Device-ID) while processing the values in the window. If a second value
>>> for a device comes in, you overwrite it. As long as you do not have too
>>> many devices (ie, the window and hash-map do fit in memory), this should
>>> be the simplest approach.
>>>
>>>
>>> -Matthias
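The hash-map de-duplication suggested above can be sketched without Storm or Redis (plain Java; LatestValueSum, update, and sum are invented names for illustration — in a topology this would run inside the custom aggregate function as values arrive in the window):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the "latest value per device" aggregate suggested above:
// each reading overwrites the previous one for its device, and the sum
// is always taken over the most recent value of every device.
class LatestValueSum {
    private final Map<String, Double> latest = new HashMap<>();

    /** Record a reading; a newer value for the same device replaces the old one. */
    void update(String deviceId, double value) {
        latest.put(deviceId, value);
    }

    /** Sum over the most recent value of each device seen so far. */
    double sum() {
        return latest.values().stream().mapToDouble(Double::doubleValue).sum();
    }

    public static void main(String[] args) {
        LatestValueSum agg = new LatestValueSum();
        agg.update("A", 1);   // A=1
        agg.update("B", 10);  // B=10
        agg.update("C", 4);   // C=4
        System.out.println(agg.sum()); // 15.0
        agg.update("A", 5);   // newer value for A replaces the old one
        System.out.println(agg.sum()); // 19.0
    }
}
```

As the email notes, this only works while the per-device map fits in memory.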
>>>
>>>
>>> On 04/10/2016 10:46 PM, Daniela Stoiber wrote:
>>>> HI Arun,
>>>>
>>>> thank you for your reply.
>>>>
>>>> But my problem is that I need to add up the values over all devices, but
>>> I am only allowed to use the most recent value of each device. A value is
>>> valid as long as there is no new value for this device available.
>>>>
>>>> So if I receive a message with device A with value 1, value 1 should be
>>> used for the sum as long as the value of A does not change.
>>>> When I receive a new value for A, the new value should be used for the
>>> sum and the old one should be replaced.
>>>>
>>>> Therefore I thought to use Redis to store this information:
>>>> DeviceValue
>>>> A 1
>>>> B 10
>>>> C 4
>>>>
>>>> Then I would like to pull every minute the most recent value of each
>>> device to build the sum. Therefore I would like to use the windowed bolt.
>>> But I am not sure if it is possible to pull data out of Redis within a
>>> windowed bolt.
>>>>
>>>> Thank you in advance.
>>>>
>>>> Regards,
>>>> Daniela
>>>>
>>>>
>>>> -Ursprüngliche Nachricht-
>>>> Von: Arun Iyer [mailto:ai...@hortonworks.com] Im Auftrag von Arun
>>> Mahadevan
>>>> Gesendet: Sonntag, 10. April 2016 20:55
>>>> An: dev@storm.apache.org
>>>> Betreff: Re: AW: Use only latest values
>>>>
>>>> Hi Matthias,
>>>>
>>>> WindowedBolt does support event time. In Trident it is not yet exposed.
>>>>
>>>> Hi

Re: AW: AW: Use only latest values

2016-04-11 Thread Matthias J. Sax
If I understand you correctly (I never used Redis), you put the data
into a table. In Spout.next() we do a table scan, remember the current
time stamp, and do the next table scan a minute later.

Just make sure to return and not block in Spout.next() while waiting for
the next scan!

Something like:

private long ts = Long.MIN_VALUE;

void nextTuple() {
  long time = System.currentTimeMillis();
  if(ts < time) {
 // do table scan and emit all tuples
 ts = time;
  }
  // else do nothing
}




On 04/11/2016 02:04 PM, Daniela Stoiber wrote:
> Hi Matthias
> 
> Thank you very much for your reply.
> 
> How can I ensure that the spout fetches the values every minute?
> 
> Thank you in advance.
> 
> Regards,
> Daniela
> 
> 
> 2016-04-11 11:10 GMT+02:00 Matthias J. Sax <mj...@apache.org>:
> 
>> Hi,
>>
>> @Arun: thanks for correcting me (it's hard to keep up to date with the
>> latest changes these days... :))
>>
>> @Daniela:
>>  - processing time means that the windows are aligned to the wall-clock
>> time of your machine when you process your data; this implies some
>> non-determinism and non-repeatability if you process historical data
>>  - event time means that the windows are aligned to timestamps that are
>> encoded in your tuples (eg, as an attribute); this allows for
>> deterministic processing, as the result of a computation is independent
>> of the time you perform the computation
>>
>> For your Redis idea:
>>
>> You can certainly do this. A Spout fetches all data from Redis each
>> minute and forwards it to the window bolt. Ie, you do not fetch directly
>> within your agg-bolt.
>>
>> However, if you use a custom aggregate function, you might be able to do
>> this without Redis in between and de-duplicate in your aggregate function
>> directly. You store the value for each device in a hash-map (key:
>> Device-ID) while processing the values in the window. If a second value
>> for a device comes in, you overwrite it. As long as you do not have too
>> many devices (ie, the window and hash-map do fit in memory), this should
>> be the simplest approach.
>>
>>
>> -Matthias
>>
>>
>> On 04/10/2016 10:46 PM, Daniela Stoiber wrote:
>>> HI Arun,
>>>
>>> thank you for your reply.
>>>
>>> But my problem is that I need to add up the values over all devices, but
>> I am only allowed to use the most recent value of each device. A value is
>> valid as long as there is no new value for this device available.
>>>
>>> So if I receive a message with device A with value 1, value 1 should be
>> used for the sum as long as the value of A does not change.
>>> When I receive a new value for A, the new value should be used for the
>> sum and the old one should be replaced.
>>>
>>> Therefore I thought to use Redis to store this information:
>>> DeviceValue
>>> A 1
>>> B 10
>>> C 4
>>>
>>> Then I would like to pull every minute the most recent value of each
>> device to build the sum. Therefore I would like to use the windowed bolt.
>> But I am not sure if it is possible to pull data out of Redis within a
>> windowed bolt.
>>>
>>> Thank you in advance.
>>>
>>> Regards,
>>> Daniela
>>>
>>>
>>> -Ursprüngliche Nachricht-
>>> Von: Arun Iyer [mailto:ai...@hortonworks.com] Im Auftrag von Arun
>> Mahadevan
>>> Gesendet: Sonntag, 10. April 2016 20:55
>>> An: dev@storm.apache.org
>>> Betreff: Re: AW: Use only latest values
>>>
>>> Hi Matthias,
>>>
>>> WindowedBolt does support event time. In Trident it is not yet exposed.
>>>
>>> Hi Daniela,
>>>
>>> You could solve your use cases in different ways. One would be to have a
>> WindowedBolt with a 1 min tumbling window, do your custom aggregation (e.g.
>> sum) every time the window tumbles and emit the results to another bolt
>> where you update the count in Redis. Most of your state saving could also
>> be automated by defining a Stateful bolt that would periodically checkpoint
>> your state (sum per device). You could also club both windowing and state
>> into a StatefulWindowedBolt implementation. You can evaluate the options
>> and decide based on your use cases.
>>>
>>> Take a look at the sample topologies (SlidingWindowTopology,
>> SlidingTupleTsTopology, StatefulTopology, StatefulWindowingTopology) in
>> storm-starter and t

Re: AW: AW: Use only latest values

2016-04-11 Thread Matthias J. Sax
Hi,

@Arun: thanks for correcting me (it's hard to keep up to date with the
latest changes these days... :))

@Daniela:
 - processing time means that the windows are aligned to the wall-clock
time of your machine when you process your data; this implies some
non-determinism and non-repeatability if you process historical data
 - event time means that the windows are aligned to timestamps that are
encoded in your tuples (eg, as an attribute); this allows for
deterministic processing as the result of a computation is independent
of the time you perform the computation
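The event-time alignment described here can be illustrated without any Storm classes: the window an event falls into is derived purely from the timestamp carried by the tuple, so re-running over historical data gives the same assignment every time. A minimal sketch (not the Storm windowing API):

```java
public class EventTimeWindow {
    // Start of the tumbling window an event falls into, computed only from
    // the event's own timestamp -- deterministic regardless of when we run.
    static long windowStart(long eventTimeMillis, long windowSizeMillis) {
        return eventTimeMillis - (eventTimeMillis % windowSizeMillis);
    }

    public static void main(String[] args) {
        long oneMinute = 60_000L;
        System.out.println(windowStart(65_000L, oneMinute));  // 60000
        System.out.println(windowStart(119_999L, oneMinute)); // 60000
        System.out.println(windowStart(120_000L, oneMinute)); // 120000
    }
}
```

A processing-time window would instead use System.currentTimeMillis() here, which is exactly where the non-determinism comes from.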

For your Redis idea:

You can certainly do this. A Spout fetches all data from Redis each
minute and forwards it to the window bolt. Ie, you do not fetch directly
within your agg-bolt.

However, if you use a custom aggregate function, you might be able to do
this without Redis in between and de-duplicate in your aggregate function
directly. When the window closes, you store the value for each device in
a hash-map (key: Device-ID) while iterating over the values in the
window; if a second value for a device comes in, you overwrite the first.
As long as you do not have too many devices (ie, the window and hash-map
fit in memory) this should be the simplest approach.
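The keep-only-the-latest-value logic of that hash-map approach can be sketched in plain Java (Storm classes omitted; the device/value shapes follow Daniela's example below):

```java
import java.util.HashMap;
import java.util.Map;

public class LatestValueSum {
    // deviceId -> most recent value; a newer reading simply overwrites the old one
    private final Map<String, Double> latest = new HashMap<>();

    public void update(String deviceId, double value) {
        latest.put(deviceId, value);
    }

    // Sum over the latest value of every device seen so far.
    public double sum() {
        double s = 0;
        for (double v : latest.values()) s += v;
        return s;
    }

    public static void main(String[] args) {
        LatestValueSum agg = new LatestValueSum();
        agg.update("A", 1);   // initial readings
        agg.update("B", 10);
        agg.update("C", 4);
        System.out.println(agg.sum()); // 15.0
        agg.update("A", 2);   // A sends a new value: the old one is replaced, not added
        System.out.println(agg.sum()); // 16.0
    }
}
```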


-Matthias


On 04/10/2016 10:46 PM, Daniela Stoiber wrote:
> HI Arun,
> 
> thank you for your reply.
> 
> But my problem is that I need to add up the values over all devices, but I am 
> only allowed to use the most recent value of each device. A value is valid as 
> long as there is no new value for this device available. 
> 
> So if I receive a message with device A with value 1, value 1 should be used 
> for the sum as long as the value of A does not change. 
> When I receive a new value for A, the new value should be used for the sum 
> and the old one should be replaced.
> 
> Therefore I thought to use Redis to store this information:
> DeviceValue
> A 1
> B 10
> C 4
> 
> Then I would like to pull every minute the most recent value of each device 
> to build the sum. Therefore I would like to use the windowed bolt. But I am 
> not sure if it is possible to pull data out of Redis within a windowed bolt.
> 
> Thank you in advance.
> 
> Regards,
> Daniela
> 
> 
> -Ursprüngliche Nachricht-
> Von: Arun Iyer [mailto:ai...@hortonworks.com] Im Auftrag von Arun Mahadevan
> Gesendet: Sonntag, 10. April 2016 20:55
> An: dev@storm.apache.org
> Betreff: Re: AW: Use only latest values
> 
> Hi Matthias, 
> 
> WindowedBolt does support event time. In Trident it is not yet exposed.
> 
> Hi Daniela,
> 
> You could solve your use cases in different ways. One would be to have a 
> WindowedBolt with a 1 min tumbling window, do your custom aggregation (e.g. 
> sum) every time the window tumbles and emit the results to another bolt where 
> you update the count in Redis. Most of your state saving could also be 
> automated by defining a Stateful bolt that would periodically checkpoint your 
> state (sum per device). You could also club both windowing and state into a 
> StatefulWindowedBolt implementation. You can evaluate the options and decide 
> based on your use cases.
> 
> Take a look at the sample topologies (SlidingWindowTopology, 
> SlidingTupleTsTopology, StatefulTopology, StatefulWindowingTopology) in 
> storm-starter and the docs for more info.
> 
> https://github.com/apache/storm/blob/master/docs/Windowing.md
> 
> https://github.com/apache/storm/blob/master/docs/State-checkpointing.md
> 
> 
> -Arun
> 
> 
> 
> 
> On 4/10/16, 4:30 PM, "Matthias J. Sax" <mj...@apache.org> wrote:
> 
>> A tumbling window (ie, non-overlapping window) is the right approach (a 
>> sliding window is overlapping).
>>
>> The window goes into your aggregation bolt (windowing and aggregation 
>> goes hand in hand, ie, when the window gets closed, the aggregation is 
>> triggered and the window content is handed over to the aggregation 
>> function).
>>
>> Be aware that Storm (currently) only supports processing time windows
>> (and no event time windows).
>>
>> -Matthias
>>
>>
>> On 04/10/2016 09:56 AM, Daniela Stoiber wrote:
>>> Hi,
>>>
>>> thank you for your reply.
>>>
>>> How can I ensure that the latest values are pulled from Redis and the sum
>>> is updated every minute? Do I need a sliding window with an interval 
>>> of 1 minute? Where would this sliding window be located in my topology?
>>>
>>> Thank you in advance.
>>>
>>> Regards,
>>> Daniela
>>>
>>> -Ursprüngliche Nachricht-
>>> Von: Matthias J. Sax [mailto:mj...@apache.

Re: AW: Use only latest values

2016-04-10 Thread Matthias J. Sax
A tumbling window (ie, non-overlapping window) is the right approach (a
sliding window is overlapping).

The window goes into your aggregation bolt (windowing and aggregation
goes hand in hand, ie, when the window gets closed, the aggregation is
triggered and the window content is handed over to the aggregation
function).

Be aware that Storm (currently) only supports processing time windows (and
no event time windows).
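The "hand in hand" behavior — the window closes, the whole content is handed to the aggregation, then the window starts empty — can be shown with a count-based toy version in plain Java (a concept sketch, not the Storm windowing API, which would typically use time-based tumbling):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class TumblingWindow<T> {
    private final int size;                   // tumble after this many tuples
    private final List<T> buffer = new ArrayList<>();
    private final Consumer<List<T>> onClose;  // aggregation, triggered at window close

    public TumblingWindow(int size, Consumer<List<T>> onClose) {
        this.size = size;
        this.onClose = onClose;
    }

    public void add(T tuple) {
        buffer.add(tuple);
        if (buffer.size() == size) {
            onClose.accept(new ArrayList<>(buffer)); // hand window content to the aggregation
            buffer.clear();                          // non-overlapping: nothing carries over
        }
    }

    public static void main(String[] args) {
        List<Integer> sums = new ArrayList<>();
        TumblingWindow<Integer> w = new TumblingWindow<>(3,
                win -> sums.add(win.stream().mapToInt(Integer::intValue).sum()));
        for (int i = 1; i <= 6; i++) w.add(i);
        System.out.println(sums); // [6, 15]
    }
}
```

A sliding (overlapping) window would instead evict only part of the buffer on each trigger, which is why it is the wrong fit for this use case.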

-Matthias


On 04/10/2016 09:56 AM, Daniela Stoiber wrote:
> Hi,
> 
> thank you for your reply.
> 
> How can I ensure that the latest values are pulled from Redis and the sum is
> updated every minute? Do I need a sliding window with an interval of 1
> minute? Where would this sliding window be located in my topology?
> 
> Thank you in advance.
> 
> Regards,
> Daniela 
> 
> -Ursprüngliche Nachricht-
> Von: Matthias J. Sax [mailto:mj...@apache.org] 
> Gesendet: Samstag, 9. April 2016 12:13
> An: dev@storm.apache.org
> Betreff: Re: Use only latest values
> 
> Sounds reasonable.
> 
> 
> On 04/09/2016 08:34 AM, Daniela Stoiber wrote:
>> Hi,
>>
>>  
>>
>> I would like to cache values and to use only the latest "valid" values 
>> to build a sum.
>>
>> In more detail, I receive values from devices periodically. I would 
>> like to add up all the valid values each minute. But not every device 
>> sends a new value every minute. And as long as there is no new value 
>> the old one should be used for the sum. As soon as I receive a new 
>> value from a device I would like to overwrite the old value and to use 
>> the new one for the sum. Would that be possible with the combination of
> Storm and Redis?
>>
>>  
>>
>> My idea was to use the following:
>>
>>  
>>
>> - Kafka Spout
>>
>> - Storm Bolt for storing the tuples in Redis and for overwriting the 
>> values as soon as a new one is delivered
>>
>> - Storm Bolt for reading the latest tuples from Redis
>>
>> - Storm Bolt for grouping (I would like to group the devices per 
>> region)
>>
>> - Storm Bolt for aggregation
>>
>> - Storm Bolt for storing the results again in Redis
>>
>>  
>>
>> Thank you in advance.
>>
>>  
>>
>> Regards,
>>
>> Daniela
>>
>>
> 
> 



signature.asc
Description: OpenPGP digital signature


Re: Use only latest values

2016-04-09 Thread Matthias J. Sax
Sounds reasonable.


On 04/09/2016 08:34 AM, Daniela Stoiber wrote:
> Hi,
> 
>  
> 
> I would like to cache values and to use only the latest "valid" values to
> build a sum.
> 
> In more detail, I receive values from devices periodically. I would like to
> add up all the valid values each minute. But not every device sends a new
> value every minute. And as long as there is no new value the old one should
> be used for the sum. As soon as I receive a new value from a device I would
> like to overwrite the old value and to use the new one for the sum. Would
> that be possible with the combination of Storm and Redis?
> 
>  
> 
> My idea was to use the following:
> 
>  
> 
> - Kafka Spout
> 
> - Storm Bolt for storing the tuples in Redis and for overwriting the values
> as soon as a new one is delivered
> 
> - Storm Bolt for reading the latest tuples from Redis
> 
> - Storm Bolt for grouping (I would like to group the devices per region)
> 
> - Storm Bolt for aggregation
> 
> - Storm Bolt for storing the results again in Redis
> 
>  
> 
> Thank you in advance.
> 
>  
> 
> Regards,
> 
> Daniela
> 
> 





[jira] [Closed] (STORM-855) Add tuple batching

2016-03-31 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax closed STORM-855.
-
Resolution: Unresolved

https://github.com/apache/storm/pull/765 got merged, which does this somewhat 
differently.

> Add tuple batching
> --
>
> Key: STORM-855
> URL: https://issues.apache.org/jira/browse/STORM-855
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-core
>            Reporter: Matthias J. Sax
>        Assignee: Matthias J. Sax
>Priority: Minor
>
> In order to increase Storm's throughput, multiple tuples can be grouped 
> together in a batch of tuples (ie, fat-tuple) and transfered from producer to 
> consumer at once.
> The initial idea is taken from https://github.com/mjsax/aeolus. However, we 
> aim to integrate this feature deep into the system (in contrast to building 
> it on top), what has multiple advantages:
>   - batching can be even more transparent to the user (eg, no extra 
> direct-streams needed to mimic Storm's data distribution patterns)
>   - fault-tolerance (anchoring/acking) can be done on a tuple granularity 
> (not on a batch granularity, what leads to much more replayed tuples -- and 
> result duplicates -- in case of failure)
> The aim is to extend TopologyBuilder interface with an additional parameter 
> 'batch_size' to expose this feature to the user. Per default, batching will 
> be disabled.
> This batching feature has pure tuple transport purpose, ie, tuple-by-tuple 
> processing semantics are preserved. An output batch is assembled at the 
> producer and completely disassembled at the consumer. The consumer output can 
> be batched again, however, independent of batched or non-batched input. Thus, 
> batches can be of different size for each producer-consumer pair. 
> Furthermore, consumers can receive batches of different size from different 
> producers (including regular non batched input).
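The transport idea in this ticket — assemble a fat-tuple at the producer, disassemble it completely at the consumer so tuple-by-tuple semantics (and per-tuple acking) are preserved — can be sketched as follows. This is an illustration of the concept only, not the proposed Storm implementation; a real version would also flush a partial batch on a timeout:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class TupleBatcher<T> {
    private final int batchSize;
    private final List<T> batch = new ArrayList<>();
    private final Consumer<List<T>> transport; // stands in for the producer->consumer channel

    public TupleBatcher(int batchSize, Consumer<List<T>> transport) {
        this.batchSize = batchSize;
        this.transport = transport;
    }

    // Producer side: tuples are collected and shipped together as one fat-tuple.
    public void emit(T tuple) {
        batch.add(tuple);
        if (batch.size() == batchSize) {
            transport.accept(new ArrayList<>(batch));
            batch.clear();
        }
    }

    // Consumer side: the fat-tuple is fully disassembled, so the consumer still
    // processes (and could ack) individual tuples, not batches.
    public static <T> void disassemble(List<T> fatTuple, Consumer<T> process) {
        fatTuple.forEach(process);
    }

    public static void main(String[] args) {
        List<String> processed = new ArrayList<>();
        TupleBatcher<String> b =
                new TupleBatcher<>(2, fat -> disassemble(fat, processed::add));
        b.emit("a"); b.emit("b"); b.emit("c"); b.emit("d");
        System.out.println(processed); // [a, b, c, d]
    }
}
```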



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Emitting to non-declared output stream

2016-02-02 Thread Matthias J. Sax
Hi,

I opened a PR for this two weeks ago. Would love to get some feedback
about it: https://github.com/apache/storm/pull/1031

-Matthias

On 01/19/2016 04:45 PM, Bobby Evans wrote:
> I think this is something that we should be able to handle efficiently in all 
> cases.
> https://github.com/apache/storm/blob/master/storm-core/src/clj/org/apache/storm/daemon/task.clj#L120-L167
> creates the task function that is used for routing tuples to the correct 
> downstream component.  Not surprisingly emit-direct ignores the stream (which 
> might make things more difficult for that case), but for the normal emit
> https://github.com/apache/storm/blob/master/storm-core/src/clj/org/apache/storm/daemon/task.clj#L153
> looks up the grouper by the stream id.  My guess is that we are getting a 
> null/nil back and the fast-list-iter is skipping over everything, when we 
> should be able to do something with that null and throw an exception.
> 
> If it does look like we cannot do it without adding a lot of code to the 
> critical path, then go ahead and do a config to turn it on/off.
>  - Bobby 
> 
>     On Tuesday, January 19, 2016 8:47 AM, Matthias J. Sax <mj...@apache.org> 
> wrote:
>  
> 
>  Hi,
> 
> currently, I am using Storm 0.9.3. For first tests on a new topology, I
> use LocalCluster. It happened to me that I emitted tuples to an output
> stream that I never declared (and thus never connected to). For this, I
> would expect an error message in the log. However, I don't get anything
> which makes debugging very hard.
> 
> What do you think about it? Should I open a JIRA for it?
> 
> For real cluster deployment, I think the overhead of checking the output
> stream ID is too large and one can easily see the problem in the UI --
> the non-declared output streams that get tuples show up there. However,
> for LocalCluster, there is no UI and an error log message would be nice.
> 
> 
> -Matthias
> 
> 
>   
> 





Re: [DISCUSSION] Restructure Storm documentation

2016-01-24 Thread Matthias J. Sax
+1 for having documentation for older releases on the website and
JavaDocs for each version, too.

Btw: in the Flink project this process is automated completely. I am not
sure exactly how, but we could figure it out. However, the documentation is
not part of the project website itself but hosted at ci.apache.org

Having this automated is very nice for people who are using the current
snapshot version, as the new docs become available very soon when something
changes.

-Matthias


On 01/22/2016 11:59 PM, Nathan Marz wrote:
> At the very least, the Javadocs should be available by version. This is
> something I used to do but looks like we forgot to keep doing that after
> the transition to Apache. Maintaining other docs (tutorials, etc.) by
> version is more difficult as those are rarely updated at the time of
> release.
> 
> On Fri, Jan 22, 2016 at 2:01 PM, Bobby Evans  wrote:
> 
>> It doesn't have to be Taylor cutting releases.  The only major requirement
>> around that is that the PMC votes on the release.
>>  - Bobby
>>
>> On Friday, January 22, 2016 3:48 PM, Kyle Nusbaum
>>  wrote:
>>
>>
>>  Yep, That's precisely what I was thinking.
>>
>> I don't really see a problem with the process being manual. It won't be
>> *too* much work, and we do releases infrequently enough that I don't see it
>> as a burden. A small helper script would probably be trivial to write.
>>
>> Of course, Taylor is the one cutting the releases, so I'll defer to him on
>> the automated/manual issue. -- Kyle
>>
>> On Friday, January 22, 2016 3:45 PM, P. Taylor Goetz <
>> ptgo...@gmail.com> wrote:
>>
>>
>>  I’m definitely open to improving the process such that we can have
>> version-specific documentation, and finding a way to automate updating the
>> asf-site branch during the release process. I’m also okay if that process
>> is somewhat manual.
>>
>> I’ve thought about it a little but haven’t really come with a process.
>>
>> Ideally we’d do something that would do a snapshot of the docs at release
>> time and create a subdirectory in the asf-site website (e.g. “1.0.0-docs”).
>>
>> I’m open to suggestions.
>>
>> -Taylor
>>
>>> On Jan 22, 2016, at 4:25 PM, Kyle Nusbaum 
>> wrote:
>>>
>>> The new website is awesome.
>>>
>>> It would be great to keep tabs on documentation for different versions
>> of Storm and host those different versions on the site.
>>>
>>> I don't care too much for having all the documentation in its own
>> branch. I would suggest that each version branch of Storm keeps its own
>> version of the docs -- or keeps any modifications to the docs, if not the
>> entire collection, in order to keep the common parts in sync -- and that
>> these docs get merged into the asf-site branch in their own version
>> directory as part of the release process.
>>> Please let me know what you think and I'll file Jira issues as
>> necessary.-- Kyle
>>
>>
>>
>>
>>
>>
> 
> 
> 





[jira] [Updated] (STORM-1454) UI cannot handle topology names with spaces

2016-01-19 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated STORM-1454:
---
Affects Version/s: 0.9.3

> UI cannot handle topology names with spaces
> ---
>
> Key: STORM-1454
> URL: https://issues.apache.org/jira/browse/STORM-1454
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 0.9.3
>            Reporter: Matthias J. Sax
>Priority: Minor
>
> If I submit a topology with a name that contains spaces (eg, "Linear Road 
> Benchmark"), I cannot access the detailed topology view. If I click on the 
> topology name in the Web UI, I get the following:
> Internal Server Error
> {noformat}
> NotAliveException(msg:Linear+Road+Benchmark-3-1452252242)
>   at 
> backtype.storm.generated.Nimbus$getTopologyInfo_result.read(Nimbus.java:11347)
>   at org.apache.thrift7.TServiceClient.receiveBase(TServiceClient.java:78)
>   at 
> backtype.storm.generated.Nimbus$Client.recv_getTopologyInfo(Nimbus.java:491)
>   at 
> backtype.storm.generated.Nimbus$Client.getTopologyInfo(Nimbus.java:478)
>   at backtype.storm.ui.core$topology_page.invoke(core.clj:628)
>   at backtype.storm.ui.core$fn__8020.invoke(core.clj:853)
>   at compojure.core$make_route$fn__6199.invoke(core.clj:93)
>   at compojure.core$if_route$fn__6187.invoke(core.clj:39)
>   at compojure.core$if_method$fn__6180.invoke(core.clj:24)
>   at compojure.core$routing$fn__6205.invoke(core.clj:106)
>   at clojure.core$some.invoke(core.clj:2443)
>   at compojure.core$routing.doInvoke(core.clj:106)
>   at clojure.lang.RestFn.applyTo(RestFn.java:139)
>   at clojure.core$apply.invoke(core.clj:619)
>   at compojure.core$routes$fn__6209.invoke(core.clj:111)
>   at ring.middleware.reload$wrap_reload$fn__6234.invoke(reload.clj:14)
>   at backtype.storm.ui.core$catch_errors$fn__8059.invoke(core.clj:909)
>   at 
> ring.middleware.keyword_params$wrap_keyword_params$fn__6876.invoke(keyword_params.clj:27)
>   at 
> ring.middleware.nested_params$wrap_nested_params$fn__6915.invoke(nested_params.clj:65)
>   at ring.middleware.params$wrap_params$fn__6848.invoke(params.clj:55)
>   at 
> ring.middleware.multipart_params$wrap_multipart_params$fn__6943.invoke(multipart_params.clj:103)
>   at ring.middleware.flash$wrap_flash$fn__7124.invoke(flash.clj:14)
>   at ring.middleware.session$wrap_session$fn__7113.invoke(session.clj:43)
>   at ring.middleware.cookies$wrap_cookies$fn__7044.invoke(cookies.clj:160)
>   at ring.adapter.jetty$proxy_handler$fn__7324.invoke(jetty.clj:16)
>   at 
> ring.adapter.jetty.proxy$org.mortbay.jetty.handler.AbstractHandler$0.handle(Unknown
>  Source)
>   at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>   at 
> org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>   at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {noformat}
> The link says: 
> http://dbis71:8080/topology.html?id=Linear+Road+Benchmark-5-1452255348
> If I replace the "+" with spaces manually I can access the page. However, I 
> cannot "kill" the topology -- a click on "kill" has no effect.





Emitting to non-declared output stream

2016-01-19 Thread Matthias J. Sax
Hi,

currently, I am using Storm 0.9.3. For first tests on a new topology, I
use LocalCluster. It happened to me that I emitted tuples to an output
stream that I never declared (and thus never connected to). For this, I
would expect an error message in the log. However, I don't get anything
which makes debugging very hard.

What do you think about it? Should I open a JIRA for it?

For real cluster deployment, I think the overhead of checking the output
stream ID is too large and one can easily see the problem in the UI --
the non-declared output streams that get tuples show up there. However,
for LocalCluster, there is no UI and an error log message would be nice.
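The silent-drop behavior under discussion (Bobby later points out that the emit path looks up a grouper by stream id, gets null for an undeclared stream, and simply skips everything) can be mimicked in a few lines of plain Java. This is an illustration of the failure mode only, not Storm's actual (Clojure) routing code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StreamRouter {
    // stream id -> downstream task ids; only declared streams get an entry
    private final Map<String, List<Integer>> groupers = new HashMap<>();
    int silentlyDropped = 0;

    void declareStream(String streamId, List<Integer> tasks) {
        groupers.put(streamId, tasks);
    }

    List<Integer> emit(String streamId) {
        List<Integer> tasks = groupers.get(streamId);
        if (tasks == null) {          // undeclared stream: no error, no log line
            silentlyDropped++;
            return List.of();
        }
        return tasks;
    }

    public static void main(String[] args) {
        StreamRouter r = new StreamRouter();
        r.declareStream("default", List.of(1, 2));
        System.out.println(r.emit("default"));  // [1, 2]
        System.out.println(r.emit("typoed"));   // [] -- the tuple just vanishes
        System.out.println(r.silentlyDropped);  // 1
    }
}
```

Turning the null lookup into an exception (or at least a log message) is exactly the fix being proposed.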


-Matthias





HdfsSpout

2016-01-13 Thread Matthias J. Sax
Hi,

if I am not wrong, Storm comes only with an HdfsBolt but not with an
HdfsSpout. Would it be worthwhile to add an HdfsSpout? I am not sure how
common reading data from HDFS in Storm is.

-Matthias





[jira] [Created] (STORM-1454) UI cannot handle topology names with spaces

2016-01-08 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created STORM-1454:
--

 Summary: UI cannot handle topology names with spaces
 Key: STORM-1454
 URL: https://issues.apache.org/jira/browse/STORM-1454
 Project: Apache Storm
  Issue Type: Bug
Reporter: Matthias J. Sax
Priority: Minor


If I submit a topology with a name that contains spaces (eg, "Linear Road 
Benchmark"), I cannot access the detailed topology view. If I click on the 
topology name in the Web UI, I get the following:

Internal Server Error
{noformat}
NotAliveException(msg:Linear+Road+Benchmark-3-1452252242)
at 
backtype.storm.generated.Nimbus$getTopologyInfo_result.read(Nimbus.java:11347)
at org.apache.thrift7.TServiceClient.receiveBase(TServiceClient.java:78)
at 
backtype.storm.generated.Nimbus$Client.recv_getTopologyInfo(Nimbus.java:491)
at 
backtype.storm.generated.Nimbus$Client.getTopologyInfo(Nimbus.java:478)
at backtype.storm.ui.core$topology_page.invoke(core.clj:628)
at backtype.storm.ui.core$fn__8020.invoke(core.clj:853)
at compojure.core$make_route$fn__6199.invoke(core.clj:93)
at compojure.core$if_route$fn__6187.invoke(core.clj:39)
at compojure.core$if_method$fn__6180.invoke(core.clj:24)
at compojure.core$routing$fn__6205.invoke(core.clj:106)
at clojure.core$some.invoke(core.clj:2443)
at compojure.core$routing.doInvoke(core.clj:106)
at clojure.lang.RestFn.applyTo(RestFn.java:139)
at clojure.core$apply.invoke(core.clj:619)
at compojure.core$routes$fn__6209.invoke(core.clj:111)
at ring.middleware.reload$wrap_reload$fn__6234.invoke(reload.clj:14)
at backtype.storm.ui.core$catch_errors$fn__8059.invoke(core.clj:909)
at 
ring.middleware.keyword_params$wrap_keyword_params$fn__6876.invoke(keyword_params.clj:27)
at 
ring.middleware.nested_params$wrap_nested_params$fn__6915.invoke(nested_params.clj:65)
at ring.middleware.params$wrap_params$fn__6848.invoke(params.clj:55)
at 
ring.middleware.multipart_params$wrap_multipart_params$fn__6943.invoke(multipart_params.clj:103)
at ring.middleware.flash$wrap_flash$fn__7124.invoke(flash.clj:14)
at ring.middleware.session$wrap_session$fn__7113.invoke(session.clj:43)
at ring.middleware.cookies$wrap_cookies$fn__7044.invoke(cookies.clj:160)
at ring.adapter.jetty$proxy_handler$fn__7324.invoke(jetty.clj:16)
at 
ring.adapter.jetty.proxy$org.mortbay.jetty.handler.AbstractHandler$0.handle(Unknown
 Source)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{noformat}

The link says: 
http://dbis71:8080/topology.html?id=Linear+Road+Benchmark-5-1452255348
If I replace the "+" with spaces manually I can access the page. However, I 
cannot "kill" the topology -- a click on "kill" has no effect.
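The likely root cause is form-style URL encoding, which represents spaces as '+'; the encoded id then reaches the topology lookup without being decoded, as a quick demonstration with the JDK shows:

```java
import java.net.URLDecoder;
import java.net.URLEncoder;

public class SpaceEncoding {
    public static void main(String[] args) throws Exception {
        String id = "Linear Road Benchmark-3-1452252242";
        // application/x-www-form-urlencoded replaces each space with '+'
        String encoded = URLEncoder.encode(id, "UTF-8");
        System.out.println(encoded); // Linear+Road+Benchmark-3-1452252242
        // ...so the literal '+' survives into the NotAliveException message
        // unless the UI decodes the parameter before the lookup:
        System.out.println(URLDecoder.decode(encoded, "UTF-8"));
    }
}
```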





Re: Benchmarking Streaming Computation Engines at Yahoo!

2015-12-18 Thread Matthias J. Sax
Hi,

Flink is using byte buffers to transfer data. If a buffer does not fill
up quickly enough, a timeout is applied and the buffer is transferred
before it fills up. This timeout can be configured:

env.setBufferTimeout(timeoutMillis);

see:
https://ci.apache.org/projects/flink/flink-docs-release-0.8/streaming_guide.html#buffer-timeout

So for low throughput, the latency can be decreased by decreasing this
timeout value to avoid the extra waiting time you mentioned.

-Matthias

On 12/18/2015 09:42 AM, 刘键(Basti Liu) wrote:
> Hi Jerry,
> 
> Thanks for the clarification.
> But just for my understanding, the reason why we got the lower latency is the 
> "window" mechanism in Flink. I guess the stream in Flink is flushed as one or 
> several batches 
> for a window. So at lower throughputs, it will lead to extra waiting at the 
> source component. So it is possible to lower the latency of Flink by 
> adjusting the configuration.
> Actually, my point here is that if we want to compete with Flink or spark 
> stream for at least once or exactly once (high throughput and low latency), 
> the acking mechanism 
> of Storm needs to be improved. Currently, there are too many extra messages 
> for the acking mechanism in Storm. Sometimes, the throughput of a topology depends 
> on the 
> throughput of acker.
> 
> Regards
> Basti
> 
> -Original Message-
> From: Boyang(Jerry) Peng [mailto:jerryp...@yahoo-inc.com.INVALID] 
> Sent: Friday, December 18, 2015 7:08 AM
> To: dev@storm.apache.org
> Subject: Re: Benchmarking Streaming Computation Engines at Yahoo!
> 
> Hello Satish,
> One of the experiments we wish to do in the future is to compare flink with 
> checkpointing with Storm with acking. If you look at our results, Storm with 
> acking does have lower latency than Flink without checkpointing at lower 
> throughputs.  The keyword here is lower throughputs. What we were trying to 
> say is that Storm with the optimizations we proposed can be comparable to 
> with Flink without checkpointing at higher throughputs even with acking 
> turned on. Best, Jerry 
> 
> 
> On Thursday, December 17, 2015 1:27 PM, Satish Duggana 
>  wrote:
>  
> 
>  Hi Jerry,
> Thanks for updating the blog.
> 
> Storm with acking should be compared with similar configuration on Flink
> which may be with checkpointing enabled or some other configuration which
> gives at-least-once guarantee. But the below paragraph gives an impression
> that storm with acking is equivalent of Flink without checkpointing which
> is not right.
> 
> "Without acking, Storm even beat Flink at very high throughput, and we
> expect that with further optimizations like combining bolts, more
> intelligent routing of tuples, and improved acking, Storm with acking
> enabled would compete with Flink at very high throughput too."
> 
> Thanks,
> Satish.
> 
> On Thu, Dec 17, 2015 at 10:47 PM, Boyang(Jerry) Peng <
> jerryp...@yahoo-inc.com.invalid> wrote:
> 
>> Hello Satish,
>> You are correct, there was a typo.  The sentence should be:
>> Flink uses a mechanism called checkpointing to guarantee processing.
>> Unless checkpointing is used in the Flink job, Flink offers at most once
>> processing similar to Storm with acking turned OFF.  For the Flink
>> benchmark we did not use checkpointing."
>>
>> We have already fixed the typo on the blog.  Thanks!
>> Best,
>> Boyang Jerry Peng
>>
>>
>>On Thursday, December 17, 2015 4:12 AM, Satish Duggana <
>> sdugg...@hortonworks.com> wrote:
>>
>>
>> Hi Bobby et al.,
>> Thanks for publishing blog post on “Benchmarking streaming computation
>> engines<
>> http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at>”.
>> It gives good insights on how different streaming engines perform with the
>> usecase mentioned.
>>
>> “Flink uses a mechanism called checkpointing to guarantee processing.
>> Unless checkpointing is used in the Flink job, Flink offers at most once
>> processing similar to Storm with acking turned on.  For the Flink benchmark
>> we did not use checkpointing."
>>
>> Above snippet in your blog was confusing regarding at-most-once guarantee.
>> My understanding is that Storm gives at-most-once without acking. But
>> at-least-once guarantee requires acking on. So, Storm’s acking should be
>> compared with Flink’s at-least-once guarantee which may be by enabling
>> checkpointing or any other required configuration. Am I missing anything
>> here?
>>
>> Thanks,
>> Satish.
>>
>>
>>
>>
> 
>   
> 





How to get number of tasks from TopologyContext

2015-12-16 Thread Matthias J. Sax
Hi,

today, the above question appeared on SO:
https://stackoverflow.com/questions/34309189/how-to-get-the-task-number-and-id-not-the-executor-in-storm

The problem is, that

TopologyContext.getComponentTasks()

returns the IDs of the executors (and not the tasks). The name of the
method is not chosen very well -- I guess this dates back to the time
before the separation of tasks and executors...

My question is now:

 - do tasks actually have an ID?
 - if yes, can those IDs be retrieved?
 - can we get at least the number of tasks per operator somehow?
 - should the above method get renamed?

As the number of tasks is fixed, one could of course collect this
information and pass it via the Config to
StormSubmitter.submitTopology(...). However, this is quite a work-around.

Please let me know what you think about it.


-Matthias





Re: How to get number of tasks from TopologyContext

2015-12-16 Thread Matthias J. Sax
Please go ahead and answer it.

I actually thought that the method works as intended, too. But the SO
person claims not to get the correct information (see the comments):


> Are you sure that it returns executor IDs and not task IDs? Do you
have different number of executors than tasks? – Matthias J. Sax

> yes,i set parallelism to 5,setTaskNumber(10)..it only return 5 taskid

Not sure what he/she is doing wrong...

-Matthias

On 12/16/2015 03:20 PM, Bobby Evans wrote:
> I forgot to add if you want me to pop on the stack overflow and answer it 
> myself I can.  I just don't want to step on your toes if you have started.
>  - Bobby 
> 
> 
> On Wednesday, December 16, 2015 8:17 AM, Bobby Evans 
> <ev...@yahoo-inc.com> wrote:
>  
> 
>  It should include all of the tasks, not the executors.  
> common/storm-task-info pulls out the task ids from the executor ids, before 
> doing a reverse map and sorting it to put it in the TopologyContext.  An 
> executor ID is a range of task IDs.  [1,5] indicates this executor handles 
> tasks 1, 2, 3, 4, 5.  Most of the time an executor ID is something like [1,1] 
> for task 1.  The code in storm-task-info expands it out. 
>  - Bobby 
> 
> 
> On Wednesday, December 16, 2015 6:52 AM, Matthias J. Sax 
> <mj...@apache.org> wrote:
>  
> 
>  Hi,
> 
> today, the above question appeared on SO:
> https://stackoverflow.com/questions/34309189/how-to-get-the-task-number-and-id-not-the-executor-in-storm
> 
> The problem is, that
> 
> TopologyContext.getComponentTasks()
> 
> returns the IDs of the executors (and not the tasks). The name of the
> method is not chooses very good -- I guess this dates back to the time
> before the separation of tasks and executors...
> 
> My question is now:
> 
>  - do tasks actually have an ID?
>  - if yes, can those IDs be retrieved?
>  - can we get at least the number of tasks per operator somehow?
>  - should the above method get renamed?
> 
> As the number of tasks is fix, one could of course collect this
> information an pass it via the Config to
> StormSubmitter.submitTopology(...). However, this is quite a work-around.
> 
> Please let me know what you think about it.
> 
> 
> -Matthias
> 
> 
>
> 
>   
> 

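Bobby's explanation above -- that an executor ID is an inclusive range of task IDs, so [1,5] covers tasks 1 through 5 and the common case [1,1] covers a single task -- can be sketched in plain Java. This is a standalone illustration of the expansion, not Storm's actual storm-task-info code:

```java
import java.util.ArrayList;
import java.util.List;

public class ExecutorIdExpansion {
    // An executor id is an inclusive task-id range: [1, 5] covers
    // tasks 1, 2, 3, 4, 5, and the common case [1, 1] covers task 1.
    static List<Integer> tasksForExecutor(int start, int end) {
        List<Integer> tasks = new ArrayList<>();
        for (int t = start; t <= end; t++) {
            tasks.add(t);
        }
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(tasksForExecutor(1, 5)); // [1, 2, 3, 4, 5]
        System.out.println(tasksForExecutor(1, 1)); // [1]
    }
}
```

The reverse map Bobby mentions would then associate each expanded task id back to its executor before the sorted list lands in the TopologyContext.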




Re: How to get number of tasks from TopologyContext

2015-12-16 Thread Matthias J. Sax
Thanks for your feedback.

Turns out, the question was related to JStorm... I guess this should be
considered for the merge process.

> sorry , i find i use the jstorm. storm is no problem. but when i use 
> jstorm,this problem arise

-Matthias


On 12/16/2015 03:25 PM, Arun Iyer wrote:
> TopologyContext.getComponentTasks returns the list of task ids for the 
> component (not executor ids).
> 
> You could just try printing the output of getComponentTasks in the prepare 
> method after doing 'setNumTasks’ (with  task > parallelism) 
> while building the topology. Worked for me.
> 
> - Arun
> 
> 
> 
> On 12/16/15, 6:21 PM, "Matthias J. Sax" <mj...@apache.org> wrote:
> 
>> Hi,
>>
>> today, the above question appeared on SO:
>> https://stackoverflow.com/questions/34309189/how-to-get-the-task-number-and-id-not-the-executor-in-storm
>>
>> The problem is, that
>>
>> TopologyContext.getComponentTasks()
>>
>> returns the IDs of the executors (and not the tasks). The name of the
>> method is not chooses very good -- I guess this dates back to the time
>> before the separation of tasks and executors...
>>
>> My question is now:
>>
>> - do tasks actually have an ID?
>> - if yes, can those IDs be retrieved?
>> - can we get at least the number of tasks per operator somehow?
>> - should the above method get renamed?
>>
>> As the number of tasks is fix, one could of course collect this
>> information an pass it via the Config to
>> StormSubmitter.submitTopology(...). However, this is quite a work-around.
>>
>> Please let me know what you think about it.
>>
>>
>> -Matthias
>>





Re: How to get number of tasks from TopologyContext

2015-12-16 Thread Matthias J. Sax
Thanks Cody! Great you added it on SO, too!

On 12/16/2015 05:06 PM, Cody Innowhere wrote:
> I've replied the answer in Stackoverflow too if you don't mind.
> 
> On Thu, Dec 17, 2015 at 12:01 AM, Cody Innowhere <e.neve...@gmail.com>
> wrote:
> 
>> Hi Matthias,
>> In JStorm, there's no executor, so 
>> TopologyContext.getComponentTasks()
>> returns task ids within this component.
>>
>> As to your questions:
>> - do tasks actually have an ID?
>> *[Cody] *in JStorm, each task has an taskId of Integer type. Usually,
>> each component is assigned a range of task id's (the num is equal to
>> component parallelism)
>>
>>  - if yes, can those IDs be retrieved?
>> *[Cody] *Yes, use TopologyContext.getThisTaskId() method
>>
>>  - can we get at least the number of tasks per operator somehow?
>> *[Cody] *Yes, use TopologyContext.getComponentTasks().size()
>>
>>  - should the above method get renamed?
>> *[Cody] *We may discuss this when merging phase starts. You may refer to
>> related jira later.
>>
>>
>> On Wed, Dec 16, 2015 at 11:37 PM, Matthias J. Sax <mj...@apache.org>
>> wrote:
>>
>>> Thanks for your feedback.
>>>
>>> Turns out, the question was related to JStorm... I guess this should be
>>> consider for the merge process.
>>>
>>>> sorry , i find i use the jstorm. storm is no problem. but when i use
>>> jstorm,this problem arise
>>>
>>> -Matthias
>>>
>>>
>>> On 12/16/2015 03:25 PM, Arun Iyer wrote:
>>>> TopologyContext.getComponentTasks returns the list of task ids for the
>>> component (not executor ids).
>>>>
>>>> You could just try printing the output of getComponentTasks in the
>>> prepare method after doing 'setNumTasks’ (with  task > parallelism)
>>>> while building the topology. Worked for me.
>>>>
>>>> - Arun
>>>>
>>>>
>>>>
>>>> On 12/16/15, 6:21 PM, "Matthias J. Sax" <mj...@apache.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> today, the above question appeared on SO:
>>>>>
>>> https://stackoverflow.com/questions/34309189/how-to-get-the-task-number-and-id-not-the-executor-in-storm
>>>>>
>>>>> The problem is, that
>>>>>
>>>>> TopologyContext.getComponentTasks()
>>>>>
>>>>> returns the IDs of the executors (and not the tasks). The name of the
>>>>> method is not chooses very good -- I guess this dates back to the time
>>>>> before the separation of tasks and executors...
>>>>>
>>>>> My question is now:
>>>>>
>>>>> - do tasks actually have an ID?
>>>>> - if yes, can those IDs be retrieved?
>>>>> - can we get at least the number of tasks per operator somehow?
>>>>> - should the above method get renamed?
>>>>>
>>>>> As the number of tasks is fix, one could of course collect this
>>>>> information an pass it via the Config to
>>>>> StormSubmitter.submitTopology(...). However, this is quite a
>>> work-around.
>>>>>
>>>>> Please let me know what you think about it.
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>
>>>
>>
> 





Re: Welcome new Apache Storm Committers/PMC Members

2015-12-11 Thread Matthias J. Sax
Thanks for the warm welcome! Feeling really honored about this promotion!

Congratulations to Longda, Arun, Jerry, and Aaron!

-Matthias

On 12/09/2015 03:25 AM, Satish Duggana wrote:
> Congratulations everyone. Looking forward to working with you all.
> 
> 
> 
> 
> On 12/9/15, 5:44 AM, "Harsha" <st...@harsha.io> wrote:
> 
>> Congrats everyone.
>> -Harsha
>>
>> On Tue, Dec 8, 2015, at 03:19 PM, Priyank Shah wrote:
>>> Congratulation everyone!
>>>
>>>
>>>
>>>
>>> On 12/8/15, 3:17 PM, "Hugo Da Cruz Louro" <hlo...@hortonworks.com> wrote:
>>>
>>>> Congrats everyone. Looking forward to working with you!
>>>>
>>>>> On Dec 8, 2015, at 2:54 PM, Aaron.Dossett <aaron.doss...@target.com> 
>>>>> wrote:
>>>>>
>>>>> Thanks, everyone, and congratulations to the other new committers as well!
>>>>>
>>>>> On 12/8/15, 4:12 PM, "임정택" <kabh...@gmail.com> wrote:
>>>>>
>>>>>> Congratulation! Looking forward to work with you all.
>>>>>>
>>>>>> Best,
>>>>>> Jungtaek Lim (HeartSaVioR)
>>>>>>
>>>>>> On 2015년 12월 9일 (수) at 오전 6:55 P. Taylor Goetz <ptgo...@gmail.com> wrote:
>>>>>>
>>>>>>> I’m pleased to announce that the following individuals have joined as
>>>>>>> Apache Storm Committers/PMC Members:
>>>>>>>
>>>>>>> - Longda Feng
>>>>>>> - Arun Mahadevan
>>>>>>> - Boyang Jerry Peng
>>>>>>> - Matthias J. Sax
>>>>>>> - Aaron Dossett
>>>>>>>
>>>>>>> Longda, Arun, Jerry, Matthias, and Aaron have all demonstrated technical
>>>>>>> merit and dedication to Apache Storm and its community, and as PMC
>>>>>>> members
>>>>>>> they will help drive innovation and community development.
>>>>>>>
>>>>>>> Please join me in welcoming and congratulating Londa, Arun, Jerry,
>>>>>>> Matthias, and Aaron. We look forward to your continued dedication to the
>>>>>>> Storm community.
>>>>>>>
>>>>>>> -Taylor
>>>>>>>
>>>>>
>>>>
>>





Storm at Stackoverflow

2015-10-21 Thread Matthias J. Sax
Hi,

currently, there are two tags (apache-storm and storm) used on SO. I
just suggested "apache-storm" to be the main tag and "storm" to be a
synonym for it. This ensures that all questions get tagged with a unique
tag. Old and new questions get re-tagged from storm to apache-storm
automatically if the synonym gets accepted. For this to happen, at least
4 upvotes must be cast.

If you have an SO account, please upvote here:
https://stackoverflow.com/tags/apache-storm/synonyms

Thanks for your support!

-Matthias





[jira] [Created] (STORM-1101) test-retry-read-assignments in backtype.storm.supervisor-test fails

2015-10-09 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created STORM-1101:
--

 Summary: test-retry-read-assignments in 
backtype.storm.supervisor-test fails
 Key: STORM-1101
 URL: https://issues.apache.org/jira/browse/STORM-1101
 Project: Apache Storm
  Issue Type: Sub-task
  Components: storm-core
Reporter: Matthias J. Sax


https://travis-ci.org/mjsax/storm/builds/84478972

{noformat}
java.lang.RuntimeException: Should not have multiple topologies assigned to one 
port
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method) ~[?:1.7.0_76]
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
 ~[?:1.7.0_76]
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 ~[?:1.7.0_76]
at java.lang.reflect.Constructor.newInstance(Constructor.java:526) 
~[?:1.7.0_76]
at clojure.lang.Reflector.invokeConstructor(Reflector.java:180) 
~[clojure-1.7.0.jar:?]
at backtype.storm.util$throw_runtime.doInvoke(util.clj:845) 
~[classes/:?]
at clojure.lang.RestFn.invoke(RestFn.java:408) ~[clojure-1.7.0.jar:?]
at 
backtype.storm.daemon.supervisor$read_assignments$fn__9770.doInvoke(supervisor.clj:84)
 ~[classes/:?]
at clojure.lang.RestFn.invoke(RestFn.java:421) ~[clojure-1.7.0.jar:?]
at clojure.core$merge_with$merge_entry__4649.invoke(core.clj:2932) 
~[clojure-1.7.0.jar:?]
at clojure.core$reduce1.invoke(core.clj:909) ~[clojure-1.7.0.jar:?]
at clojure.core$merge_with$merge2__4651.invoke(core.clj:2935) 
~[clojure-1.7.0.jar:?]
at clojure.core$reduce1.invoke(core.clj:909) ~[clojure-1.7.0.jar:?]
at clojure.core$reduce1.invoke(core.clj:900) ~[clojure-1.7.0.jar:?]
at clojure.core$merge_with.doInvoke(core.clj:2936) 
~[clojure-1.7.0.jar:?]
at clojure.lang.RestFn.applyTo(RestFn.java:139) ~[clojure-1.7.0.jar:?]
at clojure.core$apply.invoke(core.clj:632) ~[clojure-1.7.0.jar:?]
at 
backtype.storm.daemon.supervisor$read_assignments.invoke(supervisor.clj:84) 
~[classes/:?]
at 
backtype.storm.daemon.supervisor$read_assignments.invoke(supervisor.clj:86) 
~[classes/:?]
at 
backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this__10013.invoke(supervisor.clj:449)
 ~[classes/:?]
at backtype.storm.event$event_manager$fn__9629.invoke(event.clj:40) 
[classes/:?]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_76]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (STORM-1095) Tuple.getSourceGlobalStreamid() has wrong camel-case naming

2015-10-07 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created STORM-1095:
--

 Summary: Tuple.getSourceGlobalStreamid() has wrong camel-case 
naming
 Key: STORM-1095
 URL: https://issues.apache.org/jira/browse/STORM-1095
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Trivial


The method Tuple.getSourceGlobalStreamid() should be named 
Tuple.getSourceGlobalStreamId() to follow camel-case naming.





Re: does anyone else hate the verbose logging of all PR comments in the Storm JIRAs?

2015-09-22 Thread Matthias J. Sax
On Github, you can disable mail notifications about each comment in your
profile configuration (at least for your personal email address -- I
guess it still goes over the mailing list)

Profile -> Settings -> Notification Center

-Matthias

On 09/22/2015 03:21 AM, Erik Weathers wrote:
> Sure, STORM-*.  ;-)
> 
> Here's a good example:
> 
>- https://issues.apache.org/jira/browse/STORM-329
> 
> Compare that to this one:
> 
>- https://issues.apache.org/jira/browse/STORM-404
> 
> STORM-404 has a bunch of human-created comments, but it's readable since it
> has no github-generated comments.  STORM-329 however intermixes the human
> comments with the github ones.  It's really hard to read through.
> 
> To be clear, it's not that it's *confusing* per se -- it's that the
> behavior is *cluttering* the comments, making it harder to see any
> human-created comments since any JIRA issue with a PR will usually end up
> with many automated comments.
> 
> BTW, I totally agree that linking from the JIRA issue to the github PR is
> important!  Would be even nicer if the github PRs also directly linked back
> to the JIRA issue with a clickable link.
> 
> - Erik
> 
> On Mon, Sep 21, 2015 at 6:03 PM, 임정택  wrote:
> 
>> Hi Erik,
>>
>> I think the verbose logging of PR comments could be OK. I didn't experience any
>> confusion.
>> Maybe referring to sample JIRA issues could help us to understand.
>>
>> But I'm also open to change cause other projects already have been doing.
>> (for example, https://issues.apache.org/jira/browse/SPARK-10474)
>>
>> In addition to SPARK has been doing, I'd like to still leave some events on
>> github PR to JIRA issue, too.
>>
>> Btw, the thing I'm really annoyed is multiple mail notifications on each
>> github comment.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> 2015-09-22 9:15 GMT+09:00 Erik Weathers :
>>
>>> I find that these comments majorly distract from any discussion that may
>>> occur in the JIRA issues themselves.   What value are these providing?  I
>>> guess just insurance against GitHub being unavailable or going away?  But
>>> that doesn't seem worth the distraction cost.  Is there any possibility
>> of
>>> removing this spamminess, or somehow putting them into attachments within
>>> the JIRA issues so that they aren't directly in the comments?
>>>
>>> - Erik
>>>
>>
>>
>>
>> --
>> Name : 임 정택
>> Blog : http://www.heartsavior.net / http://dev.heartsavior.net
>> Twitter : http://twitter.com/heartsavior
>> LinkedIn : http://www.linkedin.com/in/heartsavior
>>
> 





[jira] [Closed] (STORM-1044) Setting dop to zero does not raise an error

2015-09-20 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax closed STORM-1044.
--

Merged via cf61b46319469bd4ac223be6859f061ad9c197e4

> Setting dop to zero does not raise an error
> ---
>
> Key: STORM-1044
> URL: https://issues.apache.org/jira/browse/STORM-1044
> Project: Apache Storm
>  Issue Type: Bug
>        Reporter: Matthias J. Sax
>            Assignee: Matthias J. Sax
>Priority: Minor
> Fix For: 0.10.0
>
>
> While I did some testing (with automatic topology plugging code) I set the 
> dop of all spouts and bolts to zero. I submitted the topology to 
> {{LocalCluster}} and did not get any error.
> It took me a while to figure out that the wrongly specified dop were the 
> reason for an empty result of the test. From a user point of view, it would 
> be nice to raise an exception when the {{parallelism_hint}} is smaller than 1.





[jira] [Assigned] (STORM-1044) Setting dop to zero does not raise an error

2015-09-15 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned STORM-1044:
--

Assignee: Matthias J. Sax

> Setting dop to zero does not raise an error
> ---
>
> Key: STORM-1044
> URL: https://issues.apache.org/jira/browse/STORM-1044
> Project: Apache Storm
>  Issue Type: Bug
>        Reporter: Matthias J. Sax
>            Assignee: Matthias J. Sax
>Priority: Minor
>
> While I did some testing (with automatic topology plugging code) I set the 
> dop of all spouts and bolts to zero. I submitted the topology to 
> {{LocalCluster}} and did not get any error.
> It took me a while to figure out that the wrongly specified dop were the 
> reason for an empty result of the test. From a user point of view, it would 
> be nice to raise an exception when the {{parallelism_hint}} is smaller than 1.





[jira] [Commented] (STORM-1044) Setting dop to zero does not raise an error

2015-09-11 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14741109#comment-14741109
 ] 

Matthias J. Sax commented on STORM-1044:


I think the easiest way to fix this would be to check the {{parallelism_hint}} 
in {{TopologyBuilder}}. If you think this is the right approach, I could fix 
this issue myself. Otherwise, please let me know which other approach would 
be appropriate.
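A minimal sketch of such a check, assuming it is done in TopologyBuilder when a spout or bolt is registered. The method name and placement here are hypothetical, for illustration only, not Storm's actual code:

```java
public class ParallelismCheck {
    // Hypothetical validation TopologyBuilder.setSpout/setBolt could run
    // on the parallelism_hint; the method name is made up.
    static int validateParallelismHint(int parallelismHint) {
        if (parallelismHint < 1) {
            throw new IllegalArgumentException(
                "parallelism_hint must be at least 1, but was: " + parallelismHint);
        }
        return parallelismHint;
    }

    public static void main(String[] args) {
        System.out.println(validateParallelismHint(4)); // 4
        try {
            validateParallelismHint(0); // a dop of zero now fails fast
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at build time would surface the misconfiguration immediately instead of producing a silently empty topology run.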

> Setting dop to zero does not raise an error
> ---
>
> Key: STORM-1044
> URL: https://issues.apache.org/jira/browse/STORM-1044
> Project: Apache Storm
>  Issue Type: Bug
>        Reporter: Matthias J. Sax
>Priority: Minor
>
> While I did some testing (with automatic topology plugging code) I set the 
> dop of all spouts and bolts to zero. I submitted the topology to 
> {{LocalCluster}} and did not get any error.
> It took me a while to figure out that the wrongly specified dop were the 
> reason for an empty result of the test. From a user point of view, it would 
> be nice to raise an exception when the {{parallelism_hint}} is smaller than 1.





[jira] [Created] (STORM-1044) Setting dop to zero does not raise an error

2015-09-11 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created STORM-1044:
--

 Summary: Setting dop to zero does not raise an error
 Key: STORM-1044
 URL: https://issues.apache.org/jira/browse/STORM-1044
 Project: Apache Storm
  Issue Type: Bug
Reporter: Matthias J. Sax
Priority: Minor


While I did some testing (with automatic topology plugging code) I set the dop 
of all spouts and bolts to zero. I submitted the topology to {{LocalCluster}} 
and did not get any error.

It took me a while to figure out that the wrongly specified dop was the reason 
for an empty result of the test. From a user point of view, it would be nice to 
raise an exception when the {{parallelism_hint}} is smaller than 1.





Re: Storm UI does not load

2015-09-04 Thread Matthias J. Sax
Clearing the cache did the trick! Thanks

-Matthias

On 09/03/2015 11:54 PM, 임정택 wrote:
> Hi, Matthias.
> 
> Could you clear Firefox cache and try again?
> I met same issue from Chrome when I switch Storm's version from
> 0.10.0-beta1 to 0.9.5 or opposite, and resolved that issue by clearing
> cache.
> If it doesn't work, how about checking any errors from Firefox developer
> tool or Firebug and report to dev mailing list?
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR)
> 
> 2015-09-02 20:44 GMT+09:00 Matthias J. Sax <mj...@informatik.hu-berlin.de>:
> 
>> Hi,
>>
>> I am running Debian Jessy in my laptop. Up to now I never had problems
>> to access Storm UI via Iceweasel (ie, Debian's Firefox). But now, the UI
>> does not load completely. It displays "Loading summary..." and the
>> background is shaded. The cluster summary on top of the page is not
>> rendered correctly. See attached screenshot.
>>
>> The UI loads without problems using Opera.
>>
>> I am building Storm from the sources using the current master version
>> (0.11.0-SNAPSHOT). I just rebased. The last commit is
>>
>>> commit 154e9ec55deb4eea8fca8554e4d3b224bf337834
>>
>> I am not sure what the cause of the problem might be, but I do install
>> available OS updates regularly. I guess that an update changed
>> something. (Maybe an upgrade to a new version of Iceweasel?) The
>> currently installed version is 31.8.0.
>>
>> Maybe anyone can reproduce the problem and have a look into it? For now,
>> I just use Opera. So for me, it's not urgent.
>>
>>
>> -Matthias
>>
> 
> 
> 





Storm UI does not load

2015-09-02 Thread Matthias J. Sax
Hi,

I am running Debian Jessy in my laptop. Up to now I never had problems
to access Storm UI via Iceweasel (ie, Debian's Firefox). But now, the UI
does not load completely. It displays "Loading summary..." and the
background is shaded. The cluster summary on top of the page is not
rendered correctly. See attached screenshot.

The UI loads without problems using Opera.

I am building Storm from the sources using the current master version
(0.11.0-SNAPSHOT). I just rebased. The last commit is

> commit 154e9ec55deb4eea8fca8554e4d3b224bf337834

I am not sure what the cause of the problem might be, but I do install
available OS updates regularly. I guess that an update changed
something. (Maybe an upgrade to a new version of Iceweasel?) The
currently installed version is 31.8.0.

Maybe anyone can reproduce the problem and have a look into it? For now,
I just use Opera. So for me, it's not urgent.


-Matthias




[jira] [Commented] (STORM-855) Add tuple batching

2015-08-30 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721538#comment-14721538
 ] 

Matthias J. Sax commented on STORM-855:
---

There are some additional results in the PR comments: 
https://github.com/apache/storm/pull/694
And there are more results available at similar links: 
https://www2.informatik.hu-berlin.de/~saxmatti/storm-aeolus-benchmark/batchingBenchmark-spout-batching-16.pdf
 [so all numbers from 0 to 16]

 Add tuple batching
 --

 Key: STORM-855
 URL: https://issues.apache.org/jira/browse/STORM-855
 Project: Apache Storm
  Issue Type: New Feature
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

 In order to increase Storm's throughput, multiple tuples can be grouped 
 together in a batch of tuples (ie, a fat-tuple) and transferred from producer to 
 consumer at once.
 The initial idea is taken from https://github.com/mjsax/aeolus. However, we 
 aim to integrate this feature deep into the system (in contrast to building 
 it on top), which has multiple advantages:
   - batching can be even more transparent to the user (eg, no extra 
 direct-streams needed to mimic Storm's data distribution patterns)
   - fault-tolerance (anchoring/acking) can be done on a tuple granularity 
 (not on a batch granularity, which leads to many more replayed tuples -- and 
 result duplicates -- in case of failure)
 The aim is to extend the TopologyBuilder interface with an additional parameter 
 'batch_size' to expose this feature to the user. Per default, batching will 
 be disabled.
 This batching feature has a pure tuple-transport purpose, ie, tuple-by-tuple 
 processing semantics are preserved. An output batch is assembled at the 
 producer and completely disassembled at the consumer. The consumer output can 
 be batched again, however, independent of batched or non-batched input. Thus, 
 batches can be of different size for each producer-consumer pair. 
 Furthermore, consumers can receive batches of different size from different 
 producers (including regular non-batched input).





[jira] [Commented] (STORM-855) Add tuple batching

2015-08-27 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716323#comment-14716323
 ] 

Matthias J. Sax commented on STORM-855:
---

I never worked with Trident myself, but as far as I understand, 
micro-batching breaks tuple-by-tuple processing semantics. A batch of tuples is 
assembled at the source and piped through the topology. The batch stays a batch 
the whole time.

This is quite different from my approach: tuples are only batched under the 
hood and tuple-by-tuple processing semantics are preserved. A batch is 
assembled at the output of an operator and disassembled at the consumer. The 
consumer does not need to batch its own output. Hence, batching is introduced 
on an operator basis, ie, for each operator batching (of the output) can be enabled 
and disabled independently (also allowing for different batch sizes for 
different operators and different batch sizes for different output streams). 
Thus, the latency might not increase as much, since the batch size can be adjusted 
in a fine-grained manner. Additionally, if a single tuple fails, only this single 
tuple needs to get replayed (and not the whole batch as in Trident).

Last but not least, [~revans2] encouraged me to contribute this feature. Please 
see here: 
https://mail-archives.apache.org/mod_mbox/storm-dev/201505.mbox/%3C55672973.9040809%40informatik.hu-berlin.de%3E
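The transport-only batching described above can be illustrated with a small, self-contained sketch. All names here are illustrative and not Storm internals: the producer buffers outgoing tuples and ships them as one fat-tuple, while the receiver unpacks the batch and handles tuples one by one, so tuple-by-tuple semantics are preserved:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of output-side batching: tuples are buffered per producer and
// handed to the transport as one batch; the consumer side disassembles
// the batch back into single tuples before processing.
public class OutputBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> transport; // stands in for the network send
    private List<T> buffer = new ArrayList<>();

    OutputBatcher(int batchSize, Consumer<List<T>> transport) {
        this.batchSize = batchSize;
        this.transport = transport;
    }

    void emit(T tuple) {
        buffer.add(tuple);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (!buffer.isEmpty()) {
            transport.accept(buffer);
            buffer = new ArrayList<>();
        }
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        // the receiver unpacks each batch back into individual tuples
        OutputBatcher<String> batcher = new OutputBatcher<>(3, received::addAll);
        for (String tuple : new String[] {"a", "b", "c", "d"}) {
            batcher.emit(tuple);
        }
        batcher.flush(); // ship the trailing partial batch
        System.out.println(received); // [a, b, c, d]
    }
}
```

Because batching sits only in the transport, acking and replay can stay per tuple, which is the key difference from Trident's batch-level replay.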

 Add tuple batching
 --

 Key: STORM-855
 URL: https://issues.apache.org/jira/browse/STORM-855
 Project: Apache Storm
  Issue Type: New Feature
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

 In order to increase Storm's throughput, multiple tuples can be grouped 
 together in a batch of tuples (ie, a fat-tuple) and transferred from producer to 
 consumer at once.
 The initial idea is taken from https://github.com/mjsax/aeolus. However, we 
 aim to integrate this feature deep into the system (in contrast to building 
 it on top), which has multiple advantages:
   - batching can be even more transparent to the user (eg, no extra 
 direct-streams needed to mimic Storm's data distribution patterns)
   - fault-tolerance (anchoring/acking) can be done on a tuple granularity 
 (not on a batch granularity, which leads to many more replayed tuples -- and 
 result duplicates -- in case of failure)
 The aim is to extend the TopologyBuilder interface with an additional parameter 
 'batch_size' to expose this feature to the user. Per default, batching will 
 be disabled.
 This batching feature has a pure tuple-transport purpose, ie, tuple-by-tuple 
 processing semantics are preserved. An output batch is assembled at the 
 producer and completely disassembled at the consumer. The consumer output can 
 be batched again, however, independent of batched or non-batched input. Thus, 
 batches can be of different size for each producer-consumer pair. 
 Furthermore, consumers can receive batches of different size from different 
 producers (including regular non-batched input).





Re: Cannot run WordCountExample in Intellij

2015-08-19 Thread Matthias J. Sax
Thanks for the explanation! The hint with PYTHONPATH got me to the solution:

The SplitSentence bolt must be configured correctly.

I just change the code from

 builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");

to

 SplitSentence pythonSplit = new SplitSentence();
 Map<String, String> env = new HashMap<String, String>();
 env.put("PYTHONPATH",
   "/home/mjsax/workspace_storm/storm/storm-multilang/python/src/main/resources/resources/");
 pythonSplit.setEnv(env);
 builder.setBolt("split", pythonSplit, 8).shuffleGrouping("spout");


-Matthias

On 08/19/2015 04:29 AM, 임정택 wrote:
 Hi Matthias, sorry for the late response.
 
 Unfortunately, AFAIK you can't run the multilang feature with LocalCluster
 without having a packaged file.
 
 ShellProcess relies on the codeDir of TopologyContext, which is used by the
 supervisor. Workers are serialized to stormcode.ser, but multilang files
 must be extracted outside of the serialized file so that python/ruby/node/etc
 can load them.
 
 Accomplishing this in distributed mode is easy because there's always a
 user-submitted jar, and the supervisor knows it is what the user submitted.
 
 But accomplishing this in local mode is not easy because the supervisor cannot
 know the user-submitted jar, and users can run a topology in local mode without
 packaging.
 
 So, the supervisor in local mode finds the resource directory ("resources") in
 each jar (whose name ends with "jar") on the classpath, and copies the first
 occurrence to codeDir.
 
 storm jar places the user topology jar at the front of the classpath, so it can
 be run without issue.
 
 So normally, it's natural for ShellProcess to not find splitsentence.py.
 Maybe your working directory or PYTHONPATH do the trick.
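The classpath scan described above can be sketched in isolation. This is a standalone illustration under the assumption that jar entries end with ".jar"; it is not the actual supervisor code:

```java
import java.io.File;
import java.util.Arrays;
import java.util.Optional;

// Illustrative lookup: take the first classpath entry ending in ".jar",
// which is where local mode would go looking for the "resources"
// directory that holds the multilang scripts.
public class ResourceLookup {
    static Optional<String> firstJarOnClasspath(String classpath) {
        return Arrays.stream(classpath.split(File.pathSeparator))
                     .filter(entry -> entry.endsWith(".jar"))
                     .findFirst();
    }

    public static void main(String[] args) {
        String cp = String.join(File.pathSeparator,
                "build/classes", "topology.jar", "lib/storm-core.jar");
        System.out.println(firstJarOnClasspath(cp).get()); // topology.jar
    }
}
```

This also explains why running from an IDE fails: without a packaged jar on the classpath there is no entry for the scan to find.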
 
 Hope this helps.
 
 Best,
 Jungtaek Lim (HeartSaVioR)
 ps.
 I also respond to your SO question with same content.
 http://stackoverflow.com/a/32085316/3480122
 
 2015-08-10 6:49 GMT+09:00 Matthias J. Sax mj...@informatik.hu-berlin.de:
 
 Hi,

 I work with Storm for a while already, but want to get started with
 development. As suggested, I am using Intellij (up to now, I was using
 Eclipse).

 I was also looking at

 https://github.com/apache/storm/tree/master/examples/storm-starter#intellij-idea

 This documentation is not complete. I was not able to run anything in
 Intellij at first. I figured out that I need to remove the scope of the
 storm-core dependency (in the storm-starter pom.xml). (found here:

 https://stackoverflow.com/questions/30724424/storm-starter-with-intellij-idea-maven-project-could-not-find-class
 )

 After that I was able to build the project. I can also run
 ExclamationTopology with no problems within Intellij. However,
 WordCountTopology fails.

 First I got the following error:

 java.lang.RuntimeException: backtype.storm.multilang.NoOutputException:
 Pipe to subprocess seems to be broken! No output read.
 Serializer Exception:
 Traceback (most recent call last):
   File splitsentence.py, line 16, in module
 import storm
 ImportError: No module named storm

 I was able to resolve it via: apt-get install python-storm
 (from StackOverflow)

 However, I don't speak Python and was wondering what the problem is and
 why I could resolve it like this. Just want to get deeper into it. Maybe
 someone can explain.

 Unfortunately, I am getting a different error now:

 java.lang.RuntimeException: backtype.storm.multilang.NoOutputException:
 Pipe to subprocess seems to be broken! No output read.
 Serializer Exception:
 Traceback (most recent call last):
   File splitsentence.py, line 16, in module
 import storm
 ImportError: No module named storm

 I did not find any solution on the Internet. And as I am not familiar
 with Python and never used Storm other than via the low-level Java API, I am
 stuck now. Because ExclamationTopology runs, I guess my basic setup is
 correct.

 What do I do wrong?

 -Matthias




 
 





Re: Cannot run WordCountExample in Intellij

2015-08-11 Thread Matthias J. Sax
Does anyone else have an idea how to solve this?

On 08/10/2015 11:13 AM, Matthias J. Sax wrote:
 Changing the working directory did not solve the problem.
 
 Btw, there are multiple storm.py files
 
 storm$ find . -name storm.py
 ./storm-dist/binary/target/apache-storm-0.11.0-SNAPSHOT/bin/storm.py 
  
  
 
 ./storm-core/target/test-classes/resources/storm.py  
  
  
 
 ./bin/storm.py   
  
  
 
 ./storm-multilang/python/target/classes/resources/storm.py   
  
  
 
 ./storm-multilang/python/src/main/resources/resources/storm.py
 
 Not all of them are equal (is this intended?).
 
 Those three are equal:
 ./storm-core/target/test-classes/resources/storm.py
 ./storm-multilang/python/target/classes/resources/storm.py
 ./storm-multilang/python/src/main/resources/resources/storm.py
 
 (I guess storm-multilang-src is the original one, copied to both target
 directories).
 
 The other two are different to those and to each other. Is there some
 modification going on during build/packaging?
 
 I tried each of the three versions by changing the working directory
 accordingly. None of them worked.
 
 Btw: I am wondering about this assumption. IMHO, for development, it is
 much more convenient to start a topology in LocalCluster within an IDE.
 
 
 -Matthias
 
 
 On 08/10/2015 09:20 AM, Abhishek Agarwal wrote:
 I think it was always assumed that a topology would be invoked
 through the storm command line. Thus, the working directory would be
 ${STORM-INSTALLATION}/bin/storm
 Since storm.py is in this directory, splitSentence.py would be able to
 find the storm module. Can you set the working directory to a path where
 storm.py is present and then try? If it works, we can add it to the
 documentation later.

 On Mon, Aug 10, 2015 at 12:21 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

 I am using current storm/master (i.e., 0.11.0-SNAPSHOT). My Python is
 2.7.9 (I am using Debian Jessie). Using OpenJDK 1.7.0_79.

 About pom.xml: I am aware that the environments are different and that
 it makes sense for cluster deployment to set the scope to provided. I
 just claim that this information should be on the web page. ;)
 -

 https://github.com/apache/storm/tree/master/examples/storm-starter#intellij-idea

 Or even better, your fix using the maven-shade-plugin should be committed. :)

 Here is the correct error message:

 java.lang.RuntimeException: backtype.storm.multilang.NoOutputException:
 Pipe to subprocess seems to be broken! No output read.
 Serializer Exception:
 Traceback (most recent call last):
  File "splitsentence.py", line 18, in <module>
 class SplitSentenceBolt(storm.BasicBolt):
 AttributeError: 'module' object has no attribute 'BasicBolt'

 Thanks in advance!

 -Matthias


 On 08/10/2015 07:33 AM, Abhishek Agarwal wrote:
 Adding/removing the scope of storm-core is cumbersome. If you ship
 storm-core along with the uber jar, the topology will fail on a
 production cluster. Instead, I have set the scope to compile and
 excluded the storm jars in the maven-shade plugin. This way, both
 environments work with no changes.
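The approach described above can be sketched as a pom.xml fragment: keep storm-core at compile scope, but exclude it from the shaded jar. This is only an illustration of the idea (version number and placement are assumptions; the element names follow the standard maven-shade-plugin configuration):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <artifactSet>
          <!-- storm-core stays on the compile classpath for the IDE,
               but is kept out of the uber jar for cluster submission -->
          <excludes>
            <exclude>org.apache.storm:storm-core</exclude>
          </excludes>
        </artifactSet>
      </configuration>
    </execution>
  </executions>
</plugin>
```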

 You have pasted the same error twice. By the way, I didn't have to
 install
 the python-storm to run the topology. Which version are you using?

 On Mon, Aug 10, 2015 at 3:19 AM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

 Hi,

 I work with Storm for a while already, but want to get started with
 development. As suggested, I am using Intellij (up to now, I was using
 Eclipse).

 I was also looking at


 https://github.com/apache/storm/tree/master/examples/storm-starter#intellij-idea

 This documentation is incomplete. At first, I was not able to run
 anything in IntelliJ. I figured out that I need to remove the scope of
 the storm-core dependency (in the storm-starter pom.xml). (Found here:


 https://stackoverflow.com/questions/30724424/storm-starter-with-intellij-idea-maven-project-could-not-find-class
 )

 After that I was able to build the project. I can also run
 ExclamationTopology within IntelliJ with no problems. However,
 WordCountTopology fails.

 First I got the following error:

 java.lang.RuntimeException

Re: Cannot run WordCountExample in Intellij

2015-08-10 Thread Matthias J. Sax
Changing the working directory did not solve the problem.

Btw, there are multiple storm.py files

 storm$ find . -name storm.py
 ./storm-dist/binary/target/apache-storm-0.11.0-SNAPSHOT/bin/storm.py
 ./storm-core/target/test-classes/resources/storm.py
 ./bin/storm.py
 ./storm-multilang/python/target/classes/resources/storm.py
 ./storm-multilang/python/src/main/resources/resources/storm.py

Not all of them are equal (is this intended?).

Those three are equal:
./storm-core/target/test-classes/resources/storm.py
./storm-multilang/python/target/classes/resources/storm.py
./storm-multilang/python/src/main/resources/resources/storm.py

(I guess storm-multilang-src is the original one, copied to both target
directories).

The other two differ from those and from each other. Is there some
modification going on during build/packaging?

I tried each of the three versions by changing the working directory
accordingly. None of them worked.

Btw: I am wondering about this assumption. IMHO, for development, it is
much more convenient to start a topology in LocalCluster within an IDE.


-Matthias


On 08/10/2015 09:20 AM, Abhishek Agarwal wrote:
 I think it was always assumed that a topology would be invoked
 through the storm command line. Thus, the working directory would be
 ${STORM-INSTALLATION}/bin/storm
 Since storm.py is in this directory, splitSentence.py would be able to
 find the storm module. Can you set the working directory to a path where
 storm.py is present and then try? If it works, we can add it to the
 documentation later.
 
 On Mon, Aug 10, 2015 at 12:21 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:
 
 I am using current storm/master (i.e., 0.11.0-SNAPSHOT). My Python is
 2.7.9 (I am using Debian Jessie). Using OpenJDK 1.7.0_79.

 About pom.xml: I am aware that the environments are different and that
 it makes sense for cluster deployment to set the scope to provided. I
 just claim that this information should be on the web page. ;)
 -

 https://github.com/apache/storm/tree/master/examples/storm-starter#intellij-idea

 Or even better, your fix using the maven-shade-plugin should be committed. :)

 Here is the correct error message:

 java.lang.RuntimeException: backtype.storm.multilang.NoOutputException:
 Pipe to subprocess seems to be broken! No output read.
 Serializer Exception:
 Traceback (most recent call last):
  File "splitsentence.py", line 18, in <module>
 class SplitSentenceBolt(storm.BasicBolt):
 AttributeError: 'module' object has no attribute 'BasicBolt'

 Thanks in advance!

 -Matthias


 On 08/10/2015 07:33 AM, Abhishek Agarwal wrote:
 Adding/removing the scope of storm-core is cumbersome. If you ship
 storm-core along with the uber jar, the topology will fail on a
 production cluster. Instead, I have set the scope to compile and
 excluded the storm jars in the maven-shade plugin. This way, both
 environments work with no changes.

 You have pasted the same error twice. By the way, I didn't have to
 install
 the python-storm to run the topology. Which version are you using?

 On Mon, Aug 10, 2015 at 3:19 AM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

 Hi,

 I work with Storm for a while already, but want to get started with
 development. As suggested, I am using Intellij (up to now, I was using
 Eclipse).

 I was also looking at


 https://github.com/apache/storm/tree/master/examples/storm-starter#intellij-idea

 This documentation is incomplete. At first, I was not able to run
 anything in IntelliJ. I figured out that I need to remove the scope of
 the storm-core dependency (in the storm-starter pom.xml). (Found here:


 https://stackoverflow.com/questions/30724424/storm-starter-with-intellij-idea-maven-project-could-not-find-class
 )

 After that I was able to build the project. I can also run
 ExclamationTopology within IntelliJ with no problems. However,
 WordCountTopology fails.

 First I got the following error:

 java.lang.RuntimeException: backtype.storm.multilang.NoOutputException:
 Pipe to subprocess seems to be broken! No output read.
 Serializer Exception:
 Traceback (most

Re: Cannot run WordCountExample in Intellij

2015-08-10 Thread Matthias J. Sax
I am using current storm/master (i.e., 0.11.0-SNAPSHOT). My Python is
2.7.9 (I am using Debian Jessie). Using OpenJDK 1.7.0_79.

About pom.xml: I am aware that the environments are different and that
it makes sense for cluster deployment to set the scope to provided. I
just claim that this information should be on the web page. ;)
-
https://github.com/apache/storm/tree/master/examples/storm-starter#intellij-idea

Or even better, your fix using the maven-shade-plugin should be committed. :)

Here is the correct error message:

 java.lang.RuntimeException: backtype.storm.multilang.NoOutputException: Pipe 
 to subprocess seems to be broken! No output read.
 Serializer Exception:
 Traceback (most recent call last):
  File "splitsentence.py", line 18, in <module>
 class SplitSentenceBolt(storm.BasicBolt):
 AttributeError: 'module' object has no attribute 'BasicBolt'

Thanks in advance!

-Matthias


On 08/10/2015 07:33 AM, Abhishek Agarwal wrote:
 Adding/removing the scope of storm-core is cumbersome. If you ship
 storm-core along with the uber jar, the topology will fail on a
 production cluster. Instead, I have set the scope to compile and
 excluded the storm jars in the maven-shade plugin. This way, both
 environments work with no changes.
 
 You have pasted the same error twice. By the way, I didn't have to install
 the python-storm to run the topology. Which version are you using?
 
 On Mon, Aug 10, 2015 at 3:19 AM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:
 
 Hi,

 I work with Storm for a while already, but want to get started with
 development. As suggested, I am using Intellij (up to now, I was using
 Eclipse).

 I was also looking at

 https://github.com/apache/storm/tree/master/examples/storm-starter#intellij-idea

 This documentation is incomplete. At first, I was not able to run
 anything in IntelliJ. I figured out that I need to remove the scope of
 the storm-core dependency (in the storm-starter pom.xml). (Found here:

 https://stackoverflow.com/questions/30724424/storm-starter-with-intellij-idea-maven-project-could-not-find-class
 )

 After that I was able to build the project. I can also run
 ExclamationTopology within IntelliJ with no problems. However,
 WordCountTopology fails.

 First I got the following error:

 java.lang.RuntimeException: backtype.storm.multilang.NoOutputException:
 Pipe to subprocess seems to be broken! No output read.
 Serializer Exception:
 Traceback (most recent call last):
  File "splitsentence.py", line 16, in <module>
 import storm
 ImportError: No module named storm

 I was able to resolve it via: apt-get install python-storm
 (from StackOverflow)

 However, I don't speak Python and was wondering what the problem is and
 why I could resolve it like this. Just want to get deeper into it. Maybe
 someone can explain.

 Unfortunately, I am getting a different error now:

 java.lang.RuntimeException: backtype.storm.multilang.NoOutputException:
 Pipe to subprocess seems to be broken! No output read.
 Serializer Exception:
 Traceback (most recent call last):
  File "splitsentence.py", line 16, in <module>
 import storm
 ImportError: No module named storm

 I did not find any solution on the Internet. As I am not familiar
 with Python and have never used Storm other than through the low-level
 Java API, I am stuck now. Because ExclamationTopology runs, I guess my
 basic setup is correct.

 What am I doing wrong?

 -Matthias




 
 



signature.asc
Description: OpenPGP digital signature


Cannot run WordCountExample in Intellij

2015-08-09 Thread Matthias J. Sax
Hi,

I work with Storm for a while already, but want to get started with
development. As suggested, I am using Intellij (up to now, I was using
Eclipse).

I was also looking at
https://github.com/apache/storm/tree/master/examples/storm-starter#intellij-idea

This documentation is incomplete. At first, I was not able to run
anything in IntelliJ. I figured out that I need to remove the scope of
the storm-core dependency (in the storm-starter pom.xml). (Found here:
https://stackoverflow.com/questions/30724424/storm-starter-with-intellij-idea-maven-project-could-not-find-class)

After that I was able to build the project. I can also run
ExclamationTopology within IntelliJ with no problems. However,
WordCountTopology fails.

First I got the following error:

 java.lang.RuntimeException: backtype.storm.multilang.NoOutputException: Pipe 
 to subprocess seems to be broken! No output read.
 Serializer Exception:
 Traceback (most recent call last):
  File "splitsentence.py", line 16, in <module>
 import storm
 ImportError: No module named storm

I was able to resolve it via: apt-get install python-storm
(from StackOverflow)

However, I don't speak Python and was wondering what the problem is and
why I could resolve it like this. Just want to get deeper into it. Maybe
someone can explain.

Unfortunately, I am getting a different error now:

 java.lang.RuntimeException: backtype.storm.multilang.NoOutputException: Pipe 
 to subprocess seems to be broken! No output read.
 Serializer Exception:
 Traceback (most recent call last):
  File "splitsentence.py", line 16, in <module>
 import storm
 ImportError: No module named storm

I did not find any solution on the Internet. As I am not familiar
with Python and have never used Storm other than through the low-level
Java API, I am stuck now. Because ExclamationTopology runs, I guess my
basic setup is correct.

What am I doing wrong?

-Matthias





signature.asc
Description: OpenPGP digital signature


Re: Submitting multiple jars to a topology classpath

2015-07-30 Thread Matthias J. Sax
Storm itself does not provide any support for uploading multiple jars. As a
workaround, you can put the required jars into the $STORM/lib folder manually
(you need to do this on every node in the cluster!).

-Matthias

On 07/30/2015 08:55 AM, Abhishek Agarwal wrote:
 Currently, as far as I know one has to package all the dependencies into
 one jar and then submit it along with topology class. StormSubmitter
 interface also allows only one jar. Is there any particular reason for this
 limitation?
 
 We have a use case where we want to upload more than one jar without
 packaging them together. How could this be achieved?
 



signature.asc
Description: OpenPGP digital signature


Re: Block spout

2015-06-29 Thread Matthias J. Sax
I don't know the storm-signals project you mentioned. It might work
for you. It is not part of the storm-core system.

However, I do not understand why you write to a file and read from it
again. Why not use one more bolt and just process the tuples there?

In your case, I would also not sleep in the spout, but only synchronize
writing to and reading from the file.


-Matthias


On 06/29/2015 04:00 PM, Pradheep s wrote:
 Hi,
 The problem I have is that I write the tuples received from a spout to a
 file through a bolt. I also use another thread to read from it. Sometimes
 the writer might overtake the reader and overwrite a tuple the reader still
 has to read, and thus lose tuples. I have to avoid this.
 I heard about two things. First, do I have to give back control from the
 bolt to the spout in order for the spout to send the next tuple -- is that
 right, or does the spout continuously send tuples regardless of what the
 bolt does with them?
 Next is about storm-signals https://github.com/ptgoetz/storm-signals: is
 this feature available, so that I can try to send a sleep signal to the
 spout whenever I need to?
 
 Thanks,
 Pradheep
 
 On Mon, Jun 29, 2015 at 5:16 AM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:
 
 It depends on what you want to accomplish... You can always sleep in
 Spout.nextTuple() to block the spout for a finite time.

 As an alternative, you can limit the number of pending tuples by setting
 the parameter topology.max.spout.pending (be aware that tuples might time
 out in this case).

 Sending a signal from a bolt to a spout is not supported by Storm. If you
 want to do this, you need to code it yourself. But it is tricky to do,
 and I would not recommend it.


 -Matthias


 On 06/28/2015 05:51 PM, Pradheep s wrote:
 Hi,

 I have a spout which is continuously emitting random numbers to a bolt
 which receives them. Is it possible to block the spout for a finite time
 from sending tuples to the bolt? Do I have to send a blocking signal
 from the bolt to the spout? If possible, how do I do it?
 Please let me know if someone can give some advice about this.

 Thanks,
 Pradheep



 



signature.asc
Description: OpenPGP digital signature


Re: Block spout

2015-06-29 Thread Matthias J. Sax
It depends on what you want to accomplish... You can always sleep in
Spout.nextTuple() to block the spout for a finite time.

As an alternative, you can limit the number of pending tuples by setting
the parameter topology.max.spout.pending (be aware that tuples might time
out in this case).

Sending a signal from a bolt to a spout is not supported by Storm. If you
want to do this, you need to code it yourself. But it is tricky to do,
and I would not recommend it.


-Matthias
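The effect of topology.max.spout.pending described above can be modeled with a toy sketch (hypothetical class and method names, not Storm code): the spout emits only while fewer than max_pending tuples are in flight, and each ack frees up capacity again.

```python
class ThrottledSpout:
    """Toy model of topology.max.spout.pending: the spout stops emitting
    once max_pending tuples are in flight and resumes as acks arrive."""

    def __init__(self, max_pending):
        self.max_pending = max_pending
        self.pending = set()   # tuple ids emitted but not yet acked
        self.next_id = 0

    def next_tuple(self):
        if len(self.pending) >= self.max_pending:
            return None        # back off; Storm would call nextTuple() again later
        tup_id = self.next_id
        self.next_id += 1
        self.pending.add(tup_id)
        return tup_id

    def ack(self, tup_id):
        self.pending.discard(tup_id)
```

With max_pending=3, three calls to next_tuple() succeed, the fourth returns None, and after an ack the spout can emit again.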


On 06/28/2015 05:51 PM, Pradheep s wrote:
 Hi,
 
 I have a spout which is continuously emitting random numbers to a bolt
 which receives them. Is it possible to block the spout for a finite time
 from sending tuples to the bolt? Do I have to send a blocking signal
 from the bolt to the spout? If possible, how do I do it?
 Please let me know if someone can give some advice about this.
 
 Thanks,
 Pradheep
 



signature.asc
Description: OpenPGP digital signature


Re: New Committer/PMC Member: Jungtaek Lim

2015-06-29 Thread Matthias J. Sax
Congrats Jungtaek!

On 06/29/2015 09:26 PM, P. Taylor Goetz wrote:
 Please join me in welcoming Jungtaek Lim (AKA “HeartSaVioR”) as a new Apache
 Storm Committer and PMC member.
 
 Jungtaek has demonstrated a strong commitment to the Apache Storm community
 through active participation and mentoring on the Storm mailing lists.
 Furthermore, he has authored many enhancements and bug fixes spanning both
 Storm’s core codebase as well as numerous integration components.
 
 Congratulations and welcome Jungtaek!
 
 -Taylor
 



signature.asc
Description: OpenPGP digital signature


Re: Newbie questions about deploying to cluster

2015-06-18 Thread Matthias J. Sax
Sure.

On 06/18/2015 12:55 PM, Tim Molter wrote:
 Thank you, Matthias.
 
 Can I run nimbus and a supervisor on one machine as well (assuming
 low-load)?
 
 Thanks.
 
 On 2015_06_18 12:45 PM, Matthias J. Sax wrote:
 1) Yes, nimbus.

 2) Running ZooKeeper and Nimbus on the same machine should not be a
 problem in general. In a high-load scenario it might become a problem
 (but it should be OK in most cases). If you have increased
 fault-tolerance requirements, it is also critical to run them on
 different machines (but again, this should not be required in most cases).

 -Matthias


 On 06/18/2015 12:42 PM, Tim Molter wrote:
 1) Let's say I have 4 machines: one running ZooKeeper, one running
 nimbus, and 2 running supervisors. From which host do I deploy my topology
 via `storm jar`? The nimbus host?

 2) Can I run zookeeper, nimbus and manager all on one machine? Or is it
 critical to separate certain ones on separate hosts?


 



signature.asc
Description: OpenPGP digital signature


Re: Newbie questions about deploying to cluster

2015-06-18 Thread Matthias J. Sax
1) Yes, nimbus.

2) Running ZooKeeper and Nimbus on the same machine should not be a
problem in general. In a high-load scenario it might become a problem
(but it should be OK in most cases). If you have increased
fault-tolerance requirements, it is also critical to run them on
different machines (but again, this should not be required in most cases).

-Matthias


On 06/18/2015 12:42 PM, Tim Molter wrote:
 1) Let's say I have 4 machines: one running ZooKeeper, one running
 nimbus, and 2 running supervisors. From which host do I deploy my topology
 via `storm jar`? The nimbus host?
 
 2) Can I run zookeeper, nimbus and manager all on one machine? Or is it
 critical to separate certain ones on separate hosts?
 



signature.asc
Description: OpenPGP digital signature


[jira] [Commented] (STORM-855) Add tuple batching

2015-06-15 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14587123#comment-14587123
 ] 

Matthias J. Sax commented on STORM-855:
---

I started to work on this. You can find the current progress at: 
https://github.com/mjsax/storm/tree/batching
It will take some time. Need to get started with Clojure first :) I will open a 
pull request if the code is more mature. Feedback is welcome any time.

 Add tuple batching
 --

 Key: STORM-855
 URL: https://issues.apache.org/jira/browse/STORM-855
 Project: Apache Storm
  Issue Type: New Feature
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

 In order to increase Storm's throughput, multiple tuples can be grouped
 together in a batch of tuples (i.e., a fat tuple) and transferred from producer to
 consumer at once.
 The initial idea is taken from https://github.com/mjsax/aeolus. However, we
 aim to integrate this feature deep into the system (in contrast to building
 it on top), which has several advantages:
   - batching can be even more transparent to the user (e.g., no extra
 direct-streams needed to mimic Storm's data distribution patterns)
   - fault-tolerance (anchoring/acking) can be done at tuple granularity
 (not at batch granularity, which leads to many more replayed tuples -- and
 result duplicates -- in case of failure)
 The aim is to extend the TopologyBuilder interface with an additional parameter
 'batch_size' to expose this feature to the user. By default, batching will
 be disabled.
 This batching feature serves a pure tuple-transport purpose, i.e., tuple-by-tuple
 processing semantics are preserved. An output batch is assembled at the
 producer and completely disassembled at the consumer. The consumer output can
 be batched again, however, independently of batched or non-batched input. Thus,
 batches can be of different sizes for each producer-consumer pair.
 Furthermore, consumers can receive batches of different sizes from different
 producers (including regular non-batched input).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Closed] (STORM-792) Missing documentation in backtype.storm.generated.Nimbus

2015-06-12 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax closed STORM-792.
-

 Missing documentation in backtype.storm.generated.Nimbus
 

 Key: STORM-792
 URL: https://issues.apache.org/jira/browse/STORM-792
 Project: Apache Storm
  Issue Type: Documentation
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor
 Fix For: 0.10.0


 Explain the difference between Nimbus$Client.getTopology(String id) and 
 Nimbus$Client.getUserTopology(String id)





[jira] [Created] (STORM-855) Add tuple batching

2015-06-08 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created STORM-855:
-

 Summary: Add tuple batching
 Key: STORM-855
 URL: https://issues.apache.org/jira/browse/STORM-855
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor


In order to increase Storm's throughput, multiple tuples can be grouped
together in a batch of tuples (i.e., a fat tuple) and transferred from producer to
consumer at once.

The initial idea is taken from https://github.com/mjsax/aeolus. However, we aim
to integrate this feature deep into the system (in contrast to building it on
top), which has several advantages:
  - batching can be even more transparent to the user (e.g., no extra
direct-streams needed to mimic Storm's data distribution patterns)
  - fault-tolerance (anchoring/acking) can be done at tuple granularity (not
at batch granularity, which leads to many more replayed tuples -- and result
duplicates -- in case of failure)

The aim is to extend the TopologyBuilder interface with an additional parameter
'batch_size' to expose this feature to the user. By default, batching will be
disabled.

This batching feature serves a pure tuple-transport purpose, i.e., tuple-by-tuple
processing semantics are preserved. An output batch is assembled at the
producer and completely disassembled at the consumer. The consumer output can
be batched again, however, independently of batched or non-batched input. Thus,
batches can be of different sizes for each producer-consumer pair. Furthermore,
consumers can receive batches of different sizes from different producers
(including regular non-batched input).
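The producer-side assembly and consumer-side disassembly described above can be illustrated with a toy sketch (hypothetical names; this is not the actual STORM-855 implementation): the producer collects tuples into a batch and hands the whole batch to the transport at once, and the consumer unpacks it to preserve tuple-by-tuple processing semantics.

```python
class Batcher:
    """Toy producer-side batching: collect up to batch_size tuples,
    then hand the whole batch (fat tuple) to the transport at once."""

    def __init__(self, batch_size, transport):
        self.batch_size = batch_size
        self.transport = transport   # callable that takes a list of tuples
        self.buffer = []

    def emit(self, tup):
        self.buffer.append(tup)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # Ship whatever is buffered (also used for a final partial batch).
        if self.buffer:
            self.transport(list(self.buffer))
            self.buffer.clear()

def unbatch(batch, consume):
    """Toy consumer side: disassemble the batch and process tuple-by-tuple."""
    for tup in batch:
        consume(tup)
```

Note that batching here is transport-only: the consumer still sees individual tuples, which is what allows acking at tuple rather than batch granularity.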





[jira] [Updated] (STORM-855) Add tuple batching

2015-06-08 Thread Matthias J. Sax (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated STORM-855:
--
Issue Type: New Feature  (was: Improvement)

 Add tuple batching
 --

 Key: STORM-855
 URL: https://issues.apache.org/jira/browse/STORM-855
 Project: Apache Storm
  Issue Type: New Feature
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Minor

 In order to increase Storm's throughput, multiple tuples can be grouped
 together in a batch of tuples (i.e., a fat tuple) and transferred from producer to
 consumer at once.
 The initial idea is taken from https://github.com/mjsax/aeolus. However, we
 aim to integrate this feature deep into the system (in contrast to building
 it on top), which has several advantages:
   - batching can be even more transparent to the user (e.g., no extra
 direct-streams needed to mimic Storm's data distribution patterns)
   - fault-tolerance (anchoring/acking) can be done at tuple granularity
 (not at batch granularity, which leads to many more replayed tuples -- and
 result duplicates -- in case of failure)
 The aim is to extend the TopologyBuilder interface with an additional parameter
 'batch_size' to expose this feature to the user. By default, batching will
 be disabled.
 This batching feature serves a pure tuple-transport purpose, i.e., tuple-by-tuple
 processing semantics are preserved. An output batch is assembled at the
 producer and completely disassembled at the consumer. The consumer output can
 be batched again, however, independently of batched or non-batched input. Thus,
 batches can be of different sizes for each producer-consumer pair.
 Furthermore, consumers can receive batches of different sizes from different
 producers (including regular non-batched input).





Re: Aeolus 0.1 available

2015-06-03 Thread Matthias J. Sax
Thanks for the input.

Currently, everything is written in Java (I am not familiar with Clojure
-- maybe a good way to get started with it, though ;)). Nathan just
mentioned that the code could be included in external modules. Thus,
putting it there might be the easiest way. What are those external
modules Nathan is referring to?

I am just wondering how deep the integration in the system should be. If
a deeper integration is the better solution, we should follow this path.

You are the experts. What is the better solution?

-Matthias



On 06/03/2015 09:19 PM, Bobby Evans wrote:
 Sorry I didn't respond sooner, things are rather busy :). You should be able
 to file a JIRA yourself if you want to; it is open to anyone. Storm has not
 documented the code base very well. The core part of Storm is in the
 storm-core sub-project. It has both Java and Clojure code in it. The
 Clojure code is where most everything happens. The daemons are located under
 storm-core/src/clj/backtype/storm/daemon; worker.clj and executor.clj are
 probably the places where you would want to update metrics and routing. The
 code that creates the topology is in Java.
  - Bobby
  
 
 
  On Thursday, May 28, 2015 9:46 AM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

 
  Hi Bobby,
 
 I never thought about it. But if the community is interested in it, I
 would be happy to contribute it. :)
 
 However, I am not super familiar with the actual structure of Storm's
 code base and I would need some pointers to integrate it into the system
 correctly and nicely.
 
 I claim to understand the internals of Storm quite well; however, so far
 I have had more of a user perspective on the system.
 
 If I should work on it, it might be a good idea to open a JIRA and
 assign it to me, and we can take it from there?
 
 
 -Matthias
 
 
 
 On 05/28/2015 03:20 PM, Bobby Evans wrote:
 Have you thought about contributing this back to storm itself?  From what I 
 have read and a quick pass through the code it looks like from a user 
 perspective you replace one builder with another.  From a code perspective 
 it looks like you replace the fields grouping with one that understands the 
 batching semantics, and wrap the bolts/spouts with batch/unbatch logic.  
 This feels like something that could easily fit into storm with minor 
 modification and give users more control over latency vs. throughput in 
 their topologies.  Making it an official part of storm too, would allow us 
 to update the metrics system to understand the batching and display results 
 on a per tuple basis instead of on a per batch basis.
   - Bobby
   


   On Thursday, May 28, 2015 5:54 AM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:
 

   Hi Manu,

 please find a simple benchmark evaluation on Storm 0.9.3 using the
 following link (it is too much content to attach to this email).

 https://www2.informatik.hu-berlin.de/~saxmatti/storm-aeolus-benchmark/batchingBenchmark-spout-batching-0.pdf

 The file shows the results for batch sizes 0 to 4. You can replace the
 trailing 0 with values up to 16 to get results for higher batch sizes.

 What you can basically observe is that the maximum achieved data rate
 in the non-batching case is about 250,000 tuples per second (tps), while a
 batch size of about 30 increases it to 2,000,000 tps (with high
 fluctuation that decreases with even higher batch sizes).

 The benchmark uses a single spout (dop=1) and a single bolt (dop=1) and
 measures the output/input rate (in tps) as well as the network traffic (in
 KB/s) for different batch sizes.

 The spout emits simple single-attribute tuples (of type Integer) and is
 configured to emit at a dedicated (stable) output rate. In the benchmark we
 did multiple runs combining different output rates (from
 200,000 tps to 2,000,000 tps in steps of 200,000) with different batch
 sizes (from 1 to 80).

 Each run used a different configured spout output rate and
 consists of 4 plots showing the measured network traffic and output/input
 rate for spout and bolt. The plots might be hard to read (they were
 designed for ourselves only, not for publishing). If you have questions
 about them, please let me know.

 We ran the experiment on our local cluster. Each node has two Xeon
 E5-2620 2GHz CPUs with 6 cores and 24GB main memory. The nodes are connected
 via 1Gbit Ethernet (10Gbit switch).

 The code and scripts for running the benchmark are on GitHub, too;
 please refer to the maven module 'monitoring'. So you should be able to
 run the benchmark on your own hardware.

 -Matthias



 On 05/28/2015 08:44 AM, Manu Zhang wrote:
 Hi Matthias,

 The project looks interesting. Any detailed performance data compared with
 latest storm versions (0.9.3 / 0.9.4) ?

 Thanks,
 Manu Zhang

 On Tue, May 26, 2015 at 11:52 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

 Dear Storm community,

 we would like to share our project Aeolus with you. While the project is
 not finished, our first component

Re: [DISCUSS] Drop Support for Java 1.6 in Storm 0.10.0

2015-06-01 Thread Matthias J. Sax
I think it is a good idea to drop Java 6. It reached its end of life
two years ago already.

-Matthias


On 06/01/2015 08:37 PM, P. Taylor Goetz wrote:
 CC user@
 
 I’d like to poll the community about the possibility of dropping support for 
 Java 1.6 in the Storm 0.10.0 release. To date, we have been very conservative 
 in terms of supporting 1.6, and I don’t see much of an issue moving up to 
 1.7, but I’d like to get a broader view.
 
 If we were to move up to 1.7, it would make the upcoming 0.9.5 version the 
 last release compatible with Java 1.6.
 
 Any strong opinions? Would anyone object?
 
 -Taylor 
 



signature.asc
Description: OpenPGP digital signature


Re: Aeolus 0.1 available

2015-05-28 Thread Matthias J. Sax
Hi Manu,

please find a simple benchmark evaluation on Storm 0.9.3 using the
following links (it's to much content to attach to this Email).

https://www2.informatik.hu-berlin.de/~saxmatti/storm-aeolus-benchmark/batchingBenchmark-spout-batching-0.pdf

The file shows the results for batch sizes 0 to 4. You can replace the
last 0 with values up to 16 to get results for higher batch sizes.

What you can basically observe is that the maximum achieved data rate
in the non-batching case is about 250,000 tuples per second (tps), while
a batch size of about 30 increases it to 2,000,000 tps (with high
fluctuation that decreases with even higher batch sizes).

The benchmark uses a single spout (dop=1) and a single bolt (dop=1) and
measures the output/input rate (in tps) as well as the network traffic
(in KB/s) for different batch sizes.

The spout emits simple single-attribute tuples (type Integer) and is
configured to emit at a dedicated (stable) output rate. We did
multiple runs in the benchmark, combining different output rates (from
200,000 tps to 2,000,000 tps in steps of 200,000) with different batch
sizes (from 1 to 80).

Each run used a different configured spout output rate and
consists of 4 plots showing measured network traffic and output/input
rate for the spout and bolt. The plots might be hard to read (they were
designed for our own use only, not for publishing). If you have questions
about them, please let me know.

We ran the experiment in our local cluster. Each node has two Xeon
E5-2620 2GHz CPUs with 6 cores and 24GB main memory. The nodes are connected
via 1Gbit Ethernet (10Gbit Switch).

The code and scripts for running the benchmark are on GitHub, too
(please refer to the Maven module monitoring), so you should be able to
run the benchmark on your own hardware.

-Matthias



On 05/28/2015 08:44 AM, Manu Zhang wrote:
 Hi Matthias,
 
 The project looks interesting. Any detailed performance data compared with
 latest storm versions (0.9.3 / 0.9.4) ?
 
 Thanks,
 Manu Zhang
 
 On Tue, May 26, 2015 at 11:52 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:
 
 Dear Storm community,

 we would like to share our project Aeolus with you. While the project is
 not finished, our first component --- a transparent batching layer ---
 is available now.

 Aeolus' batching component is a transparent layer that can increase
 Storm's throughput by an order of magnitude while keeping tuple-by-tuple
 processing semantics. Batching happens transparently to the system and the
 user code. Thus, it can be used without changing existing code.

 Aeolus is available under the Apache License 2.0, and we would be happy
 about any feedback. If you would like to try it out, you can download Aeolus from our
 git repository:
 https://github.com/mjsax/aeolus


 Happy hacking,
   Matthias


 



signature.asc
Description: OpenPGP digital signature


Aeolus 0.1 available

2015-05-26 Thread Matthias J. Sax
Dear Storm community,

we would like to share our project Aeolus with you. While the project is
not finished, our first component --- a transparent batching layer ---
is available now.

Aeolus' batching component is a transparent layer that can increase
Storm's throughput by an order of magnitude while keeping tuple-by-tuple
processing semantics. Batching happens transparently to the system and the
user code. Thus, it can be used without changing existing code.
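The transparent-batching idea can be sketched without any Storm dependency (all class and method names below are hypothetical, not the Aeolus API): the producer side buffers emitted tuples and ships them as one batch message once the batch size is reached; the consumer side unpacks the batch and hands single tuples to the user code again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of transparent batching (not the Aeolus API):
// the producer side buffers tuples and forwards whole batches; the
// consumer side unpacks batches and delivers single tuples again.
public class BatchingSketch {
    private final int batchSize;
    private final Consumer<List<Object>> transport; // stands in for the network
    private List<Object> buffer = new ArrayList<>();

    public BatchingSketch(int batchSize, Consumer<List<Object>> transport) {
        this.batchSize = batchSize;
        this.transport = transport;
    }

    // Called for every single tuple emit; ships a batch once it is full.
    public void emit(Object tuple) {
        buffer.add(tuple);
        if (buffer.size() == batchSize) {
            transport.accept(buffer);
            buffer = new ArrayList<>();
        }
    }

    // Consumer side: unpack a batch and hand tuples to user code one by one.
    public static void deliver(List<Object> batch, Consumer<Object> userCode) {
        for (Object tuple : batch) {
            userCode.accept(tuple);
        }
    }
}
```

Because both sides still speak in single tuples, the user code is untouched; only the transport layer ever sees batches.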

Aeolus is available under the Apache License 2.0, and we would be happy
about any feedback. If you would like to try it out, you can download Aeolus from our
git repository:
https://github.com/mjsax/aeolus


Happy hacking,
  Matthias



signature.asc
Description: OpenPGP digital signature


Re: Hooking into the internal messaging system

2015-05-07 Thread Matthias J. Sax
Hi,

you can use collector.emitDirect(...) to send a tuple to a specific
task. However, you cannot assign task IDs; you need to get the IDs from
the TopologyContext object given in .open()/.prepare().

If you use emitDirect(...), you need to declare the output stream as a
direct stream, of course.
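As for the max-loops criterion mentioned in the question: independent of whether ZeroMQ or Netty sits underneath, it can be enforced in user code by a hop counter carried inside the tuple. A minimal sketch with no Storm dependency (all names hypothetical):

```java
// Hypothetical loop guard for the replay pattern: the hop count travels
// with the tuple, and the bolt re-emits only while it is below maxLoops.
public class LoopGuard {
    private final int maxLoops;

    public LoopGuard(int maxLoops) {
        this.maxLoops = maxLoops;
    }

    // In a real bolt this would be read from a field of the incoming tuple.
    public boolean shouldReplay(int hopCount) {
        return hopCount < maxLoops;
    }

    // Simulates one tuple travelling the loop until the guard stops it.
    public int countHops() {
        int hop = 0;
        while (shouldReplay(hop)) {
            hop++; // each re-emit increments the counter carried in the tuple
        }
        return hop;
    }
}
```

The guard guarantees termination regardless of the messaging layer, since the decision depends only on data inside the tuple itself.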


-Matthias

On 05/07/2015 11:50 AM, Pieter-Jan Van Aeken wrote:
 Hi,
 
 I am trying to create a loop in Storm. To do this, I would like to be
 able to replay a Tuple across a series of Bolts until certain criteria
 are met. One of them being max loops so that it does not go into a
 never ending loop. 
 
 The way I would like to do this is by (ab)using the internal messaging
 system. Is there a way I can create an OutputCollector which submits
 records to a Task ID that I provide? That way, I would not need to
 worry if Storm is using ZeroMQ or Netty under the hood.
 
 Thanks in advance,
 
 Pieter-Jan Van Aeken
 
 



signature.asc
Description: OpenPGP digital signature


Re: Forcing code distribution+geographic position of nodes

2015-05-06 Thread Matthias J. Sax
Depends on your implementation ;)

Storm's default scheduler implements the same interface, so it is
possible to do it in an efficient way.

-Matthias


On 05/06/2015 12:54 PM, Franca van Kaam wrote:
 Thanks
 
 Might be a stupid question, but would this be efficient if I want to use
 this for all of the nodes in my cluster? Certain groups of nodes should
 have some of the bolts on them, and other nodes should be executing
 different bolts... Is it possible to do this on such a large scale?
 
 On Wed, May 6, 2015 at 12:42 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:
 
 Hi,

 you can implement your own custom scheduler. An example of how to do this
 is given here:


 https://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/


 -Matthias



 On 05/06/2015 12:18 PM, Franca van Kaam wrote:
 Hello,

 I am fairly new to Storm and Clojure and I am wondering if there is any
 way
 to force the way Nimbus distributes the code to the worker nodes. Also is
 there any way to take geographic position of the nodes into account? If
 this does not exist yet could you pinpoint me to the right location in
 the
 source code where this could be implemented?

 Thanks in advance and best regards,

 Franca van Kaam



 



signature.asc
Description: OpenPGP digital signature


Re: Forcing code distribution+geographic position of nodes

2015-05-06 Thread Matthias J. Sax
Hi,

you can implement your own custom scheduler. An example of how to do this
is given here:

https://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/


-Matthias



On 05/06/2015 12:18 PM, Franca van Kaam wrote:
 Hello,
 
 I am fairly new to Storm and Clojure and I am wondering if there is any way
 to force the way Nimbus distributes the code to the worker nodes. Also is
 there any way to take geographic position of the nodes into account? If
 this does not exist yet could you pinpoint me to the right location in the
 source code where this could be implemented?
 
 Thanks in advance and best regards,
 
 Franca van Kaam
 



signature.asc
Description: OpenPGP digital signature


Re: Forcing code distribution+geographic position of nodes

2015-05-06 Thread Matthias J. Sax
That is possible, but it needs very careful design to get it right.

The example shown in the link I posted below uses some metadata as
well. This metadata could be your geographical coordinates, I guess.
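The placement decision such a metadata-aware scheduler would make can be sketched without the scheduler API itself: given a geo tag per supervisor node and a required tag per component, candidate selection is a simple filter (all names below are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical metadata-based placement: each supervisor node carries a
// "geo" tag, and a component may only be assigned to nodes matching its
// required tag -- the kind of decision a custom scheduler would make.
public class GeoPlacement {
    // Returns the nodes whose geo tag matches the component's requirement.
    public static List<String> candidateNodes(Map<String, String> nodeGeoTags,
                                              String requiredGeo) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> node : nodeGeoTags.entrySet()) {
            if (node.getValue().equals(requiredGeo)) {
                result.add(node.getKey());
            }
        }
        return result;
    }
}
```

A real scheduler would then assign the component's executors across the returned candidate list instead of across all supervisors.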


-Matthias

On 05/06/2015 01:58 PM, Franca van Kaam wrote:
 Ok I will try it, thanks a lot ;)
 
 How about geographical coordinates? Is it possible to put these as a
 property of the node and use it in the scheduling as well as in grouping to
 redirect tuples?
 
 On Wed, May 6, 2015 at 1:45 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:
 
 Depends on your implementation ;)

 Storm's default scheduler implements the same interface, so it is
 possible to do it in an efficient way.

 -Matthias


 On 05/06/2015 12:54 PM, Franca van Kaam wrote:
 Thanks

 Might be a stupid question, but would this be efficient if I want to use
 this for all of the nodes in my cluster? Certain groups of nodes should
 have some of the bolts on them, and other nodes should be executing
 different bolts... Is it possible to do this on such a large scale?

 On Wed, May 6, 2015 at 12:42 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

 Hi,

 you can implement your own custom scheduler. An example of how to do this
 is given here:



 https://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/


 -Matthias



 On 05/06/2015 12:18 PM, Franca van Kaam wrote:
 Hello,

 I am fairly new to Storm and Clojure and I am wondering if there is any
 way
 to force the way Nimbus distributes the code to the worker nodes. Also
 is
 there any way to take geographic position of the nodes into account? If
 this does not exist yet could you pinpoint me to the right location in
 the
 source code where this could be implemented?

 Thanks in advance and best regards,

 Franca van Kaam






 



signature.asc
Description: OpenPGP digital signature


Re: Why is Storm Config not modifiable?

2015-04-23 Thread Matthias J. Sax
Thanks for your explanations.

IMHO, from a user's point of view, the interface of open()/prepare() is
not well designed. It specifies that a Java Map is handed in, and you
would not expect a Map to be unmodifiable.

Would it be possible to change the parameter type to
clojure.lang.APersistentMap? This would expose the persistent nature of
the map to the user. Not sure if this is possible, due to compatibility
issues... It's just a thought.

-Matthias


On 04/23/2015 06:10 PM, Bobby Evans wrote:
 Yes, you could call assoc on the Map if you cast the map to an IPersistentMap 
 and call the assoc method on it, but if you do it blindly without checking 
 the actual type of the Map passed in, you may run into issues in the future 
 if we do change some internals away from Java.
  - Bobby
  
 
 
  On Thursday, April 23, 2015 9:07 AM, Jeremy Heiler 
 jeremyhei...@gmail.com wrote:

 
  On Thu, Apr 23, 2015 at 4:42 AM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:
 
 In my case, I use a class hierarchy with abstract spouts/bolts and
 multiple concrete spouts/bolts. The abstract class uses some default
 values (if not provided in the given config) similar to the derived
 classes. However, some derived classes overwrite the default values for
 the abstract class, ie, if no value is given in the config, some derived
 classes set the values and others do not. Additionally, open()/prepare()
 of the abstract class is called, too, and sets the default value if the
 value was neither provided by the user nor by the derived class.


 The idiomatic way to update an immutable field like this in Clojure would
 be to use an atom. Here you could store the conf in an AtomicReference
 field, call .assoc on the map to update it, and .compareAndSet it on the
 AtomicReference. I'm not sure you'll need this, but I figured I'd mention
 it as how you would use a PersistentHashMap in your situation.
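The copy-on-write update described here can be sketched with plain java.util classes, without any Clojure dependency (a minimal illustration, not the actual Storm internals; all names are made up):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Copy-on-write config updates: the current map is never mutated; an
// updated copy is swapped in atomically via compareAndSet, similar to
// updating a Clojure atom holding a persistent map.
public class ConfHolder {
    private final AtomicReference<Map<String, Object>> conf;

    public ConfHolder(Map<String, Object> initial) {
        this.conf = new AtomicReference<>(
            Collections.unmodifiableMap(new HashMap<>(initial)));
    }

    // Retries until the swap succeeds, like swap! on a Clojure atom.
    public void put(String key, Object value) {
        while (true) {
            Map<String, Object> current = conf.get();
            Map<String, Object> updated = new HashMap<>(current);
            updated.put(key, value);
            if (conf.compareAndSet(current, Collections.unmodifiableMap(updated))) {
                return;
            }
        }
    }

    public Map<String, Object> get() {
        return conf.get();
    }
}
```

Readers always see an immutable snapshot, while concurrent writers never lose an update thanks to the compare-and-set retry loop.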
 
 
   
 



signature.asc
Description: OpenPGP digital signature


Why is Storm Config not modifiable?

2015-04-23 Thread Matthias J. Sax
Hi,

in the Spout/Bolt open()/prepare() method, a Map containing the current
configuration is given. This Map is of type
clojure.lang.PersistentHashMap and calling .put(...) raises an
UnsupportedOperationException.

I was wondering why a persistent map is used. I could imagine that
system-defined values should not be modified. But adding new key/value
pairs is not allowed either (and would be very helpful in my case).

Right now, I simply copy all values into a new HashMap object and add my
own values to this new HashMap. But this is an annoying workaround I
have to do each time...
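The workaround described above, as a minimal self-contained sketch (the unmodifiable map stands in for the conf passed to open()/prepare(); the helper name is made up):

```java
import java.util.HashMap;
import java.util.Map;

// The conf handed to open()/prepare() behaves like an unmodifiable map:
// calling put(...) on it throws. The workaround is to copy it into a
// plain HashMap before adding one's own default values.
public class ConfigCopy {
    public static Map<String, Object> withDefault(Map<String, Object> conf,
                                                  String key, Object defaultValue) {
        Map<String, Object> copy = new HashMap<>(conf); // mutable copy
        copy.putIfAbsent(key, defaultValue);            // keep user-provided values
        return copy;
    }
}
```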

Would it make sense to change the type of the system-provided Map (or
modify its behavior to allow adding new key/value pairs)?


-Matthias






signature.asc
Description: OpenPGP digital signature


Re: Question about Nimbus$Client

2015-04-20 Thread Matthias J. Sax
Hi,

I had a quick look into it. JavaDoc can simply be added to the thrift
code and is copied into generated Java and Python source code files.

I modified storm.thrift and re-generated the code as described in
DEVELOPER.md:

cd storm-core/src
sh genthrift.sh

I pushed my changes to my git repo, in case you want to have a quick look:
https://github.com/mjsax/storm

I opened a JIRA, too:
https://issues.apache.org/jira/browse/STORM-792

If you like it, I can open a pull request. Of course, it would be nice
to add some more comments to all methods and interfaces.
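As a sketch, such doc comments in storm.thrift could look like the following (the wording is my assumption and the signatures are abbreviated; the semantics are taken from the explanation quoted below):

```thrift
/**
 * Returns the topology as compiled by Nimbus, i.e., including
 * system components such as the acker bolt and metrics consumers.
 */
StormTopology getTopology(1: string id) throws (1: NotAliveException e);

/**
 * Returns the topology exactly as the user submitted it,
 * without any system components added.
 */
StormTopology getUserTopology(1: string id) throws (1: NotAliveException e);
```

Since the thrift generator copies these comments into the generated Java and Python sources, writing them once documents all client languages.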

-Matthias


On 04/20/2015 07:52 PM, Bobby Evans wrote:
 I totally agree there is a lot in the documentation that would be good to do. 
  For the generated code I'm not totally sure how to make javadocs work.  If 
 you want to file a JIRA for this we can look at it.
  - Bobby
  
 
 
  On Saturday, April 18, 2015 9:49 AM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

 
  Thanks!
 
 It is a pity that this information is not documented via JavaDoc...
 That would make life much easier and would save time for everybody (=
 no need to ask and answer stupid questions on the mailing list)
 
 -Matthias
 
 On 04/17/2015 06:13 PM, Bobby Evans wrote:
 getTopology returns the compiled topology after nimbus has gotten its hands 
 on it, so it has the ackers in it and the metrics consumers. getUserTopology 
 returns the topology as the user submitted it.
   - Bobby
   


   On Friday, April 17, 2015 4:24 AM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:
 

   Dear all,

 the class backtype.storm.generated.Nimbus defines a nested class
 Nimbus$Cluster that offers the following two methods (both defined in
 Nimbus#Iface):

 public StormTopology getTopology(String id) throws NotAliveException, 
 org.apache.thrift.TException; 
 public StormTopology getUserTopology(String id) throws 
 NotAliveException, org.apache.thrift.TException;

 What is the difference between the two? I don't understand what the
 difference between a (regular?) topology and a user topology should
 be... From my understanding, there is only one type of topology.


 Thanks for your help!


 -Matthias


   

 
 
   
 



signature.asc
Description: OpenPGP digital signature


[jira] [Created] (STORM-792) Missing documentation in backtype.storm.generated.Nimbus

2015-04-20 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created STORM-792:
-

 Summary: Missing documentation in backtype.storm.generated.Nimbus
 Key: STORM-792
 URL: https://issues.apache.org/jira/browse/STORM-792
 Project: Apache Storm
  Issue Type: Documentation
Reporter: Matthias J. Sax
Priority: Minor


Explain the difference between Nimbus$Client.getTopology(String id) and 
Nimbus$Client.getUserTopology(String id)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


What is localNimbus?

2015-04-14 Thread Matthias J. Sax
Dear all,

I have a question about StormSubmitter class. It offers a method called
setLocalNimbus(Nimbus.Iface localNimbusHandler). It sets the variable
localNimbus and localNimbus is used within submitStormTopology.

Can you explain what localNimbus is? As far as I understand Storm, there
are two modes of operation: local and cluster. For local, a LocalCluster
is used that emulates a cluster. In the cluster case, NimbusClient (or
StormSubmitter, which uses NimbusClient internally) is used to submit a
topology to a (remotely) running Nimbus.

However, I cannot make sense out of a localNimbus.


Thanks for your help.

-Matthias











signature.asc
Description: OpenPGP digital signature


Re: What is localNimbus?

2015-04-14 Thread Matthias J. Sax
Thanks!

On 04/14/2015 06:28 PM, Bobby Evans wrote:
 LocalNimbus is something that is not really used.  It provides a way for the 
 StormSubmitter to be used when submitting a topology to a local mode cluster. 
  No one uses it and in the past it didn't work, not sure if it works now or 
 not.  It is probably best to just not use it unless you really need to 
 abstract out local mode from the topology code used to submit a topology.
  - Bobby
  
 
 
  On Tuesday, April 14, 2015 11:10 AM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

 
  Dear all,
 
 I have a question about StormSubmitter class. It offers a method called
 setLocalNimbus(Nimbus.Iface localNimbusHandler). It sets the variable
 localNimbus and localNimbus is used within submitStormTopology.
 
 Can you explain what localNimbus is? As far as I understand Storm, there
 are two modes of operation: local and cluster. For local, a LocalCluster
 is used that emulates a cluster. In the cluster case, NimbusClient (or
 StormSubmitter, which uses NimbusClient internally) is used to submit a
 topology to a (remotely) running Nimbus.
 
 However, I cannot make sense out of a localNimbus.
 
 
 Thanks for your help.
 
 -Matthias
 
 
 
 
 
 
 
 
 
 
   
 



signature.asc
Description: OpenPGP digital signature