Not seeing supervisor process listening on TCP ports - only UDP port?

2016-04-29 Thread Joaquin Menchaca
I was wondering which ports I should open in the AWS security group for
Apache Storm 0.10.0.

I opened up the typical ports: TCP 6700, 6701, 6702, 6703

But when I check the supervisor process, I only see it listening on
UDP 57944.

$ sudo netstat -tuap | grep $(ps ax | grep backtype.storm | grep -v grep | awk '{ print $1}')
tcp        0      0 ip-10-110-20-11.u:40999 ip-10-110-20-8.us-:2181 ESTABLISHED 10468/java
udp        0      0 *:57944                 *:*                                 10468/java


I did set up the following in my storm.yaml:

supervisor.childopts: "-Djava.net.preferIPv4Stack=true"
worker.childopts: "-Xmx768m -Djava.net.preferIPv4Stack=true"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
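
One way to double-check which of the configured slot ports actually have a TCP listener is to probe them directly. The following is a minimal sketch using only the Java standard library; the class name SlotPortProbe and the 500 ms timeout are assumptions made for illustration, not anything shipped with Storm. As far as I understand, the ports in supervisor.slots.ports are bound by the worker JVMs that the supervisor launches once a topology is assigned, so an idle supervisor showing no TCP listener on 6700-6703 may be expected.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Storm: probes TCP ports for a listener.
class SlotPortProbe {
    /** Returns true if something accepts a TCP connection on host:port within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // connection refused, host unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int[] slotPorts = {6700, 6701, 6702, 6703}; // values from supervisor.slots.ports
        for (int port : slotPorts) {
            System.out.println(port + " listening: " + isListening(host, port, 500));
        }
    }
}
```

Running it once on an idle supervisor and again after submitting a topology should show whether the listeners only appear when workers are running.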


-- 

Thus a victorious army wins first and then seeks battle; a defeated army fights first and then seeks victory.


Re: thread safe output collector

2016-04-29 Thread John Bush
I ran into this issue a few weeks ago, in a bolt, using Futures in Scala.
Basically the acks I was doing in new threads never got back (well at least
not all of them).  The solution I ended up with was to use a thread safe
queue and then flush the acks out from a tick (as was described earlier in
this thread).  It definitely works, I don't know if there is a better way.

My solution is documented here:
https://scalalala.wordpress.com/2016/04/09/async-stormy-weather/
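
The queue-and-flush approach described above can be sketched roughly as follows. This is a stand-alone illustration using only the Java standard library, not the code from the linked post: the Collector class is a hypothetical stand-in for Storm's OutputCollector, and onTuple/onTick are hypothetical names for the two branches a bolt's execute() would take for normal tuples and tick tuples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stand-alone sketch; nothing here is a real Storm class.
class AsyncAckSketch {
    /** Hypothetical stand-in for OutputCollector: must only be used from one thread. */
    static class Collector {
        final List<String> acked = new ArrayList<>(); // deliberately not thread-safe
        void ack(String tupleId) { acked.add(tupleId); }
    }

    private final Collector collector = new Collector();
    private final ConcurrentLinkedQueue<String> pendingAcks = new ConcurrentLinkedQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    /** Executor thread, normal tuple: hand the work off and do not ack here. */
    void onTuple(String tupleId) {
        workers.submit(() -> {
            // ... slow asynchronous work would go here ...
            pendingAcks.add(tupleId); // worker threads only ever touch the queue
        });
    }

    /** Executor thread, tick tuple: ack everything the workers have finished. */
    int onTick() {
        int flushed = 0;
        String tupleId;
        while ((tupleId = pendingAcks.poll()) != null) {
            collector.ack(tupleId); // the collector is only used from this thread
            flushed++;
        }
        return flushed;
    }

    List<String> acked() { return collector.acked; }

    /** Stops the pool and waits for in-flight work; returns true if it all finished. */
    boolean shutdownAndWait() {
        workers.shutdown();
        try {
            return workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncAckSketch bolt = new AsyncAckSketch();
        for (int i = 0; i < 10; i++) bolt.onTuple("tuple-" + i);
        bolt.shutdownAndWait(); // in a real bolt, ticks would simply keep arriving
        System.out.println("acks flushed on tick: " + bolt.onTick());
    }
}
```

The point of the design is that worker threads never call the collector; they only enqueue, and the single executor thread drains the queue whenever a tick arrives.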


Storm Cluster Docs: nimbus.seeds vs. nimbus.host

2016-04-29 Thread Joaquin Menchaca
Hello,

I am following 0.10.0 docs (
http://storm.apache.org/releases/0.10.0/Setting-up-a-Storm-cluster.html).

I noticed 0.10.0 docs mention nimbus.seeds: ["111.222.333.44"], but I did
not see this mentioned in the defaults.yaml or nathanmarz/storm-deploy
referenced from the docs.

nimbus.host is mentioned in other online instructions that document setting
this up in storm.yaml, and it is listed in defaults.yaml.  I did
not see it listed as something we should configure in the Storm 0.10.0
docs.

So, do we include nimbus.host or nimbus.seeds, or both?

Also, the 0.10.0 docs reference the nathanmarz/storm-deploy repo, and in
there the variable storm.supervisor.servers is defined in storm.clj,
which is not listed in the defaults.  Do these need to be configured?



Apache Storm Docs - 0.10.0 says JDK6

2016-04-29 Thread Joaquin Menchaca
I noticed that the Apache Storm docs online, and also the READMEs, still
say JDK6, when it should be JDK7 according to a prior email thread.

   - https://github.com/apache/storm/tree/v0.10.0/examples/storm-starter
   - http://storm.apache.org/releases/0.10.0/Setting-up-a-Storm-cluster.html



Re: storm 0.10.0 ui showing anything

2016-04-29 Thread Erik Weathers
That's an HTTP 302 redirect, which says that the resource is at a
different location.  Very standard HTTP behavior.  You need to follow the
redirect to the location it points to.  So if you had run "curl
127.0.0.1:8080/index.html", it would have worked.  Alternatively, you can ask
curl to follow the redirect for you automatically using -L: "curl -L
127.0.0.1:8080".  And if you loaded the original URL in any
browser, it would follow the redirect automatically as well.
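
The difference between following and not following the redirect can be reproduced without a Storm UI at all. The sketch below is an assumption-laden illustration, not Storm code: it starts a tiny local server (using the JDK's built-in com.sun.net.httpserver) that answers "/" with a 302 pointing at /index.html, then requests it with redirects disabled and enabled. The class and method names are made up for this example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only imitates the 302 behavior described above.
class RedirectSketch {
    /** Starts a local server where every path except /index.html answers 302 -> /index.html. */
    static HttpServer startFakeUi() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            if (exchange.getRequestURI().getPath().equals("/index.html")) {
                byte[] body = "<html>ui</html>".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            } else {
                exchange.getResponseHeaders().add("Location", "/index.html");
                exchange.sendResponseHeaders(302, -1); // -1 means no response body
                exchange.close();
            }
        });
        server.start();
        return server;
    }

    /** Returns the HTTP status for url, following redirects only when asked to. */
    static int status(String url, boolean followRedirects) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setInstanceFollowRedirects(followRedirects);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startFakeUi();
        String base = "http://127.0.0.1:" + server.getAddress().getPort();
        System.out.println("without following: " + status(base + "/", false));
        System.out.println("with following:    " + status(base + "/", true));
        server.stop(0);
    }
}
```

The first request corresponds to plain curl and the second to curl -L.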

- Erik

On Fri, Apr 29, 2016 at 8:10 PM, Joaquin Menchaca 
wrote:

> I get nothing with *curl 127.0.0.1:8080 *
>
> How could I troubleshoot further?
>
> HTTP/1.1 302 Found
> Date: Sat, 30 Apr 2016 03:10:17 GMT
> Location: /index.html
> Content-Length: 0
> Server: Jetty(7.x.y-SNAPSHOT)
>
>
>
> --
>
> 是故勝兵先勝而後求戰,敗兵先戰而後求勝。
>


storm 0.10.0 ui showing anything

2016-04-29 Thread Joaquin Menchaca
I get nothing with *curl 127.0.0.1:8080 *

How could I troubleshoot further?

HTTP/1.1 302 Found
Date: Sat, 30 Apr 2016 03:10:17 GMT
Location: /index.html
Content-Length: 0
Server: Jetty(7.x.y-SNAPSHOT)





v1.0.1 issues -- Re: docs on storm starter frustrating, but success

2016-04-29 Thread Henry Hottelet
Hello P. Taylor Goetz,


I downloaded : https://github.com/apache/storm/releases/tag/v1.0.1 


Then :

 mvn clean install 

Yielded these results: 

[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 06:00 min
[INFO] Finished at: 2016-04-29T22:22:27-04:00
[INFO] Final Memory: 13M/245M
[INFO] 
[ERROR] Failed to execute goal on project storm-starter: Could not resolve 
dependencies for project org.apache.storm:storm-starter:jar:1.0.1: The 
following artifacts could not be resolved: 
org.apache.storm:storm-core:jar:1.0.1, 
org.apache.storm:storm-core:jar:tests:1.0.1, 
org.apache.storm:multilang-javascript:jar:1.0.1, 
org.apache.storm:multilang-ruby:jar:1.0.1, 
org.apache.storm:multilang-python:jar:1.0.1, 
org.apache.storm:storm-metrics:jar:1.0.1, 
org.apache.storm:storm-kafka:jar:1.0.1, org.apache.storm:storm-hdfs:jar:1.0.1, 
org.apache.storm:storm-hbase:jar:1.0.1, org.apache.storm:storm-redis:jar:1.0.1: 
Could not find artifact org.apache.storm:storm-core:jar:1.0.1 in central 
(http://repo1.maven.org/maven2/) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException

Can you please explain why these dependencies are missing? Can this be fixed in 
the pom.xml file?

Also, do you have any experience with the Hortonworks variant of Storm:
http://hortonworks.com/apache/storm/#tutorials 

I want to get a small starter example running that teaches me how Storm works.

Do you have a recommended book to read?  I saw Manning's Storm Applied book. Let 
me know what you recommend that explains the architecture and some of the mechanics 
of how it works, and how to get my first program started.

— 
Henry Hottelet

> On Apr 29, 2016, at 1:01 PM, Joaquin Menchaca  wrote:
> 
> apache-storm-0.10.0.tar.gz



Cannot launch Supervisor, missing unknown file

2016-04-29 Thread Joaquin Menchaca
The supervisor fails to start and reports a missing file. This is from the 0.10.0 tarball.

I tried:

$ bin/storm supervisor &
Traceback (most recent call last):
  File "bin/storm.py", line 568, in <module>
    main()
  File "bin/storm.py", line 565, in main
    (COMMANDS.get(COMMAND, unknown_command))(*ARGS)
  File "bin/storm.py", line 377, in supervisor
    jvmopts = parse_args(confvalue("supervisor.childopts", cppaths)) + [
  File "bin/storm.py", line 137, in confvalue
    p = sub.Popen(command, stdout=sub.PIPE)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

[1]+  Exit 1  bin/storm supervisor




Re: docs on storm starter frustrating, but success

2016-04-29 Thread Joaquin Menchaca
The Clojure libraries stopped at 0.9.0.1, and they are no longer following
the Storm project.  There are the Apache libraries, but last I looked they
had no references to 0.10.1.  If you download the tarball,
it should work with mvn clean install, just if cloned from github.



Re: docs on storm starter frustrating, but success

2016-04-29 Thread Henry Hottelet
Joaquin and Storm users,

Please let me know if you were able to get the Maven build for the storm
starter project to work.  If so, please give me the link to the GitHub
project, because when I tried yesterday it was missing dependencies for Clojure.

Also, which docs are you referring to for getting started?

It is a struggle because of the missing Clojure dependencies.

I am also considering Kafka integrated with Storm. However, I need it to be
locally developed on a MacBook, and deployable to a Red Hat machine
on AWS or Azure.

Please advise.

Henry


Re: docs on storm starter frustrating, but success

2016-04-29 Thread Abhishek Agarwal
I see. The package names have been changed but the link in the github code
was not updated. Thanks for highlighting this issue. will fix it up.



-- 
Regards,
Abhishek Agarwal


Re: UI problem

2016-04-29 Thread Abhishek Agarwal
can you try to open the UI in incognito mode of your browser?

On Fri, Apr 29, 2016 at 12:24 PM, Sai Dilip Reddy Kiralam <
dkira...@aadhya-analytics.com> wrote:

> Hi,
>
> It looks like this error is due to the browser. I will check.
>
> Thank you
>
>
>
>
> *Best regards,*
>
> *K.Sai Dilip Reddy.*
>
> On Fri, Apr 29, 2016 at 11:26 AM, Jungtaek Lim  wrote:
>
>> Hi,
>>
>> Could you open the developer tools in your browser and check whether any API
>> calls are failing when the UI page loads?
>>
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Fri, Apr 29, 2016 at 2:47 PM, Sai Dilip Reddy Kiralam <
>> dkira...@aadhya-analytics.com> wrote:
>>
>>>
>>> Hello,
>>>
>>> I installed Storm 0.10.0 on an AWS (Ubuntu) instance. The Nimbus, supervisor,
>>> and UI services are running fine, and I am able to run topologies, but the UI
>>> is not showing any summary. I don't know where I'm going wrong.
>>>
>>> Below I have attached screenshots of my UI and of the Nimbus and UI logs.
>>>
>>>
>>>
>>>
>>> *Best regards,*
>>>
>>> *K.Sai Dilip Reddy.*
>>>
>>
>


-- 
Regards,
Abhishek Agarwal


Re: Distribute load across multiple storm clusters.

2016-04-29 Thread Darsh
Hi Patrick,

Sorry I think I confused you with my setup. I have separate nimbus for each
storm cluster.

Setup I used for testing,

3 node zk cluster with one node in 3 availability zones.

storm cluster 1=nimbus and 3 supervisor in availability zone 1

storm cluster 2=nimbus and 3 supervisor in availability zone 2



Darsh


On Thu, Apr 28, 2016 at 10:10 PM, Patrick.Brinton <
patrick.brin...@target.com> wrote:

> Darsh,
> I am in a bit of a crush for a deployment and perf testing but I will ask
> my experts to take a look tomorrow.  I will set up a single nimbus and 3
> zookeepers in one data center, then I will distribute the supervisors and
> see what happens.  I think we will always be limited in how we back up
> nimbus but I think we should be able to share processing for supervisors
> across data centers.
>
> Keep in touch and let me know if this is really what you are looking to
> test.  I have a lot of toys at the moment and this seems like a worth while
> test for everyone.
>
> Patrick
>
> Patrick Brinton Sr. Engineer SWLM | ¤Target | 7000 Target Parkway North
> | Brooklyn Park, MN  55445 | 612.599.6523* (ph) *|
> patrick.brin...@target.com 
>
> From: Darsh 
> Reply-To: "user@storm.apache.org" 
> Date: Thursday, April 28, 2016 at 5:29 PM
> To: "user@storm.apache.org" 
> Subject: Re: Distribute load across multiple storm clusters.
> Patrick,
>
> Thank you for replying. I did try but load isn't distributed. Both
> clusters are processing all the events individually on the topic.
>
>
> Darsh
>
> On Thu, Apr 28, 2016 at 2:56 PM, Patrick.Brinton <
> patrick.brin...@target.com> wrote:
>
>> Darsh,
>> I have never tried but I have a setup where I could try.  As long as you
>> point to the same zooKeeper I think it would work.  Give it a try and let
>> me know if you hit issues.  If you do I will mimic your setup and we should
>> be able to figure it out.
>>
>> Patrick
>>
>>
>> From: Darsh 
>> Reply-To: "user@storm.apache.org" 
>> Date: Thursday, April 28, 2016 at 12:39 PM
>> To: "user@storm.apache.org" 
>> Subject: Distribute load across multiple storm clusters.
>>
>> Hi,
>>
>>
>> Is it possible to distribute load across 2 clusters if I deploy the same
>> topology to 2 Storm clusters with the same spout id?
>>
>>
>> We have 2 separate Storm clusters running version 0.10.0 in
>> 2 different availability zones. We are using the storm-kafka spout (with the
>> simple consumer) to process data from Kafka, and a common external ZooKeeper
>> to store the Kafka offsets. The Kafka topic has 32 partitions, and the spout
>> has 8 executors (*parallelism_hint*) in each cluster.
>>
>>
>>
>>
>>
>>
>> Thanks
>>
>> Darsh
>>
>
>
>
> --
> Thanks
>
> Darsh
>



-- 
Thanks

Darsh


Unable to merge DRPC Spout with any other spout using Trident

2016-04-29 Thread Bharat Jayaraman Karthick
Hi,

We have a use case which requires a bolt consuming streams emitted by a Kafka
spout and a DRPC spout. I used TridentTopology and tried to merge the streams
but got the error message "Cannot join DRPC stream with streams originating
from other spouts".

As a check, I used TopologyBuilder to merge these two streams and was able to
merge / group the tuples emitted by them.

Can you help me understand why TridentTopology throws this error message
when we try to merge a DRPC stream with a stream emitted by any other
spout?

For Trident, I used the following topology:

TridentTopology topology = new TridentTopology();
SkuUpdatesKafkaEmulateSpout spout = new SkuUpdatesKafkaEmulateSpout(10);
Stream kafkaStream = topology.newStream("kafka_stream", spout);
// drpc is defined elsewhere (not shown in this snippet)
Stream drpcStream = topology.newDRPCStream("drpc_stream", drpc)
    .each(new Fields("args"), new DRPC_ArgsSplit(), new Fields("sku", "new_value"));
Stream merged = topology.merge(kafkaStream, drpcStream);
merged.persistentAggregate(new MemoryMapState.Factory(), new Fields("sku"),
        new Sum(), new Fields("value"))
    .parallelismHint(2);
return topology.build();

For TopologyBuilder, I used the following topology:

TopologyBuilder builder = new TopologyBuilder();

Config conf = new Config();
conf.setDebug(true);
LocalDRPC drpc = new LocalDRPC();
DRPCSpout spout1 = new DRPCSpout("processOrder", drpc);
DemoSpout spout2 = new DemoSpout();

builder.setSpout("drpc", spout1);
builder.setSpout("demospout", spout2);
// scraperBolt consumes from both the DRPC spout and the demo spout
builder.setBolt("scraperBolt", new XMLScrapperBolt())
    .shuffleGrouping("drpc")
    .shuffleGrouping("demospout");
builder.setBolt("returnBolt", new DRPCBolt())
    .shuffleGrouping("scraperBolt", "drpc-stream");
builder.setBolt("return", new ReturnResults())
    .shuffleGrouping("returnBolt");

Regards,
Bharat Karthick J


Re: docs on storm starter frustrating, but success

2016-04-29 Thread Joaquin Menchaca
I looked through all of the docs, and not one tells explicitly how to
run the sample topologies, unless there's a buried link two or more
links down somewhere that has the missing information.  The documentation
that comes with the code, README.md from the storm-starter examples, is
sparse, and what it does document does not actually work.

I had to unzip the jar to see where the classes were actually bundled; then,
with that knowledge, I could refer to the jar's namespace/package path in
the command line, which does not start with org.apache.

From the online docs, I haven't found any that cover simply running
the topologies.  There are high-level docs on how to create topologies
oneself.  As an Ops guy, I just wanted to run some sample topologies, as I
am trying to build a multi-cluster Apache Storm for them.


On Thu, Apr 28, 2016 at 12:31 AM, Abhishek Agarwal 
wrote:

> Documentation for storm 0.10 is available here -
> http://storm.apache.org/releases/0.10.0/index.html
>
> On Wed, Apr 27, 2016 at 4:11 AM, Joaquin Menchaca 
> wrote:
>
>> Ignore the docs.  I looked at the final packaged jar from mvn clean
>> install and noticed the path was different.  I was able to at least get
>> ExclamationTopology to work.
>>
>> tar -xvf apache-storm-0.10.0.tar.gz
>> cd apache-storm-0.10.0/examples/storm-starter
>> mvn clean install -DskipTests=true
>> storm jar storm-starter-topologies-0.10.0.jar storm.starter.ExclamationTopology
>>
>> Are there any other docs on the other topologies?  They look interesting.
>> Or are the docs just in the code?
>>
>> --
>>
>> 是故勝兵先勝而後求戰,敗兵先戰而後求勝。
>>
>
>
>
> --
> Regards,
> Abhishek Agarwal
>
>




Re: thread safe output collector

2016-04-29 Thread Stephen Powis
You're probably right: if it's an expensive operation to package your data
into a formatted tuple, it may make more sense for your spout to emit
something simple, and have a downstream bolt package it up.

In the situation I was describing, our spout is executing a SQL statement to
gather rows that should be emitted as tuples, so the "processing time" of
the spout is more about how fast or slow that query ends up
being, and less about converting the rows to tuples -- we're actually querying
against somewhere around 100 different databases to find the data.  Doing
that in a single thread with the other spouts seemed not ideal, so that's
why we kicked it off to separate threads.
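
The spout pattern described in the quoted message below (a background thread feeding a thread-safe queue that nextTuple() drains without blocking) might look roughly like this. It is a stand-alone sketch using only the Java standard library; the class name, the open/nextTuple method shapes, and the use of plain strings for rows are all assumptions for illustration, and a real spout would emit through its collector instead of returning a value.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Stand-alone sketch; it mirrors the shape of a spout but is not a Storm class.
class QueueBackedSpoutSketch {
    private final BlockingQueue<String> rows = new LinkedBlockingQueue<>(1000);
    private Thread fetcher;

    /** Analogous to the spout's open step: start a background thread that finds data. */
    void open(Iterable<String> slowSource) {
        fetcher = new Thread(() -> {
            try {
                for (String row : slowSource) {
                    rows.put(row); // may block, but only the fetcher thread waits
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "row-fetcher");
        fetcher.setDaemon(true);
        fetcher.start();
    }

    /** Analogous to nextTuple(): never blocks; returns null when nothing is ready. */
    String nextTuple() {
        return rows.poll();
    }

    /** Waits up to timeoutMs for the fetcher to finish; returns true if it did. */
    boolean awaitFetcher(long timeoutMs) {
        try {
            fetcher.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !fetcher.isAlive();
    }

    public static void main(String[] args) {
        QueueBackedSpoutSketch spout = new QueueBackedSpoutSketch();
        spout.open(java.util.Arrays.asList("row-1", "row-2", "row-3"));
        spout.awaitFetcher(1000);
        String row;
        while ((row = spout.nextTuple()) != null) {
            System.out.println("emit " + row);
        }
    }
}
```

The bounded queue is a design choice in this sketch: it applies back-pressure to the fetcher if the topology consumes more slowly than the queries produce rows.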

On Fri, Apr 29, 2016 at 8:53 AM, Hart, James W.  wrote:

> I’m working on a topology that will be similar to this application so I
> was thinking about this yesterday.
>
>
>
> I’m thinking that if there is any significant work to do on messages in
> making them into tuples, shouldn’t the message be emitted and the work be
> in a bolt?  I don’t think that bolt execute functions have the same
> limitations as spout nextTuple functions.  Now with that said, bolt
> executes should not be long running computations either, but can be longer
> than the spouts nextTuple function.
>
>
>
> *From:* Stephen Powis [mailto:spo...@salesforce.com]
> *Sent:* Thursday, April 28, 2016 11:59 AM
> *To:* user@storm.apache.org
> *Subject:* Re: thread safe output collector
>
>
>
> So the Spout documentation (assuming its correct...) here (
> http://storm.apache.org/releases/current/Concepts.html#spouts) mentions
> this:
>
>
> "The main method on spouts is nextTuple. nextTuple either emits a new
> tuple into the topology or simply returns if there are no new tuples to
> emit. *It is imperative that **nextTuple** does not block for any spout
> implementation, because Storm calls all the spout methods on the same
> thread.*"
>
> When developing a custom spout we interpreted it to mean that any "real
> work" done by a spout should be done in a separate thread, and decided on
> the following pattern which seems some what relevant to what you are trying
> to do in your bolts.
>
> On Spout prepare, we create a concurrent/thread safe queue.  We then
> create a new Thread passing it a reference to our thread safe queue.  This
> thread handles finding new data that needs to be emitted.  When that thread
> finds data, it adds it to the shared queue.  When the spout's nextTuple()
> method is called, it looks for data on the shared queue and emits it.
>
> I imagine doing async processing in a bolt using one or more threads could
> work with a similar pattern.  On prepare you setup your thread(s) with
> references to a shared queue.  The bolt passes work to be completed to the
> thread(s), the thread(s) communicate back to the bolt the result via a
> shared queue.  Add in the concept of tick tuples to ensure your bolt checks
> for completed work on a regular basis?
>
> Is there a better way to do this?
>
>
>
> On Thu, Apr 28, 2016 at 11:22 AM, Julien Nioche <
> lists.digitalpeb...@gmail.com> wrote:
>
> Thanks for the clarification
>
>
>
> On 28 April 2016 at 15:12, P. Taylor Goetz  wrote:
>
> The documentation is wrong. See:
>
>
>
> https://issues.apache.org/jira/browse/STORM-841
>
>
>
> At some point it looks like the change made there got reverted. I will
> reopen it to make sure the documentation is corrected.
>
>
>
> OutputCollector is NOT thread-safe.
>
>
>
> -Taylor
>
>
>
> On Apr 28, 2016, at 9:06 AM, Stephen Powis  wrote:
>
>
>
> "Its perfectly fine to launch new threads in bolts that do processing
> asynchronously. OutputCollector is thread-safe and can be called at any
> time."
>
>
>
> From the docs for 0.9.6:
> http://storm.apache.org/releases/0.9.6/Concepts.html#bolts
>
>
>
> On Thu, Apr 28, 2016 at 9:03 AM, P. Taylor Goetz 
> wrote:
>
> IIRC there was discussion about making it thread safe, but I don't believe
> it was implemented.
>
>
>
> -Taylor
>
>
> On Apr 28, 2016, at 3:52 AM, Julien Nioche 
> wrote:
>
> Hi Stephen
>
>
>
> I asked the same question in February but did not get a reply
>
>
>
>
> https://mail-archives.apache.org/mod_mbox/storm-user/201602.mbox/%3cca+-fm0urpf3fuerozywpzmxu-kdbgf-zj3wbyr8evsaqjc6...@mail.gmail.com%3E
>
>
>
> Anyone who could confirm this?
>
>
>
> Thanks
>
>
>
> On 27 April 2016 at 14:05, Steven Lewis  wrote:
>
> I have conflicting information, and have not checked personally but has
> the output collector finally been made thread safe for emitting in version
> 1.0 or 0.10? I know it was a huge problem in 0.9.5 when trying to do
> threading in a bolt for async future calls and emitting once it returns.
>
>
>
> This email and any files transmitted with it are confidential and intended
> solely for the individual or entity to whom they are addressed. If you have
> received this email in error destroy it immediately. *** Walmart

RE: thread safe output collector

2016-04-29 Thread Hart, James W.
I’m working on a topology that will be similar to this application, so I was
thinking about this yesterday.

If turning messages into tuples takes any significant work, shouldn’t the
spout emit the raw message and leave that work to a bolt?  I don’t think a
bolt’s execute function has the same limitations as a spout’s nextTuple
function.  That said, a bolt’s execute should not be a long-running
computation either, but it can run longer than the spout’s nextTuple
function.

From: Stephen Powis [mailto:spo...@salesforce.com]
Sent: Thursday, April 28, 2016 11:59 AM
To: user@storm.apache.org
Subject: Re: thread safe output collector

So the Spout documentation (assuming it's correct...) here
(http://storm.apache.org/releases/current/Concepts.html#spouts) mentions this:

"The main method on spouts is nextTuple. nextTuple either emits a new tuple
into the topology or simply returns if there are no new tuples to emit. It is
imperative that nextTuple does not block for any spout implementation, because
Storm calls all the spout methods on the same thread."

When developing a custom spout we interpreted it to mean that any "real work"
done by a spout should be done in a separate thread, and decided on the
following pattern, which seems somewhat relevant to what you are trying to do
in your bolts.

On spout open() (the spout's equivalent of prepare), we create a
concurrent/thread-safe queue.  We then create a new Thread, passing it a
reference to our thread-safe queue.  This thread handles finding new data that
needs to be emitted.  When that thread finds data, it adds it to the shared
queue.  When the spout's nextTuple() method is called, it looks for data on
the shared queue and emits it.
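
The spout pattern described above could be sketched roughly as follows. This
is only an illustration and deliberately does not use the Storm API: the class
QueueBackedSource and its methods open, nextTuple and awaitFetcher are
hypothetical stand-ins, and the list of rows stands in for the slow database
query. The point it shows is that the background thread only writes to the
queue, while the nextTuple-style method polls without blocking.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Hypothetical stand-in for a spout; it does not use the Storm API. */
class QueueBackedSource {
    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();
    private Thread fetcher;

    /** Plays the role of open(): start the background thread that finds data. */
    void open(List<String> rows) {
        fetcher = new Thread(() -> {
            for (String row : rows) {   // stands in for the slow query
                pending.add(row);       // the only thing this thread touches
            }
        });
        fetcher.setDaemon(true);
        fetcher.start();
    }

    /** Plays the role of nextTuple(): never blocks, emits at most one item. */
    boolean nextTuple(Consumer<String> emit) {
        String row = pending.poll();    // returns null immediately when empty
        if (row == null) {
            return false;               // nothing to emit, just return
        }
        emit.accept(row);               // emitting happens on the caller's thread
        return true;
    }

    /** Test helper (an assumption of this sketch): wait for the fetcher to finish. */
    void awaitFetcher() {
        try {
            fetcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In a real spout the emit callback would be the spout's collector, and it would
only ever be called from nextTuple.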

I imagine doing async processing in a bolt using one or more threads could work
with a similar pattern.  On prepare you set up your thread(s) with references to
a shared queue.  The bolt passes work to be completed to the thread(s), and the
thread(s) communicate the result back to the bolt via a shared queue.  Add in
the concept of tick tuples to ensure your bolt checks for completed work on a
regular basis?

Is there a better way to do this?
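
A rough sketch of the bolt variant, again without the Storm API, might look
like this. The class AsyncWorkBuffer and its methods submit, drain and
shutdown are hypothetical names chosen for illustration. The idea is that
worker threads only put results on a thread-safe queue, and the bolt's own
thread drains that queue (for example when a tick tuple arrives) and is the
only thread that would emit or ack through the OutputCollector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical stand-in for the async part of a bolt; no Storm API is used. */
class AsyncWorkBuffer<I, O> {
    private final ExecutorService workers;
    private final Function<I, O> work;   // must return non-null results
    private final ConcurrentLinkedQueue<O> completed = new ConcurrentLinkedQueue<>();

    /** Plays the role of prepare(): create the worker threads. */
    AsyncWorkBuffer(int threads, Function<I, O> work) {
        this.workers = Executors.newFixedThreadPool(threads);
        this.work = work;
    }

    /** Called from execute() for a normal tuple: hand the work to a worker. */
    void submit(I input) {
        workers.execute(() -> completed.add(work.apply(input)));
    }

    /** Called from execute() for a tick tuple: collect results on the caller's thread. */
    List<O> drain() {
        List<O> results = new ArrayList<>();
        O result;
        while ((result = completed.poll()) != null) {
            results.add(result);
        }
        return results;
    }

    /** Plays the role of cleanup(): stop accepting work, wait for in-flight tasks. */
    boolean shutdown(long timeoutMillis) {
        workers.shutdown();
        try {
            return workers.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In an actual bolt, each drained result would carry the input tuple it belongs
to, so that execute() can emit anchored tuples and ack from the bolt's thread.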

On Thu, Apr 28, 2016 at 11:22 AM, Julien Nioche
<lists.digitalpeb...@gmail.com> wrote:
Thanks for the clarification

On 28 April 2016 at 15:12, P. Taylor Goetz <ptgo...@gmail.com> wrote:
The documentation is wrong. See:

https://issues.apache.org/jira/browse/STORM-841

At some point it looks like the change made there got reverted. I will reopen 
it to make sure the documentation is corrected.

OutputCollector is NOT thread-safe.

-Taylor

On Apr 28, 2016, at 9:06 AM, Stephen Powis <spo...@salesforce.com> wrote:


"Its perfectly fine to launch new threads in bolts that do processing 
asynchronously. OutputCollector is thread-safe and can be called at any time."



From the docs for 0.9.6: 
http://storm.apache.org/releases/0.9.6/Concepts.html#bolts

On Thu, Apr 28, 2016 at 9:03 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
IIRC there was discussion about making it thread safe, but I don't believe it 
was implemented.

-Taylor

On Apr 28, 2016, at 3:52 AM, Julien Nioche
<lists.digitalpeb...@gmail.com> wrote:
Hi Stephen

I asked the same question in February but did not get a reply

https://mail-archives.apache.org/mod_mbox/storm-user/201602.mbox/%3cca+-fm0urpf3fuerozywpzmxu-kdbgf-zj3wbyr8evsaqjc6...@mail.gmail.com%3E

Anyone who could confirm this?

Thanks

On 27 April 2016 at 14:05, Steven Lewis <steven.le...@walmart.com> wrote:
I have conflicting information, and have not checked personally but has the 
output collector finally been made thread safe for emitting in version 1.0 or 
0.10? I know it was a huge problem in 0.9.5 when trying to do threading in a 
bolt for async future calls and emitting once it returns.




--

Open Source Solutions for Text Engineering

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble




Re: id for bolt tasks in Storm UI

2016-04-29 Thread Yury Ruchin
Hi there Serega,

What you see in the UI are probably compound executor IDs. They are
actually ranges of task IDs assigned to the respective executors. For example,
[26-27] means the executor with tasks 26 and 27 assigned to it. A task can
determine its ID via TopologyContext.getThisTaskId() inside the component
code. On the IMetricsConsumer side, the TaskInfo passed to handleDataPoints
along with the DataPoints contains a srcTaskId field. Those can then be used
to match task-provided data against executor IDs. To avoid parsing executor
ID strings, you may want to use the Nimbus Thrift API to obtain ExecutorInfo
structures, which already have task_start and task_end as separate fields.
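
As a small illustration of the matching step, here is a sketch that treats an
executor ID as an inclusive range of task IDs. TaskRange is a hypothetical
helper, not part of Storm, and the "[start-end]" string format is assumed from
the examples in this thread. If you read task_start and task_end from
ExecutorInfo instead, you could construct the range directly and skip parsing.

```java
/** Hypothetical helper: an inclusive task-ID range, e.g. "[26-27]". */
final class TaskRange {
    final int start;
    final int end;

    TaskRange(int start, int end) {
        if (start > end) {
            throw new IllegalArgumentException("start must not exceed end");
        }
        this.start = start;
        this.end = end;
    }

    /** Parses the "[start-end]" form assumed from the UI examples. */
    static TaskRange parse(String uiId) {
        String s = uiId.trim();
        if (!s.startsWith("[") || !s.endsWith("]")) {
            throw new IllegalArgumentException("expected [start-end], got: " + uiId);
        }
        String[] parts = s.substring(1, s.length() - 1).split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected [start-end], got: " + uiId);
        }
        return new TaskRange(Integer.parseInt(parts[0].trim()),
                             Integer.parseInt(parts[1].trim()));
    }

    /** True if the given task ID belongs to this executor's range. */
    boolean contains(int taskId) {
        return start <= taskId && taskId <= end;
    }

    /** Renders the range in the same form, usable as part of a metric name. */
    @Override
    public String toString() {
        return "[" + start + "-" + end + "]";
    }
}
```

A component could, for example, compare its own task ID against a list of
such ranges and tag its application metrics with the matching range string.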

Regards,
Yury

2016-04-26 22:55 GMT+03:00 Serega Sheypak :

> Hi, there is an id for each task displayed in the UI. Id values are: [26-27]
> or [44-45]. I want to publish application-specific metrics to Influx, and I
> want to publish the same id in the metric name, so I can match basic Storm
> metrics with my app metrics and find bottlenecks, skews, etc.
>
> What API should I use to get the same combination of ids?
>