State of the Clojure DSL

2014-06-05 Thread Rafik NACCACHE
Hello Community,

I intend to use Storm, and I am very interested in the Clojure DSL, which is way
cleaner and more concise.

However, I notice that the Clojure DSL part of the project has stalled quite a bit
(docs unchanged since December 2011, branches not referencing the latest 0.9.1 version...).

Is it safe for me to use Storm with the Clojure DSL, or is this DSL not seeing
much effort, or even being discontinued?

Thank you very much,

Rafik


Re: Negative ids while emitting messages

2014-06-05 Thread adiya n
Should have looked at the wiki/links better. Troubleshooting guide helped.  
thanks!
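
(Archive note: the ids in these log lines are Storm's internal tuple/anchor ids,
which are generated as random 64-bit longs, so negative values are expected and
do not indicate an error.)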




On Wednesday, June 4, 2014 11:29 PM, adiya n  wrote:
 


Hello all, 


What do negative id values mean in the logs?  Does that indicate any error?
I got these while running a local topology. Also, is there any good
documentation about understanding the logs? Most of it is self-explanatory from
the log messages, but if there is any existing documentation somewhere, that
would really help.


5433 [Thread-19] INFO  backtype.storm.daemon.executor  - Processing received 
message source: prepare-request:7, stream: default, id: 
{-4212877050522192524=6354306792350206104}, [3323165896464350833, 33]
Tuple received by bolt is source: prepare-request:7, stream: default, id: 
{-4212877050522192524=6354306792350206104}, [3323165896464350833, 33]
5433 [Thread-10] INFO  backtype.storm.daemon.worker  - Worker 
445b9038-ac6f-4629-a86f-82cd5eecf0e3 for storm drpc-demo-1-1401943429 on 
9b5a127d-1a02-4362-883e-f5193c6f7a92:4 has finished loading
5434 [Thread-36] INFO  backtype.storm.daemon.executor  - Processing received 
message source: spout:8, stream: __ack_init, id: {}, [-4212877050522192524 
-4291234877071288404 8]
5434 [Thread-19] INFO  backtype.storm.daemon.task  - Emitting: bolt0 default 
[3323165896464350833,  MyBolt is emitting: ]
5434 [Thread-36] INFO  backtype.storm.daemon.executor  - Processing received 
message source: prepare-request:7, stream: __ack_ack, id: {}, 
[-4212877050522192524 340043085440436800]
5435 [Thread-19] INFO  backtype.storm.daemon.task  - Emitting: bolt0 __ack_ack 
[-4212877050522192524 5906040704921991109]


thanks
Aiya

Multilang shell bolts exceptions

2014-06-05 Thread adiya n
If there is a failure in my PHP code in one of the shell bolt exec processes,
the error isn't obvious in all cases when the topology comes up / handles
the request.


Only for certain types of failures do I see my exceptions getting printed; for
others, it just doesn't do anything.


Here is a simple example. This is part of my code in my shell_bolt.php under
multilang/resources:

try {
  .
  .
  $fh = fopen("/tmp/pytrace.txt", 'a') or die("Can't open file");
  fwrite($fh, "hello world");
  fwrite($fh, print_r($tuple->values));
  fclose($fh);
  $this->emit(array($tuple->values[0], $word));
} catch (Exception $e) {
  throw new Exception("exception " . $e);
}

The problem area is the print_r() call. If I remove print_r() and just print
$tuple->values, everything works fine and the shell bolt emits the word. With
print_r, it just hangs and my DRPC calls time out.

What is the best way to figure out what is going on with the shell bolt exec 
process??


thanks!!



On , adiya n  wrote:
 


Should have looked at the wiki/links better. Troubleshooting guide helped.  
thanks!



On Wednesday, June 4, 2014 11:29 PM, adiya n  wrote:
 


Hello all, 


What do negative id values mean in the logs?  Does that indicate any error?
I got these while running a local topology. Also, is there any good
documentation about understanding the logs? Most of it is self-explanatory from
the log messages, but if there is any existing documentation somewhere, that
would really help.


5433 [Thread-19] INFO  backtype.storm.daemon.executor  - Processing received 
message source: prepare-request:7, stream: default, id: 
{-4212877050522192524=6354306792350206104}, [3323165896464350833, 33]
Tuple received by bolt is source: prepare-request:7, stream: default, id:
 {-4212877050522192524=6354306792350206104}, [3323165896464350833, 33]
5433 [Thread-10] INFO  backtype.storm.daemon.worker  - Worker 
445b9038-ac6f-4629-a86f-82cd5eecf0e3 for storm drpc-demo-1-1401943429 on 
9b5a127d-1a02-4362-883e-f5193c6f7a92:4 has finished loading
5434 [Thread-36] INFO  backtype.storm.daemon.executor  - Processing received 
message source: spout:8, stream: __ack_init, id: {}, [-4212877050522192524 
-4291234877071288404 8]
5434 [Thread-19] INFO  backtype.storm.daemon.task  - Emitting: bolt0 default 
[3323165896464350833,  MyBolt is emitting: ]
5434 [Thread-36] INFO  backtype.storm.daemon.executor  - Processing received 
message source: prepare-request:7, stream: __ack_ack, id: {}, 
[-4212877050522192524 340043085440436800]
5435 [Thread-19] INFO  backtype.storm.daemon.task  - Emitting: bolt0 __ack_ack 
[-4212877050522192524 5906040704921991109]


thanks
Aiya

Re: Kafka-storm - trident transactional topology - bad perf

2014-06-05 Thread Romain Leroux
Has anyone ever faced a similar or related issue?


2014-06-03 19:33 GMT+09:00 Romain Leroux :

> I have a simple trident transactional topology that does something like
> the following:
>
> kafka transactional spout (~3000 rec/sec, 6 partitions thus paraHint=6)
> -->
> aggregate with reducerAggregator (paraHint=20) -->
> transactional state (I tried MemoryMapState, MemcachedMapState and
> CassandraMapState) -->
> new Stream -->
> print new values
>
> I tried to tune the topology by first setting maxSpoutPending=1 and
> batchEmitInterval to a large value (1 sec), and then iteratively improving
> those values.
> I ended up with maxSpoutPending=20 and batchEmitInterval=150 ms.
>
> However I observed 2 things
>
> 1/ Delay in the topology keeps increasing
> Even with those fine-tuned values, or smaller values, it seems that some
> transactions fail and that Trident replays them (transactional state).
> However, this replaying process seems to delay the processing of new
> incoming data, and Storm never seems to catch up after replaying.
> The result is that after a few minutes processing is clearly not "real
> time" anymore (the aggregates printed in the logs are those from a few
> minutes before, and the lag keeps increasing), even though I don't hit a
> particular bottleneck in the computation (bolt capacity and latency are ok).
> Is this behavior normal? Does it come from the KafkaTransactionalSpout? From
> Trident's transactional mechanism?
>
> 2/ There is an unavoidable bottleneck on $spoutcoord-spout0
> Because small failures keep accumulating, Trident replays more and more
> transactions.
> "spout0" performance is impacted (more work), but this can be scaled
> with more Kafka partitions.
> However, $spoutcoord-spout0 is always a single thread in Trident, whatever
> spout we provide, and I clearly observed that $spoutcoord-spout0's capacity
> goes above 1 after some minutes (with latency above 10 sec or so).
> Is there a way to improve this? Or is this an unavoidable consequence of
> Trident's transactional logic that can't be addressed?
>
>
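
(Archive note: for readers following along, a minimal sketch of how a topology
shaped like the one described above might be wired up with storm-kafka's
transactional Trident spout. The ZK address, topic, and field names are
illustrative, and the built-in Count stands in for the reducerAggregator:)

import backtype.storm.Config;
import backtype.storm.tuple.Fields;
import storm.kafka.ZkHosts;
import storm.kafka.trident.TransactionalTridentKafkaSpout;
import storm.kafka.trident.TridentKafkaConfig;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.operation.builtin.Debug;
import storm.trident.testing.MemoryMapState;

TridentKafkaConfig spoutConf =
    new TridentKafkaConfig(new ZkHosts("zkhost:2181"), "events");

TridentTopology topology = new TridentTopology();
TridentState state = topology
    .newStream("kafka", new TransactionalTridentKafkaSpout(spoutConf))
    .parallelismHint(6)                       // one executor per Kafka partition
    .groupBy(new Fields("bytes"))
    .persistentAggregate(new MemoryMapState.Factory(),
                         new Count(), new Fields("count"))
    .parallelismHint(20);

// "new Stream --> print new values"
state.newValuesStream().each(new Fields("count"), new Debug());

Config conf = new Config();
conf.setMaxSpoutPending(20);
conf.put(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS, 150);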


RE: Cluster Failure and Data Recovery

2014-06-05 Thread Nima Movafaghrad
Thanks Srinath. We are already using reliable message processing for bolt 
failures etc. My problem is with catastrophic cases. For example, what 
happens if the entire cluster goes down, or if the topology fully fails? At 
the moment we are reading from MQ, and although keeping the transactions open 
would resolve our data loss prevention issue, it isn't quite feasible. Some of 
our bolts listen and batch for up to 30 seconds so they have big enough batches 
to commit to the RDBMS. Keeping the transactions open for that long 
slows things down considerably.

 

So I guess, to frame the question better, I should ask: is there a way to persist 
the intermediate data?

 

Thanks,

Nima

 

From: Srinath C [mailto:srinat...@gmail.com] 
Sent: Wednesday, June 04, 2014 5:49 PM
To: user
Subject: Re: Cluster Failure and Data Recovery

 

Hi Nima,

    Use the reliable message processing mechanism
(https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing) to
ensure that there is no data loss. You would need support for transactional
semantics from the tuple source, where the spout can commit/abort a read
(Kestrel, Kafka, RabbitMQ, etc. can do this). Yes, you would need to keep the
queue transactions open until the spout receives an "ack" or "fail" for every
tuple.

    IMO, this ensures that each tuple is processed "at least once" rather than
"exactly once", so you need to be prepared to end up with duplicate entries in
your DB, or to have a way to figure out that a write to the DB duplicates an
earlier write. This is the case when there are crashes with intermediate data
in memory.
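
(Archive note: a minimal sketch of the spout side of this pattern. The
QueueClient with reserve/commit/abort semantics is hypothetical; only the
emit/ack/fail wiring is the point:)

import java.util.Map;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class ReliableQueueSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private QueueClient queue;  // hypothetical client with reserve/commit/abort

    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.queue = QueueClient.connect("queue-host");  // hypothetical
    }

    public void nextTuple() {
        QueueClient.Message m = queue.reserve();  // leaves the queue transaction open
        if (m != null) {
            // anchoring with a message id makes the tuple tracked by the acker
            collector.emit(new Values(m.body()), m.id());
        }
    }

    public void ack(Object msgId)  { queue.commit(msgId); }  // fully processed: delete from queue
    public void fail(Object msgId) { queue.abort(msgId);  }  // failed/timed out: redeliver later

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("body"));
    }
}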

 

Regards,

Srinath.

 

 

On Thu, Jun 5, 2014 at 5:37 AM, Nima Movafaghrad <nima.movafagh...@oracle.com> wrote:

Hi everyone,

 

We are in the process of designing a highly available system with zero tolerance 
for data loss. The plan is for the spouts to read from a queue and for the 
messages to be processed down through several specialized bolts and then flushed 
to the DB. How can we guarantee no data loss here? Should we keep the queue 
transactions open until data is committed to the DB? Should we persist the state 
of all the bolts? What happens to the intermediate data if the whole cluster 
fails?

 

Any suggestions are much appreciated.

 

Nima

 


Re: Kafka-storm - trident transactional topology - bad perf

2014-06-05 Thread Danijel Schiavuzzi
Have you tried increasing the tuple timeout? The default of 30 seconds may
not suit you.
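
(Archive note: the timeout is set per topology; a minimal sketch, value
illustrative:)

Config conf = new Config();
conf.setMessageTimeoutSecs(120);  // TOPOLOGY_MESSAGE_TIMEOUT_SECS, default 30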

On Thursday, June 5, 2014, Romain Leroux  wrote:

> Has anyone ever faced a similar or related issue?
>
>
> 2014-06-03 19:33 GMT+09:00 Romain Leroux  >:
>
>> I have a simple trident transactional topology that does something like
>> the following:
>>
>> kafka transactional spout (~3000 rec/sec, 6 partitions thus paraHint=6)
>> -->
>> aggregate with reducerAggregator (paraHint=20) -->
>> transactional state (I tried MemoryMapState, MemcachedMapState and
>> CassandraMapState) -->
>> new Stream -->
>> print new values
>>
>> I tried to tune the topology by first setting maxSpoutPending=1 and
>> batchEmitInterval to a large value (1 sec), and then iteratively improving
>> those values.
>> I ended up with maxSpoutPending=20 and batchEmitInterval=150 ms.
>>
>> However I observed 2 things
>>
>> 1/ Delay in the topology keeps increasing
>> Even with those fine-tuned values, or smaller values, it seems that some
>> transactions fail and that Trident replays them (transactional state).
>> However, this replaying process seems to delay the processing of new
>> incoming data, and Storm never seems to catch up after replaying.
>> The result is that after a few minutes processing is clearly not "real
>> time" anymore (the aggregates printed in the logs are those from a few
>> minutes before, and the lag keeps increasing), even though I don't hit a
>> particular bottleneck in the computation (bolt capacity and latency are ok).
>> Is this behavior normal? Does it come from the KafkaTransactionalSpout?
>> From Trident's transactional mechanism?
>>
>> 2/ There is an unavoidable bottleneck on $spoutcoord-spout0
>> Because small failures keep accumulating, Trident replays more and more
>> transactions.
>> "spout0" performance is impacted (more work), but this can be scaled
>> with more Kafka partitions.
>> However, $spoutcoord-spout0 is always a single thread in Trident, whatever
>> spout we provide, and I clearly observed that $spoutcoord-spout0's capacity
>> goes above 1 after some minutes (with latency above 10 sec or so).
>> Is there a way to improve this? Or is this an unavoidable consequence of
>> Trident's transactional logic that can't be addressed?
>>
>>
>

-- 
Danijel Schiavuzzi

E: dani...@schiavuzzi.com
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7


RE: Workers constantly restarted due to session timeout

2014-06-05 Thread Michael Dev
Thanks again for all the advice.

Turns out the issue was related to GC. The long pauses in uptime were directly 
correlated with full GC executions pausing all threads. We had written a spout 
that cached input tuples locally while waiting for the cluster to invoke 
ISpout.nextTuple(). The topology just wasn't operating fast enough, so the 
spout's local cache would grow unbounded, resulting in very high memory usage 
(up to whatever -Xmx was: 30, 60, 100 GB) and thus painfully long 
"stop-the-world" GC executions. We were falling short of being able to handle 
the spout's input rate by around 100-200k tuples every 30 seconds.
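
(Archive note: a sketch of one way to bound such a cache, so the receiving side
blocks instead of growing the heap without limit; the onMessage callback and
capacity are hypothetical:)

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import backtype.storm.tuple.Values;

// inside the spout: a bounded queue instead of an unbounded cache
private final BlockingQueue<Values> pending = new LinkedBlockingQueue<Values>(100000);

// called by the receiving thread; blocks when the topology falls behind,
// pushing back on the upstream source instead of buffering without limit
void onMessage(Values v) throws InterruptedException {
    pending.put(v);
}

@Override
public void nextTuple() {
    Values v = pending.poll();  // non-blocking: nextTuple must not block
    if (v != null) {
        collector.emit(v);
    }
}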

Unfortunately this means we'll need to reevaluate inefficient sections of our 
topology, barring more hardware, but at least the issue has been identified.

Thanks,
Michael

> Date: Tue, 3 Jun 2014 14:06:38 -0500
> From: der...@yahoo-inc.com
> To: user@storm.incubator.apache.org
> Subject: Re: Workers constantly restarted due to session timeout
> 
> > 1) Is it appropriate to run Zookeeper in parallel on the same node with the 
> > storm services?
> 
> I recommend keeping them separate, and even then pointing ZK storage to a path 
> on its own disk device if possible.  ZK is a bottleneck for Storm, and when it 
> is too slow lots of bad things can happen.
> 
> Some folks use shared hosts (with or without VMs) in which to run ZK.  In 
> those situations, VMs or processes owned by other users doing unrelated 
> things can put load on the disk, and that will dramatically slow down ZK.
> 
> 
> > 2) We have zookeeper 3.4.5 installed. I see Storm uses zookeeper-3.3.3 as 
> > its client. Should we downgrade our installation?
> 
> I am not sure about that, since we've been running with ZK 3.4.5 in storm 
> (and on the server).  It might work very well, but I have not tried it.  I do 
> not remember if anyone on this list has identified any issues with this 3.3.3 
> + 3.4.5 combo.
> 
> 
> One setting we changed to dramatically improve performance with ZK was 
> setting the system property '-Dzookeeper.forceSync=no' on the server.
> 
> Normally, ZK will sync to disk on every write, and that causes two seeks: one 
> for the data and one for the data log.  It gets really expensive with all of 
> the workers heartbeating in through ZK.  Be warned that with only one ZK 
> server, an outage could leave you in an inconsistent state.
> 
> You might check to see if the ZK server is keeping up.  There are tools like 
> iotop that can give information about disk load.
> 
> -- 
> Derek
> 
> On 6/3/14, 13:14, Michael Dev wrote:
> >
> >
> >
> > Thank you Derek for the explanation between :disallowed and :timed-out. 
> > That was extremely helpful in understanding what decisions Storm is making. 
> > I increased the timeouts for both messages to 5 minutes and returned the 
> > zookeeper session timeouts to their default values. This made it plain to 
> > see periods in time where the "Uptime" column for the busiest component's 
> > Worker would not update (1-2 minutes, potentially never resulting in a 
> > worker restart).
> >
> > ZK logs report constant disconnects and reconnects while the "Uptime" is 
> > not updating:
> > 16:28:30,440 - INFO NIOServerCnxn@1001 - Closed socket connection for 
> > client /10.49.21.151:54004 which has sessionid 0x1464f1fddc1018f
> > 16:31:18,364 - INFO NIOServerCnxnFactory@197 - Accepted socket connection 
> > from /10.49.21.151:34419
> > 16:31:18,365 - WARN ZookeeperServer@793 - Connection request from old 
> > client /10.49.21.151:34419; will be dropped if server is in r-o mode
> > 16:31:18,365 - INFO ZookeeperServer@832 - Client attempting to renew 
> > session 0x264f1fddc4021e at /10.49.21.151:34419
> > 16:31:18,365 - INFO Learner@107 - Revalidating client: 0x264f1fddc4021e
> > 16:31:18,366 - INFO ZooKeeperServer@588 - Invalid session 0x264f1fddc4021e 
> > for client /10.49.21.151:34419, probably expired
> > 16:31:18,366 - NIOServerCnxn@1001 - Closed socket connection for client 
> > /10.49.21.151:34419 which had sessionid 0x264f1fddc4021e
> > 16:31:18,378 - INFO NIOServerCnxnFactory@197 - Accepted socket connection 
> > from  /10.49.21.151:34420
> > 16:31:18,391 - WARN ZookeeperServer@793 - Connection request from old
> > client /10.49.21.151:34420; will be dropped if server is in r-o mode
> > 16:31:18,392 - INFO ZookeeperServer@839 - Client attempting to establish 
> > new session at /10.49.21.151:34420
> > 16:31:18,394 - INFO ZookeeperServer@595 - Established session 
> > 0x1464fafddc10218 with negotiated timeout 2 for client 
> > /10.49.21.151:34420
> > 16:31:44,002 - INFO NIOServerCnxn@1001 - Closed socket connection for 
> > /10.49.21.151:34420 which had sessionid 0x1464fafddc10218
> > 16:32:48,055 - INFO NIOServerCnxnFactory@197 - Accepted socket connection 
> > from /10.49.21.151:34432
> > 16:32:48,056 - WARN ZookeeperServer@793 - Connection request from old
> > client /10.49.21.151:34432; will be dropped if server is in r-o mode
> > 16:32:48,056 - INFO Zookeeper

Re: MultiLang (Python) bolt gives error

2014-06-05 Thread Andrew Montalenti
On the worker machine, do you have Python installed? You can check by
running "python -V". You need to ensure you're using the same $PATH as
whatever environment is running your Storm supervisor/worker. From the
exception stack trace, it looks like your ShellBolt does not see a python
interpreter on the $PATH, therefore it can't run your Python bolt.

At least, that's how I read "Caused by: java.io.IOException: Cannot run
program "python" (in directory
"/tmp/5f3c8318-14da-4999-a974-584a9d200fdb/supervisor/stormdist/mongo_20140528_02
-1-1401786854/resources"): error=2, No such file or directory".


On Wed, Jun 4, 2014 at 7:33 PM, Hamza Asad  wrote:

> Help required please. I'm facing this issue while using a python bolt.
> Haven't resolved it yet. Does anyone have a solution?
>
> *** This message has been sent using QMobile A500 ***
>
>
> Hamza Asad  wrote:
>
> I have checked that the resources folder is NOT placed in location
>
> */tmp/6a090639-b975-42b8-8bc1-8de6093ad3e1/supervisor/stormdist/mongo_20140528_02-1-1401881161/resources*
> There are only two files, i.e. stormcode.ser and stormconf.ser, but NO
> resources folder. Why? How can I resolve this issue? I'm using *storm*
> *0.9.1-incubating* and compiling the code using NetBeans.
>
>
> On Tue, Jun 3, 2014 at 2:31 PM, Hamza Asad  wrote:
>
>> Hi,
>> I'm using a python bolt which is in the resources directory, but Storm is
>> giving me this error:
>>
>> 7226 [Thread-19-exclaim1] INFO  backtype.storm.daemon.executor - Preparing bolt exclaim1:(3)
>> 7231 [Thread-10] INFO  backtype.storm.daemon.executor - Loading executor exclaim1:[4 4]
>> 7232 [Thread-19-exclaim1] ERROR backtype.storm.util - Async loop died!
>> java.lang.RuntimeException: Error when launching multilang subprocess
>>     at backtype.storm.task.ShellBolt.prepare(ShellBolt.java:105) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>>     at backtype.storm.daemon.executor$eval5170$fn__5171$fn__5183.invoke(executor.clj:689) ~[na:na]
>>     at backtype.storm.util$async_loop$fn__390.invoke(util.clj:431) ~[na:na]
>>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
>>     at java.lang.Thread.run(Thread.java:744) [na:1.7.0_55]
>> Caused by: java.io.IOException: Cannot run program "python" (in directory "/tmp/5f3c8318-14da-4999-a974-584a9d200fdb/supervisor/stormdist/mongo_20140528_02-1-1401786854/resources"): error=2, No such file or directory
>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041) ~[na:1.7.0_55]
>>     at backtype.storm.utils.ShellProcess.launch(ShellProcess.java:50) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>>     at backtype.storm.task.ShellBolt.prepare(ShellBolt.java:102) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>>     ... 4 common frames omitted
>> Caused by: java.io.IOException: error=2, No such file or directory
>>     at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.7.0_55]
>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:135) ~[na:1.7.0_55]
>>     at java.lang.ProcessImpl.start(ProcessImpl.java:130) ~[na:1.7.0_55]
>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022) ~[na:1.7.0_55]
>>     ... 6 common frames omitted
>> 7233 [Thread-10] INFO  backtype.storm.daemon.task - Emitting: exclaim1 __system ["startup"]
>> 7233 [Thread-10] INFO  backtype.storm.daemon.executor - Loaded executor tasks exclaim1:[4 4]
>> 7233 [Thread-19-exclaim1] ERROR backtype.storm.daemon.executor - java.lang.RuntimeException: Error when launching multilang subprocess
>>     at backtype.storm.task.ShellBolt.prepare(ShellBolt.java:105) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>>     at backtype.storm.daemon.executor$eval5170$fn__5171$fn__5183.invoke(executor.clj:689) ~[na:na]
>>     at backtype.storm.util$async_loop$fn__390.invoke(util.clj:431) ~[na:na]
>>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
>>     at java.lang.Thread.run(Thread.java:744) [na:1.7.0_55]
>> Caused by: java.io.IOException: Cannot run program "python" (in directory "/tmp/5f3c8318-14da-4999-a974-584a9d200fdb/supervisor/stormdist/mongo_20140528_02-1-1401786854/resources"): error=2, No such file or directory
>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041) ~[na:1.7.0_55]
>>     at backtype.storm.utils.ShellProcess.launch(ShellProcess.java:50) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>>     at backtype.storm.task.ShellBolt.prepare(ShellBolt.java:102) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>>     ... 4 common frames omitted
>> Caused by: java.io.IOException: error=2, No such file or directory
>>     at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.7.0_55]
>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:135) ~[na:1.7.0_55]
>>     at java.lang.ProcessImpl.start(ProcessImpl.java:130) ~[na:1.7.0_55]
>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022) ~[na:1.7.0_55]
>>
>


Re: Multilang shell bolts exceptions

2014-06-05 Thread Andrew Montalenti
What multi-lang driver for PHP are you using? There is a way for multi-lang
drivers to report errors up to Storm (see ShellBolt's handling of the
"error" command).
Perhaps your driver is not making use of this facility and that's why your
exceptions are being buried?
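
(Archive note: the multi-lang protocol is line-delimited JSON over stdin/stdout,
with each message terminated by "end" on its own line, so a driver reports an
exception by writing something like:

{"command": "error", "msg": "exception details here"}
end

For the same reason, any stray output on stdout corrupts the framing and will
typically hang the bolt. In the PHP snippet above, print_r() called with one
argument prints its dump to stdout and returns true, which would do exactly
that; print_r($tuple->values, true) returns the string instead of printing it.)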


On Thu, Jun 5, 2014 at 11:46 AM, adiya n  wrote:

> If there is a failure in my php code in one of the shell bolts exec
> process, the error isn't obvious  in all the cases when the topology comes
> up /handles the request.
>
> Only in certain types of failures I see my exceptions getting printed but
> for some, it just doesnt do anything.
>
> here is a simple example:
> this is part of my code in my shell_bolt.php  under multilang/resources:
>
> try {
>  .
>  .
>   $fh = fopen("/tmp/pytrace.txt", 'a') or die("Can't open file");
>   fwrite($fh, "hello world");
>   fwrite($fh, print_r($tuple->values));
>   fclose($fh);
>   $this->emit(array($tuple->values[0], $word));
>   } catch (Exception $e) {
> throw new Exception("exception " . $e);
>   }
>
> The problem area is the print_r() call. If I remove print_r() and just
> print $tuple->values, everything works fine and the shell bolt emits the
> word. With print_r, it just hangs and my DRPC calls time out.
>
> What is the best way to figure out what is going on with the shell bolt
> exec process??
>
>
> thanks!!
>
>
>   On , adiya n  wrote:
>
>
> Should have looked at the wiki/links better. Troubleshooting guide
> helped.  thanks!
>
>
>   On Wednesday, June 4, 2014 11:29 PM, adiya n 
> wrote:
>
>
> Hello all,
>
> What do negative id values mean in the logs?  Does that indicate any error?
> I got these while running a local topology. Also, is there any good
> documentation about understanding the logs? Most of it is self-explanatory
> from the log messages, but if there is any existing documentation somewhere,
> that would really help.
>
> 5433 [Thread-19] INFO  backtype.storm.daemon.executor  - Processing
> received message source: prepare-request:7, stream: default, id: {-
> 4212877050522192524=6354306792350206104}, [3323165896464350833, 33]
> Tuple received by bolt is source: prepare-request:7, stream: default, id:
> {-4212877050522192524=6354306792350206104}, [3323165896464350833, 33]
> 5433 [Thread-10] INFO  backtype.storm.daemon.worker  - Worker
> 445b9038-ac6f-4629-a86f-82cd5eecf0e3 for storm drpc-demo-1-1401943429 on
> 9b5a127d-1a02-4362-883e-f5193c6f7a92:4 has finished loading
> 5434 [Thread-36] INFO  backtype.storm.daemon.executor  - Processing
> received message source: spout:8, stream: __ack_init, id: {},
> [-4212877050522192524 -4291234877071288404 8]
> 5434 [Thread-19] INFO  backtype.storm.daemon.task  - Emitting: bolt0
> default [3323165896464350833,  MyBolt is emitting: ]
> 5434 [Thread-36] INFO  backtype.storm.daemon.executor  - Processing
> received message source: prepare-request:7, stream: __ack_ack, id: {}, [-
> 4212877050522192524 340043085440436800]
> 5435 [Thread-19] INFO  backtype.storm.daemon.task  - Emitting: bolt0
> __ack_ack [-4212877050522192524 5906040704921991109]
>
> thanks
> Aiya
>
>
>
>
>
>


Re: python via java+scriptengine+jython

2014-06-05 Thread Andrew Montalenti
Don't see why this wouldn't be possible. I assume a ScriptEngine
implementation would be using jython, so the usual caveats about jython
apply (e.g. you don't have access to all 3rd-party libraries and you tend
to be behind in Python versions). What kind of comparison are you looking
for, a performance-oriented one? It's bound to be faster, I suppose,
because it can avoid multi-lang serialization/deserialization, although
jython isn't exactly widely-renowned for its interpreter speed.


On Wed, Jun 4, 2014 at 11:36 AM, Tyson Norris  wrote:

> Hi -
> I've seen several people attempting to run python bolts via multilang, and
> wondered if anyone has tried and can compare with running some python
> functions via java ScriptEngine? That is uses a normal java bolt that calls
> python functions via ScriptEngine.
>
> We are experimenting with this approach (script engine), but don't have
> anything useful built yet.
>
> Thanks
> Tyson


Re: Cluster Failure and Data Recovery

2014-06-05 Thread Andrew Montalenti
Sounds like you might benefit from considering something like Kafka instead
of a standard MQ. We have some notes about this publicly online
(http://www.parsely.com/slides/logs/notes/#introducing-apache-kafka) from our
PyData talk on Kafka/Storm. You can configure
Kafka to have an SLA on data that is in terms of data size or time; if your
entire topology crashes or goes down, then you can resume messages at the
spout from the moment the failure happened, and pay no penalty.
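
(Archive note: the size/time SLA mentioned here is configured on the Kafka
broker. A sketch of the relevant server.properties keys in the 0.8 line, with
illustrative values:)

# keep messages for 7 days...
log.retention.hours=168
# ...and/or cap retained data per partition, whichever limit is hit first
log.retention.bytes=107374182400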

(Of course, then you need to figure out how to guarantee your Kafka plant
is always online, but this is do-able given its distributed architecture.)

This doesn't sound like a problem that Storm should think about solving --
after all, if your entire Storm cluster fails, all of the high availability
guarantees of each component are, by definition, out the window.


On Thu, Jun 5, 2014 at 2:08 PM, Nima Movafaghrad <
nima.movafagh...@oracle.com> wrote:

> Thanks Srinath. We are already using reliable message processing for bolt
> failures etc. My problem is with catastrophic cases. For example,
> what happens if the entire cluster goes down, or if the topology fully
> fails? At the moment we are reading from MQ, and although keeping the
> transactions open would resolve our data loss prevention issue, it isn't
> quite feasible. Some of our bolts listen and batch for up to 30 seconds so
> they have big enough batches to commit to the RDBMS. Keeping the
> transactions open for that long slows things down considerably.
>
>
>
> So I guess, to frame the question better, I should ask: is there a way to
> persist the intermediate data?
>
>
>
> Thanks,
>
> Nima
>
>
>
> *From:* Srinath C [mailto:srinat...@gmail.com]
> *Sent:* Wednesday, June 04, 2014 5:49 PM
> *To:* user
> *Subject:* Re: Cluster Failure and Data Recovery
>
>
>
> Hi Nima,
>
> Use the reliable message processing mechanism
> (https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing)
> to ensure that there is no data loss. You would need support for
> transactional semantics from the tuple source, where the spout can commit/abort
> a read (Kestrel, Kafka, RabbitMQ, etc. can do this). Yes, you would need to
> keep the queue transactions open until the spout receives an "ack" or
> "fail" for every tuple.
>
> IMO, this ensures that each tuple is processed "at least once" rather than
> "exactly once", so you need to be prepared to end up with duplicate entries
> in your DB, or to have a way to figure out that a write to the DB duplicates
> an earlier write. This is the case when there are crashes with intermediate
> data in memory.
>
>
>
> Regards,
>
> Srinath.
>
>
>
>
>
> On Thu, Jun 5, 2014 at 5:37 AM, Nima Movafaghrad <
> nima.movafagh...@oracle.com> wrote:
>
> Hi everyone,
>
>
>
> We are in the process of designing a highly available system with zero
> tolerance for data loss. The plan is for the spouts to read from a queue and
> for the messages to be processed down through several specialized bolts and
> then flushed to the DB. How can we guarantee no data loss here? Should we
> keep the queue transactions open until data is committed to the DB? Should we
> persist the state of all the bolts? What happens to the intermediate data if
> the whole cluster fails?
>
>
>
> Any suggestions are much appreciated.
>
>
>
> Nima
>
>
>


RE: Cluster Failure and Data Recovery

2014-06-05 Thread Nima Movafaghrad
Thanks Andrew. :)

 

From: Andrew Montalenti [mailto:and...@parsely.com] 
Sent: Thursday, June 05, 2014 3:30 PM
To: user@storm.incubator.apache.org
Subject: Re: Cluster Failure and Data Recovery

 

Sounds like you might benefit from considering something like Kafka instead of 
a standard MQ. We have some notes about this publicly online 
(http://www.parsely.com/slides/logs/notes/#introducing-apache-kafka) from our 
PyData talk on Kafka/Storm. You can configure Kafka to have an SLA on data that 
is in terms of data size or time; if your entire topology crashes or goes down, 
then you can resume messages at the spout from the moment the failure happened, 
and pay no penalty.

 

(Of course, then you need to figure out how to guarantee your Kafka plant is 
always online, but this is do-able given its distributed architecture.)

 

This doesn't sound like a problem that Storm should think about solving -- 
after all, if your entire Storm cluster fails, all of the high availability 
guarantees of each component are, by definition, out the window. 

 

On Thu, Jun 5, 2014 at 2:08 PM, Nima Movafaghrad <nima.movafagh...@oracle.com> wrote:

Thanks Srinath. We are already using reliable message processing for bolt 
failures etc. My problem is with catastrophic cases. For example, what 
happens if the entire cluster goes down, or if the topology fully fails? At 
the moment we are reading from MQ, and although keeping the transactions open 
would resolve our data loss prevention issue, it isn't quite feasible. Some of 
our bolts listen and batch for up to 30 seconds so they have big enough batches 
to commit to the RDBMS. Keeping the transactions open for that long slows 
things down considerably.

 

So I guess, to frame the question better, I should ask: is there a way to 
persist the intermediate data?

 

Thanks,

Nima

 

From: Srinath C [mailto:srinat...@gmail.com] 
Sent: Wednesday, June 04, 2014 5:49 PM
To: user
Subject: Re: Cluster Failure and Data Recovery

 

Hi Nima,

Use the reliable message processing mechanism 
(https://github.com/nathanmarz/storm/wiki/Guaranteeing-message-processing) to 
ensure that there is no data loss. You would need support for transactional 
semantics from the tuple source, where the spout can commit/abort a read 
(Kestrel, Kafka, RabbitMQ, etc. can do this). Yes, you would need to keep the 
queue transactions open until the spout receives an "ack" or "fail" for every 
tuple.

IMO, this ensures that each tuple is processed "at least once" rather than 
"exactly once", so you need to be prepared to end up with duplicate entries in 
your DB, or to have a way to figure out that a write to the DB duplicates an 
earlier write. This is the case when there are crashes with intermediate data 
in memory.

 

Regards,

Srinath.

 

 

On Thu, Jun 5, 2014 at 5:37 AM, Nima Movafaghrad <nima.movafagh...@oracle.com> wrote:

Hi everyone,

 

We are in the process of designing a highly available system with zero tolerance 
for data loss. The plan is for the spouts to read from a queue and for the 
messages to be processed down through several specialized bolts and then flushed 
to the DB. How can we guarantee no data loss here? Should we keep the queue 
transactions open until data is committed to the DB? Should we persist the state 
of all the bolts? What happens to the intermediate data if the whole cluster 
fails?

 

Any suggestions are much appreciated.

 

Nima

 

 


Re: python via java+scriptengine+jython

2014-06-05 Thread Tyson Norris
Yes, just wondering if anyone has compared performance.
I have been sidetracked and have not completed our prototype yet, but was 
curious whether anyone else had pursued this with good or bad results, aside 
from jython compatibility issues etc.

Thanks
Tyson

On Jun 5, 2014, at 3:25 PM, Andrew Montalenti <and...@parsely.com> wrote:

Don't see why this wouldn't be possible. I assume a ScriptEngine implementation 
would be using jython, so the usual caveats about jython apply (e.g. you don't 
have access to all 3rd-party libraries and you tend to be behind in Python 
versions). What kind of comparison are you looking for, a performance-oriented 
one? It's bound to be faster, I suppose, because it can avoid multi-lang 
serialization/deserialization, although jython isn't exactly widely-renowned 
for its interpreter speed.


On Wed, Jun 4, 2014 at 11:36 AM, Tyson Norris <tnor...@adobe.com> wrote:
Hi -
I've seen several people attempting to run python bolts via multilang, and 
wondered if anyone has tried, and can compare with, running some python functions 
via java ScriptEngine? That is, using a normal java bolt that calls python 
functions via ScriptEngine.

We are experimenting with this approach (script engine), but don’t have 
anything useful built yet.

Thanks
Tyson




Re: MultiLang (Python) bolt gives error

2014-06-05 Thread Hamza Asad
Thanks for the reply. Yes, I have installed Python on the worker node, and I 
have checked at that specific directory too. I guess the issue is that Storm is 
looking for the resources folder in the expected path but it isn't there. Why? 
How and where should I give its path? By assigning which variable? 
 
 *** This message has been sent using QMobile A500 ***

Andrew Montalenti  wrote:

>On the worker machine, do you have Python installed? You can check by running 
>"python -V". You need to ensure you're using the same $PATH as whatever 
>environment is running your Storm supervisor/worker. From the exception stack 
>trace, it looks like your ShellBolt does not see a python interpreter on the 
>$PATH, therefore it can't run your Python bolt.
>
>
>At least, that's how I read "Caused by: java.io.IOException: Cannot run 
>program "python" (in directory 
>"/tmp/5f3c8318-14da-4999-a974-584a9d200fdb/supervisor/stormdist/mongo_20140528_02-1-1401786854/resources"):
> error=2, No such file or directory".
>
>
>
>On Wed, Jun 4, 2014 at 7:33 PM, Hamza Asad  wrote:
>
>Help required please. I'm facing this issue while using a python bolt. 
>Haven't resolved it yet. Does anyone have a solution?
>
>*** This message has been sent using QMobile A500 ***
>
>
>
>Hamza Asad  wrote:
>
>I have checked that the resources folder is NOT placed in location 
>/tmp/6a090639-b975-42b8-8bc1-8de6093ad3e1/supervisor/stormdist/mongo_20140528_02-1-1401881161/resources
>
>There are only two files, i.e. stormcode.ser and stormconf.ser, but NO resources 
>folder. Why? How can I resolve this issue? I'm using storm 0.9.1-incubating and 
>compiling the code using NetBeans.
>
>
>
>On Tue, Jun 3, 2014 at 2:31 PM, Hamza Asad  wrote:
>
>Hi,
>
>I'm using a python bolt which is in the resources directory, but Storm is 
>giving me this error: 
>
>7226 [Thread-19-exclaim1] INFO  backtype.storm.daemon.executor - Preparing 
>bolt exclaim1:(3)
>7231 [Thread-10] INFO  backtype.storm.daemon.executor - Loading executor 
>exclaim1:[4 4]
>7232 [Thread-19-exclaim1] ERROR backtype.storm.util - Async loop died!
>java.lang.RuntimeException: Error when launching multilang subprocess
>
>    at backtype.storm.task.ShellBolt.prepare(ShellBolt.java:105) 
>~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>    at 
>backtype.storm.daemon.executor$eval5170$fn__5171$fn__5183.invoke(executor.clj:689)
> ~[na:na]
>    at backtype.storm.util$async_loop$fn__390.invoke(util.clj:431) ~[na:na]
>    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
>    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_55]
>Caused by: java.io.IOException: Cannot run program "python" (in directory 
>"/tmp/5f3c8318-14da-4999-a974-584a9d200fdb/supervisor/stormdist/mongo_20140528_02-1-1401786854/resources"):
> error=2, No such file or directory
>    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041) ~[na:1.7.0_55]
>    at backtype.storm.utils.ShellProcess.launch(ShellProcess.java:50) 
>~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>    at backtype.storm.task.ShellBolt.prepare(ShellBolt.java:102) 
>~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>    ... 4 common frames omitted
>Caused by: java.io.IOException: error=2, No such file or directory
>    at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.7.0_55]
>    at java.lang.UNIXProcess.<init>(UNIXProcess.java:135) ~[na:1.7.0_55]
>    at java.lang.ProcessImpl.start(ProcessImpl.java:130) ~[na:1.7.0_55]
>    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022) ~[na:1.7.0_55]
>    ... 6 common frames omitted
>7233 [Thread-10] INFO  backtype.storm.daemon.task - Emitting: exclaim1 
>__system ["startup"]
>7233 [Thread-10] INFO  backtype.storm.daemon.executor - Loaded executor tasks 
>exclaim1:[4 4]
>7233 [Thread-19-exclaim1] ERROR backtype.storm.daemon.executor - 
>java.lang.RuntimeException: Error when launching multilang subprocess
>
>    at backtype.storm.task.ShellBolt.prepare(ShellBolt.java:105) 
>~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>    at 
>backtype.storm.daemon.executor$eval5170$fn__5171$fn__5183.invoke(executor.clj:689)
>


Re: MultiLang (Python) bolt gives error

2014-06-05 Thread Andrew Montalenti
I suppose the other possibility is that you don't have a "resources/"
directory on your CLASSPATH at compile-time, when you end up building your
JAR for submission to the Storm cluster. You could verify this by
inspecting the JAR you are about to submit to the nimbus. Do something like
this:

mkdir foo
mv foo.jar foo/
cd foo
unzip foo.jar
ls resources/

If you have a resources/ directory, then your JAR is being packaged up
correctly. If you don't, that explains it.

This is described in the "Packaging your stuff" section of the multi-lang
protocol docs.


On Thu, Jun 5, 2014 at 6:41 PM, Hamza Asad  wrote:

> Thanks for the reply. Yes, I have installed Python on the worker node, and I
> have checked at that specific directory too. I guess the issue is that Storm
> is looking for the resources folder in the expected path but it isn't there.
> Why? How and where should I give its path? By assigning which variable?
>
> *** This message has been sent using QMobile A500 ***
>
> Andrew Montalenti  wrote:
>
> On the worker machine, do you have Python installed? You can check by
> running "python -V". You need to ensure you're using the same $PATH as
> whatever environment is running your Storm supervisor/worker. From the
> exception stack trace, it looks like your ShellBolt does not see a python
> interpreter on the $PATH, therefore it can't run your Python bolt.
>
> At least, that's how I read "Caused by: java.io.IOException: Cannot run
> program "python" (in directory
> "/tmp/5f3c8318-14da-4999-a974-584a9d200fdb/supervisor/stormdist/mongo_20140528_02
> -1-1401786854/resources"): error=2, No such file or directory".
>
>
> On Wed, Jun 4, 2014 at 7:33 PM, Hamza Asad  wrote:
>
>> Help required please. I'm facing this issue while using a python bolt.
>> Haven't resolved it yet. Does anyone have a solution?
>>
>> *** This message has been sent using QMobile A500 ***
>>
>>
>> Hamza Asad  wrote:
>>
>> I have checked that the resources folder is NOT placed in location
>>
>> */tmp/6a090639-b975-42b8-8bc1-8de6093ad3e1/supervisor/stormdist/mongo_20140528_02-1-1401881161/resources*
>> There are only two files, i.e. stormcode.ser and stormconf.ser, but NO
>> resources folder. Why? How can I resolve this issue? I'm using *storm*
>> *0.9.1-incubating* and compiling the code using NetBeans.
>>
>>
>> On Tue, Jun 3, 2014 at 2:31 PM, Hamza Asad 
>> wrote:
>>
>>> Hi,
>>> I'm using a python bolt which is in the resources directory, but Storm is
>>> giving me this error:
>>>
>>> 7226 [Thread-19-exclaim1] INFO  backtype.storm.daemon.executor - Preparing bolt exclaim1:(3)
>>> 7231 [Thread-10] INFO  backtype.storm.daemon.executor - Loading executor exclaim1:[4 4]
>>> 7232 [Thread-19-exclaim1] ERROR backtype.storm.util - Async loop died!
>>> java.lang.RuntimeException: Error when launching multilang subprocess
>>>     at backtype.storm.task.ShellBolt.prepare(ShellBolt.java:105) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>>>     at backtype.storm.daemon.executor$eval5170$fn__5171$fn__5183.invoke(executor.clj:689) ~[na:na]
>>>     at backtype.storm.util$async_loop$fn__390.invoke(util.clj:431) ~[na:na]
>>>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.4.0.jar:na]
>>>     at java.lang.Thread.run(Thread.java:744) [na:1.7.0_55]
>>> Caused by: java.io.IOException: Cannot run program "python" (in directory "/tmp/5f3c8318-14da-4999-a974-584a9d200fdb/supervisor/stormdist/mongo_20140528_02-1-1401786854/resources"): error=2, No such file or directory
>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041) ~[na:1.7.0_55]
>>>     at backtype.storm.utils.ShellProcess.launch(ShellProcess.java:50) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>>>     at backtype.storm.task.ShellBolt.prepare(ShellBolt.java:102) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>>>     ... 4 common frames omitted
>>> Caused by: java.io.IOException: error=2, No such file or directory
>>>     at java.lang.UNIXProcess.forkAndExec(Native Method) ~[na:1.7.0_55]
>>>     at java.lang.UNIXProcess.<init>(UNIXProcess.java:135) ~[na:1.7.0_55]
>>>     at java.lang.ProcessImpl.start(ProcessImpl.java:130) ~[na:1.7.0_55]
>>>     at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022) ~[na:1.7.0_55]
>>>     ... 6 common frames omitted
>>> 7233 [Thread-10] INFO  backtype.storm.daemon.task - Emitting: exclaim1 __system ["startup"]
>>> 7233 [Thread-10] INFO  backtype.storm.daemon.executor - Loaded executor tasks exclaim1:[4 4]
>>> 7233 [Thread-19-exclaim1] ERROR backtype.storm.daemon.executor - java.lang.RuntimeException: Error when launching multilang subprocess
>>>     at backtype.storm.task.ShellBolt.prepare(ShellBolt.java:105) ~[storm-core-0.9.1-incubating.jar:0.9.1-incubating]
>>>     at backtype.storm.daemon.executor$eval5170$fn__5171$fn__5183.invoke(executor.clj:689)
>>>
>>


Kafka-Storm Run-time Exception

2014-06-05 Thread Abhishek Bhattacharjee
I am using Kafka with Storm. I am using Maven to build my topology, and I am
using Scala 2.9.2, the same version as my kafka_2.9.2-0.8.1 dependency.

The topology builds perfectly with Maven, but when I submit the topology to
Storm I get the following exception:

java.lang.NoSuchMethodError: scala.Predef$.int2Integer(I)Ljava/lang/Integer;
at kafka.api.OffsetRequest$.<init>(OffsetRequest.scala:28)
at kafka.api.OffsetRequest$.<clinit>(OffsetRequest.scala)
at kafka.consumer.ConsumerConfig$.<init>(ConsumerConfig.scala:36)
at kafka.consumer.ConsumerConfig$.<clinit>(ConsumerConfig.scala)
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:35)
at kafka.javaapi.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:35)
at 
storm.ubiquitous.spouts.KafkaConsumer.fetchdata(KafkaConsumer.java:41)
at 
storm.ubiquitous.spouts.KafkaSpoutTransaction$KafkaPartitionedEmitter.emitPartitionBatchNew(KafkaSpoutTransaction.java:92)
at 
storm.ubiquitous.spouts.KafkaSpoutTransaction$KafkaPartitionedEmitter.emitPartitionBatchNew(KafkaSpoutTransaction.java:68)
at 
backtype.storm.transactional.partitioned.PartitionedTransactionalSpoutExecutor$Emitter$1.init(PartitionedTransactionalSpoutExecutor.java:77)
at 
backtype.storm.transactional.state.RotatingTransactionalState.getState(RotatingTransactionalState.java:82)
at 
backtype.storm.transactional.state.RotatingTransactionalState.getStateOrCreate(RotatingTransactionalState.java:103)
at 
backtype.storm.transactional.partitioned.PartitionedTransactionalSpoutExecutor$Emitter.emitBatch(PartitionedTransactionalSpoutExecutor.java:73)
at 
backtype.storm.transactional.partitioned.PartitionedTransactionalSpoutExecutor$Emitter.emitBatch(PartitionedTransactionalSpoutExecutor.java:50)
at 
backtype.storm.transactional.TransactionalSpoutBatchExecutor.execute(TransactionalSpoutBatchExecutor.java:48)
at 
backtype.storm.coordination.CoordinatedBolt.execute(CoordinatedBolt.java:308)
at 
backtype.storm.daemon.executor$fn__3498$tuple_action_fn__3500.invoke(executor.clj:615)
at 
backtype.storm.daemon.executor$mk_task_receiver$fn__3421.invoke(executor.clj:383)
at 
backtype.storm.disruptor$clojure_handler$reify__2962.onEvent(disruptor.clj:43)
at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87)
at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:61)
at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62)
at 
backtype.storm.daemon.executor$fn__3498$fn__3510$fn__3557.invoke(executor.clj:730)
at backtype.storm.util$async_loop$fn__444.invoke(util.clj:403)
at clojure.lang.AFn.run(AFn.java:24)
at java.lang.Thread.run(Thread.java:724)

I searched around and realized that it's a version-related error. But both my
compile-time and run-time Scala versions are the same (2.9.2), and still I am
getting the error.

Any help is appreciated !

Thanks
-- 
*Abhishek Bhattacharjee*
*Pune Institute of Computer Technology*


Re: Kafka-Storm Run-time Exception

2014-06-05 Thread Andrew Neilson
It's possible you have some other dependency using an earlier version of
Scala. A common one to check for when using Kafka is jline 0.9.94, which
comes in through the zookeeper 3.3.4 dependency included with
kafka_2.9.2-0.8.1 and has more than one dependency that uses scala 2.8.x.

If this is where it's coming from, a solution is to just exclude jline from
the kafka/storm-kafka dependency in your pom.xml, e.g.:

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.9.2</artifactId>
  <version>0.8.1</version>
  <exclusions>
    <exclusion>
      <groupId>jline</groupId>
      <artifactId>jline</artifactId>
    </exclusion>
  </exclusions>
</dependency>

You can also check your dependency tree and look for other references to
other scala versions (mvn dependency:tree).
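
(For example, to look for stray Scala artifacts directly:)

mvn dependency:tree -Dincludes=org.scala-lang
mvn dependency:tree | grep -i scala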


On Thu, Jun 5, 2014 at 4:40 PM, Abhishek Bhattacharjee <
abhishek.bhattacharje...@gmail.com> wrote:

> I am using Kafka with Storm. I am using Maven to build my topology, and I
> am using Scala 2.9.2, the same version as my kafka_2.9.2-0.8.1 dependency.
>
> The topology builds perfectly with Maven, but when I submit the topology to
> Storm I get the following exception:
>
> java.lang.NoSuchMethodError: scala.Predef$.int2Integer(I)Ljava/lang/Integer;
>   at kafka.api.OffsetRequest$.<init>(OffsetRequest.scala:28)
>   at kafka.api.OffsetRequest$.<clinit>(OffsetRequest.scala)
>   at kafka.consumer.ConsumerConfig$.<init>(ConsumerConfig.scala:36)
>   at kafka.consumer.ConsumerConfig$.<clinit>(ConsumerConfig.scala)
>   at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:35)
>   at kafka.javaapi.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:35)
>   at 
> storm.ubiquitous.spouts.KafkaConsumer.fetchdata(KafkaConsumer.java:41)
>   at 
> storm.ubiquitous.spouts.KafkaSpoutTransaction$KafkaPartitionedEmitter.emitPartitionBatchNew(KafkaSpoutTransaction.java:92)
>   at 
> storm.ubiquitous.spouts.KafkaSpoutTransaction$KafkaPartitionedEmitter.emitPartitionBatchNew(KafkaSpoutTransaction.java:68)
>   at 
> backtype.storm.transactional.partitioned.PartitionedTransactionalSpoutExecutor$Emitter$1.init(PartitionedTransactionalSpoutExecutor.java:77)
>   at 
> backtype.storm.transactional.state.RotatingTransactionalState.getState(RotatingTransactionalState.java:82)
>   at 
> backtype.storm.transactional.state.RotatingTransactionalState.getStateOrCreate(RotatingTransactionalState.java:103)
>   at 
> backtype.storm.transactional.partitioned.PartitionedTransactionalSpoutExecutor$Emitter.emitBatch(PartitionedTransactionalSpoutExecutor.java:73)
>   at 
> backtype.storm.transactional.partitioned.PartitionedTransactionalSpoutExecutor$Emitter.emitBatch(PartitionedTransactionalSpoutExecutor.java:50)
>   at 
> backtype.storm.transactional.TransactionalSpoutBatchExecutor.execute(TransactionalSpoutBatchExecutor.java:48)
>   at 
> backtype.storm.coordination.CoordinatedBolt.execute(CoordinatedBolt.java:308)
>   at 
> backtype.storm.daemon.executor$fn__3498$tuple_action_fn__3500.invoke(executor.clj:615)
>   at 
> backtype.storm.daemon.executor$mk_task_receiver$fn__3421.invoke(executor.clj:383)
>   at 
> backtype.storm.disruptor$clojure_handler$reify__2962.onEvent(disruptor.clj:43)
>   at 
> backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87)
>   at 
> backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:61)
>   at 
> backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62)
>   at 
> backtype.storm.daemon.executor$fn__3498$fn__3510$fn__3557.invoke(executor.clj:730)
>   at backtype.storm.util$async_loop$fn__444.invoke(util.clj:403)
>   at clojure.lang.AFn.run(AFn.java:24)
>   at java.lang.Thread.run(Thread.java:724)
>
> I searched around and realized that it's a version-related error. But both my
> compile-time and run-time Scala versions are the same (2.9.2), and still I am
> getting the error.
>
> Any help is appreciated !
>
> Thanks
> --
> *Abhishek Bhattacharjee*
> *Pune Institute of Computer Technology*
>


supervisor start error

2014-06-05 Thread zongxuqin
I installed Storm on a single machine. When I start the supervisor, it
generates this error:


org.apache.thrift7.transport.TTransportException: Could not create 
ServerSocket on address 0.0.0.0/0.0.0.0:6627.
at org.apache.thrift7.transport.TNonblockingServerSocket.<init>(TNonblockingServerSocket.java:89) ~[libthrift7-0.7.0-2.jar:0.7.0-2]
at org.apache.thrift7.transport.TNonblockingServerSocket.<init>(TNonblockingServerSocket.java:68) ~[libthrift7-0.7.0-2.jar:0.7.0-2]
at org.apache.thrift7.transport.TNonblockingServerSocket.<init>(TNonblockingServerSocket.java:61) ~[libthrift7-0.7.0-2.jar:0.7.0-2]
at backtype.storm.daemon.nimbus$launch_server_BANG_.invoke(nimbus.clj:1137) ~[storm-core-0.9.0.1.jar:na]
at backtype.storm.daemon.nimbus$_launch.invoke(nimbus.clj:1167) ~[storm-core-0.9.0.1.jar:na]
at backtype.storm.daemon.nimbus$_main.invoke(nimbus.clj:1189) ~[storm-core-0.9.0.1.jar:na]
at clojure.lang.AFn.applyToHelper(AFn.java:159) ~[clojure-1.4.0.jar:na]
at clojure.lang.AFn.applyTo(AFn.java:151) ~[clojure-1.4.0.jar:na]
at backtype.storm.daemon.nimbus.main(Unknown Source) ~[storm-core-0.9.0.1.jar:na]



I do not know what causes this.
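
(Archive note: the stack trace above comes from Nimbus binding its Thrift port,
0.0.0.0:6627, not from the supervisor itself. "Could not create ServerSocket"
usually means something is already listening on that port -- for example a
Nimbus instance that is still running. On Linux, something like
`netstat -tlnp | grep 6627` will show the owning process.)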