Re: Kafka monitor unable to get offset lag

2017-02-01 Thread Spico Florin
Hello!
  You can check how your topic is consumed and its health via these scripts
(the broker host and consumer group below are placeholders):
./kafka-consumer-groups.sh --new-consumer --bootstrap-server <broker-host>:6667 --list
./kafka-consumer-groups.sh --new-consumer --bootstrap-server <broker-host>:6667 --describe --group <group-id>

I hope it helps.
 Florin

On Wed, Feb 1, 2017 at 11:01 AM, Igor Kuzmenko  wrote:

> Yes, the topology processes data and works fine.
> I couldn't find any exceptions in the Storm logs. access-web-ui.log contains
> only these lines
>
> 2017-02-01 11:58:17.315 o.a.s.l.f.AccessLoggingFilter [INFO] Access from:
> 10.35.63.14 url: http://master001.s:8744/api/v1/cluster/summary principal:
> 2017-02-01 11:58:17.315 o.a.s.l.f.AccessLoggingFilter [INFO] Access from:
> 10.35.63.14 url: http://master001.s:8744/api/v1/nimbus/summary principal:
> 2017-02-01 11:58:25.346 o.a.s.l.f.AccessLoggingFilter [INFO] Access from:
> 10.35.63.14 url: http://master001.s:8744/api/v1/topology/summary
> principal:
> 2017-02-01 11:58:25.346 o.a.s.l.f.AccessLoggingFilter [INFO] Access from:
> 10.35.63.14 url: http://master001.s:8744/api/v1/cluster/summary principal:
> 2017-02-01 11:58:25.346 o.a.s.l.f.AccessLoggingFilter [INFO] Access from:
> 10.35.63.14 url: http://master001.s:8744/api/v1/nimbus/summary principal:
>
> On Tue, Jan 31, 2017 at 5:19 PM, Priyank Shah 
> wrote:
>
>> Hi Igor,
>>
>> When you say the topology is working fine, do you mean you see data
>> flowing? Can you try to look at the logs for the UI server and paste
>> relevant lines here, if any?
>>
>> Priyank
>>
>> Sent from my iPhone
>>
>> On Jan 31, 2017, at 4:34 AM, Igor Kuzmenko  wrote:
>>
>> I've launched a topology with the new Kafka spout. The topology by itself
>> is working fine, but looking at the Storm UI I see a kafka-monitor exception:
>> *Unable to get offset lags for kafka. Reason:
>> org.apache.kafka.shaded.common.errors.TimeoutException: Timeout expired
>> while fetching topic metadata*
>>
>> Maybe I forgot to configure something, but then how does the topology
>> read messages?
>>
>>
>


Re: Required field 'nimbus_uptime_secs' is unset!

2016-07-13 Thread Spico Florin
Hi!
  It seems to me that you have to pass the classpath or build a fat jar.
Please have a look at this post:
http://stackoverflow.com/questions/32976198/deploy-storm-topology-remotely-using-storm-jar-command-on-windows
Florin

On Wed, Jul 13, 2016 at 8:09 AM, ram kumar  wrote:

> I also tried setting option for nimbus_uptime_secs
>
> COMMAND:
> sparse submit -o "nimbus_uptime_secs=5"
>
> Still facing the same issue.
>
> On Tue, Jul 12, 2016 at 3:57 PM, ram kumar 
> wrote:
>
>> Hi all,
>>
>> I am trying to run storm topology in production mode,
>>
>> project.clj
>> (defproject meta "0.0.1-SNAPSHOT"
>>   :source-paths ["topologies"]
>>   :resource-paths ["_resources"]
>>   :target-path "_build"
>>   :min-lein-version "2.0.0"
>>   :jvm-opts ["-client"]
>>   :dependencies  [[org.apache.storm/storm-core "0.10.0"]
>>   [com.parsely/streamparse "0.0.4-SNAPSHOT"]
>>   ]
>>   :jar-exclusions [#"log4j\.properties" #"backtype" #"trident"
>> #"META-INF" #"meta-inf" #"\.yaml"]
>>   :uberjar-exclusions [#"log4j\.properties" #"backtype" #"trident"
>> #"META-INF" #"meta-inf" #"\.yaml"]
>>   )
>>
>>
>> config.json
>> {
>> "library": "",
>> "topology_specs": "topologies/",
>> "virtualenv_specs": "virtualenvs/",
>> "envs": {
>> "prod": {
>> "user": "ram",
>> "nimbus": "10.218.173.100",
>> "workers": ["10.154.9.166"],
>> "log": {
>> "path": "/home/ram/log/storm/streamparse",
>> "max_bytes": 100,
>> "backup_count": 10,
>> "level": "info"
>> },
>> "use_ssh_for_nimbus": true,
>> "use_virtualenv": false,
>> "virtualenv_root": "/home/ram/storm-topologies/meta"
>> }
>> }
>> }
>>
>>
>> When submitting the topology with *sparse submit*, I'm
>> getting:
>>
>> java.lang.RuntimeException:
>> org.apache.thrift7.protocol.TProtocolException: *Required field
>> 'nimbus_uptime_secs' is unset!*
>> Struct:ClusterSummary(supervisors:[SupervisorSummary(host:10.154.9.166,
>> uptime_secs:1654461, num_workers:2, num_used_workers:0,
>> supervisor_id:0a07f441-05f8-4103-b439-22cf42a1fcff,
>> version:0.10.0.2.4.2.0-258)], nimbus_uptime_secs:0, topologies:[])
>>
>>
>>
>> Can anyone help me with this,
>> Thanks,
>> Ram
>>
>>
>


Re: Allocating separate memory and workers to topologies of a single jar?

2016-07-13 Thread Spico Florin
Hello!
  For the topology that has 0 MB allocated, it seems to me that you don't
have enough slots available. Check in the storm.yaml file (on your worker
machines) how many slots you have allocated. By default there are 4 slots
available:
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
You have 5 topologies, therefore one of them is not run.

Regarding the memory allocation: you allocate memory per worker (available
slot), not per topology. If you set the number of workers for your topology
to 1, then your topology will run on a single worker (available slot) and
will receive the configuration that you gave for your worker. If you
configure your spouts and bolts to spread over multiple workers (as many as
there are configured slots), then all the workers will receive the same
amount of memory, configured globally via the worker.childopts property in
storm.yaml. So you don't configure the memory per topology but per worker.

If you want to use a different memory allocation for the workers of a
particular topology, depending on its components' memory consumption, then
you should set the topology.worker.childopts property. Note that it takes a
JVM options string, not a plain number:

stormConfig.put("topology.worker.childopts", "-Xmx1024m");
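
A minimal sketch of how this could look when submitting several topologies
with different worker memory from one program; the topology names, builders
(assumed TopologyBuilder instances), and heap sizes are illustrative
assumptions, not taken from the posts below:

// Assumes backtype.storm.Config and backtype.storm.StormSubmitter (Storm 0.10.x).
Config smallConfig = new Config();
smallConfig.setNumWorkers(1);
// Workers of this topology get a 512 MB heap.
smallConfig.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx512m");

Config largeConfig = new Config();
largeConfig.setNumWorkers(2);
// Workers of this topology get a 1 GB heap.
largeConfig.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx1024m");

StormSubmitter.submitTopology("small-topology", smallConfig, smallBuilder.createTopology());
StormSubmitter.submitTopology("large-topology", largeConfig, largeBuilder.createTopology());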

I hope it helps.

Regards,
 Florin

On Wed, Jul 13, 2016 at 9:23 AM, Navin Ipe 
wrote:

> I tried setting stormConfig.put(Config.WORKER_HEAP_MEMORY_MB, 1024); and
> now *all topologies* Assigned memory is 0MB.
> Does anyone know why it works this way? How do we assign memory to
> topologies in this situation?
>
> On Wed, Jul 13, 2016 at 10:39 AM, Navin Ipe <
> navin@searchlighthealth.com> wrote:
>
>> Hi,
>>
>> I have a program *MyProg.java* inside which I'm creating 5 topologies
>> and using *stormSubmitter* to submit it to Storm. The storm UI shows the
>> assigned memory for each topology as:
>> *Assigned Mem (MB)*
>> 832
>> 0
>> 832
>> 832
>> 832
>>
>> One of the topologies was assigned 0MB. Assuming that it was because of
>> setting a single Config for all of them, I gave them separate configs (as
>> below), but nothing changed.
>>
>> StormConfigFactory stormConfigFactory = new StormConfigFactory();
>>
>> StormSubmitter.submitTopology(upTopologyName, stormConfigFactory.GetNewConfig(), upBuilder.createTopology());
>> StormSubmitter.submitTopology(viTopologyName, stormConfigFactory.GetNewConfig(), viBuilder.createTopology());
>> StormSubmitter.submitTopology(prTopologyName, stormConfigFactory.GetNewConfig(), prBuilder.createTopology());
>> StormSubmitter.submitTopology(opTopologyName, stormConfigFactory.GetNewConfig(), opBuilder.createTopology());
>> StormSubmitter.submitTopology(poTopologyName, stormConfigFactory.GetNewConfig(), poBuilder.createTopology());
>>
>> And the StormConfigFactory class:
>>
>> public class StormConfigFactory {
>>     public Config GetNewConfig() {
>>         Config stormConfig = new Config();
>>         stormConfig.setNumWorkers(1);
>>         stormConfig.setNumAckers(1);
>>         stormConfig.put(Config.TOPOLOGY_DEBUG, false);
>>         stormConfig.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 64);
>>         stormConfig.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 65536);
>>         stormConfig.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 65536);
>>         stormConfig.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 50);
>>         stormConfig.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 60);
>>         stormConfig.put(Config.STORM_ZOOKEEPER_SERVERS, Arrays.asList(new String[]{"localhost"}));
>>         return stormConfig;
>>     }
>> }
>>
>>
>> *How do I allocate separate memory and workers for each topology?*
>>
>> --
>> Regards,
>> Navin
>>
>
>
>
> --
> Regards,
> Navin
>


Re: Worker process start time

2016-06-27 Thread Spico Florin
Hi!
  What kind of connections (spout, bolt) to external systems do you have?
Are you connecting to other external systems (databases, distributed
messaging systems)?
  If this is the case, please have a look at how much time you need to
connect to them.
  Regards,
 Florin

On Fri, Jun 24, 2016 at 4:50 PM, Rudraneel chakraborty <
rudraneel.chakrabo...@gmail.com> wrote:

> My Storm configuration is v1, 3 nodes, each with 2 cores and 2 GB RAM.
>
>
> On Friday, 24 June 2016, Jungtaek Lim  wrote:
>
>> Hi Rudraneel,
>>
>> What's your Storm version and how is the cluster configured?
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> 2016년 6월 24일 (금) 오전 9:37, Rudraneel chakraborty <
>> rudraneel.chakrabo...@gmail.com>님이 작성:
>>
>>> Hi
>>>
>>> I am observing that my worker processes take a little over 15 seconds to
>>> start up (a single worker process with 6 executors with no ackers). Do you
>>> guys find it normal?
>>>
>>> Can i speed things up??
>>>
>>>
>>> --
>>> Rudraneel Chakraborty
>>> Carleton University Real Time and Distributed Systems Research
>>>
>>>
>
> --
> Rudraneel Chakraborty
> Carleton University Real Time and Distributed Systems Research
>
>


Re: java.lang.NoSuchMethodError: org.apache.thrift.protocol.TProtocol.getScheme()Ljava/lang/Class

2016-06-27 Thread Spico Florin
Hi!
  Storm uses its own Thrift library (org.apache.thrift:libthrift:pom:0.9.2).
When you submit to the cluster you'll have two different versions of Thrift
(your 0.9.3 and Storm's 0.9.2) loaded in the classloader.
Solution:
1. Exclude Thrift 0.9.3 from your Maven dependencies (see which library
pulls it in and add a Maven <exclusions> entry; a sketch follows below).
2. Set the storm-core library as a "compile"-scope dependency for Maven
(see the <scope> element of the dependency).
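
A sketch of such an exclusion, assuming hive-jdbc is the dependency that
pulls in libthrift 0.9.3 (run mvn dependency:tree to find the actual one):

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1</version>
    <exclusions>
        <!-- Keep Storm's Thrift version; drop the transitive 0.9.3 copy. -->
        <exclusion>
            <groupId>org.apache.thrift</groupId>
            <artifactId>libthrift</artifactId>
        </exclusion>
    </exclusions>
</dependency>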
I hope that it helps.
Regards,
 Florin


On Mon, Jun 27, 2016 at 9:22 AM, Venkatesh Bodapati <
venkatesh.bodap...@inndata.in> wrote:

> Hi Florin, thanks for your reply.
>
> These are the jars I used in mylib:
> antlr4-4.5.1-1.jar
> antlr4-runtime-4.5.1-1.jar
> asm-4.0.jar
> carbonite-1.4.0.jar
> chill-java-0.3.5.jar
> clj-stacktrace-0.2.2.jar
> clj-time-0.4.1.jar
> clojure-1.5.1.jar
> clout-1.0.1.jar
> commons-codec-1.6.jar
> commons-exec-1.1.jar
> commons-fileupload-1.2.1.jar
> commons-io-2.4.jar
> commons-lang-2.5.jar
> commons-logging-1.1.3.jar
> commons-logging-api-1.1.jar
> compojure-1.1.3.jar
> core.incubator-0.1.0.jar
> curator-client-2.6.0.jar
> curator-framework-2.6.0.jar
> derby-10.12.1.1.jar
> disruptor-2.10.1.jar
> guava-11.0.2.jar
> hadoop-client-2.6.0.jar
> hadoop-mapreduce-client-core-2.6.0.jar
> hiccup-0.3.6.jar
> hive-exec-0.8.0.jar
> jetty-util-6.1.26.jar
> jgrapht-core-0.9.0.jar
> jline-2.11.jar
> joda-time-2.0.jar
> json-simple-1.1.1.jar
> junit-4.3.jar
> kafka_2.11-0.8.2.1.jar
> kafka_2.11-0.8.2.1-javadoc.jar
> kafka_2.11-0.8.2.1-scaladoc.jar
> kafka_2.11-0.8.2.1-sources.jar
> kafka_2.11-0.8.2.1-test.jar
> kafka-clients-0.8.2.1.jar
> kryo-2.21.jar
> logback-classic-1.0.13.jar
> logback-core-1.0.13.jar
> math.numeric-tower-0.0.1.jar
> metrics-core-2.2.0.jar
> minlog-1.2.jar
> objenesis-1.2.jar
> reflectasm-1.07-shaded.jar
> ring-core-1.1.5.jar
> ring-devel-0.3.11.jar
> ring-jetty-adapter-0.3.11.jar
> ring-servlet-0.3.11.jar
> scala-library-2.11.5.jar
> servlet-api-2.5.jar
> slf4j-api-1.7.5.jar
> snakeyaml-1.11.jar
> snappy-java-1.1.1.6.jar
> storm-core-0.9.5.jar
> tools.cli-0.2.4.jar
> tools.logging-0.2.3.jar
> tools.macro-0.1.0.jar
> zkclient-0.3.jar
> zookeeper-3.4.6.jar
> gson-2.2.4.jar
> java-json.jar
> activation-1.1.jar
> storm-core-0.9.1-incubating.jar
> log4j-1.2.17.jar
> mockito-all-1.8.5.jar
> org.apache.commons.collections.jar
> org.apache.commons.httpclient.jar
> org.apache.commons.logging-1.1.1.jar
> org.json-20120521.jar
> protobuf-java-2.5.0.jar
> kafka_2.11-0.8.2.1/libs/jopt-simple-3.2.jar
> kafka_2.11-0.8.2.1/libs/scala-parser-combinators_2.11-1.0.2.jar
> kafka_2.11-0.8.2.1/libs/scala-xml_2.11-1.0.2.jar
> httpcore-4.4.jar
> hadoop-auth-2.7.2.jar
> hadoop-streaming-2.7.2.jar
> htrace-core-3.1.0-incubating.jar
> hadoop-nfs-2.7.2.jar
> hive-cli-1.2.1.jar
> hive-common-1.2.1.jar
> hive-contrib-1.2.1.jar
> hive-exec-1.2.1.jar
> hive-jdbc-1.2.1.jar
> hive-metastore-1.2.1.jar
> hive-serde-1.2.1.jar
> hive-service-1.2.1.jar
> hive-testutils-1.2.1.jar
> mysql-connector-java-5.1.35-bin.jar
> hadoop-common-2.7.1.jar
> httpclient-4.5.jar
> commons-beanutils-core-1.8.0.jar
> commons-collections-3.2.1.jar
> commons-configuration-1.6.jar
> commons-httpclient-3.0.1.jar
> libthrift-0.9.3.jar
>
> I am using only one libthrift jar now, libthrift-0.9.3.jar, but I got the
> same error: "java.lang.NoSuchMethodError:
> org.apache.thrift.protocol.TProtocol.getScheme()Ljava/lang/Class".
>
> Thanks & Regards,
> venkatesh
>
> On Thu, Jun 23, 2016 at 2:55 PM, Spico Florin <spicoflo...@gmail.com>
> wrote:
>
>> Hi!
>>   Please check out your classpath (Maven or Gradle dependencies). It
>> seems that you are using two versions of the Thrift protocol library.
>> Regards,
>>  Florin
>>
>> On Wed, Jun 22, 2016 at 10:40 AM, Venkatesh Bodapati <
>> venkatesh.bodap...@inndata.in> wrote:
>>
>>> I am working on Storm with Hive, SQL, and Kafka. I read data from a
>>> Kafka topic and a Hive table, and send data to MySQL and another Kafka
>>> topic. I am not able to connect to Hive from Storm. I got this error:
>>>
>>> java.lang.NoSuchMethodError:
>>> org.apache.thrift.protocol.TProtocol.getScheme()Ljava/lang/Class;
>>> at
>>> org.apache.hive.service.cli.thrift.TCLIService$OpenSession_args.write(TCLIService.java:1854)
>>> ~[hive-service-1.2.1.jar:1.2.1]
>>> at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:63)
>>> ~[hive-exec-0.8.0.jar:0.8.0]
>>> at
>>> org.apache.hive.service.cli.thrif

Re: java.lang.NoSuchMethodError: org.apache.thrift.protocol.TProtocol.getScheme()Ljava/lang/Class

2016-06-23 Thread Spico Florin
Hi!
  Please check out your classpath (Maven or Gradle dependencies). It seems
that you are using two versions of the Thrift protocol library.
Regards,
 Florin

On Wed, Jun 22, 2016 at 10:40 AM, Venkatesh Bodapati <
venkatesh.bodap...@inndata.in> wrote:

> I am working on Storm with Hive, SQL, and Kafka. I read data from a Kafka
> topic and a Hive table, and send data to MySQL and another Kafka topic. I
> am not able to connect to Hive from Storm. I got this error:
>
> java.lang.NoSuchMethodError:
> org.apache.thrift.protocol.TProtocol.getScheme()Ljava/lang/Class;
> at
> org.apache.hive.service.cli.thrift.TCLIService$OpenSession_args.write(TCLIService.java:1854)
> ~[hive-service-1.2.1.jar:1.2.1]
> at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:63)
> ~[hive-exec-0.8.0.jar:0.8.0]
> at
> org.apache.hive.service.cli.thrift.TCLIService$Client.send_OpenSession(TCLIService.java:150)
> ~[hive-service-1.2.1.jar:1.2.1]
> at
> org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:142)
> ~[hive-service-1.2.1.jar:1.2.1]
> at
> org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:583)
> ~[hive-jdbc-1.2.1.jar:1.2.1]
> at org.apache.hive.jdbc.HiveConnection.(HiveConnection.java:192)
> ~[hive-jdbc-1.2.1.jar:1.2.1]
> at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
> ~[hive-jdbc-1.2.1.jar:1.2.1]
> at java.sql.DriverManager.getConnection(DriverManager.java:664)
> ~[na:1.8.0_91]
> at java.sql.DriverManager.getConnection(DriverManager.java:247)
> ~[na:1.8.0_91]
> at com.inndata.wirecard.kafkaspout.nextTuple(kafkaspout.java:88)
> ~[MysqlKafkaStream.jar:na]
> at
> backtype.storm.daemon.executor$fn__6579$fn__6594$fn__6623.invoke(executor.clj:565)
> ~[storm-core-0.9.5.jar:0.9.5]
> at backtype.storm.util$async_loop$fn__459.invoke(util.clj:463)
> ~[storm-core-0.9.5.jar:0.9.5]
> at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> 25802 [Thread-17-kafkaspout] ERROR backtype.storm.daemon.executor -
>
> I am using Hive 1.2.1, Storm 0.9.5, and libthrift-0.9.2.jar. I tried
> searching for this issue, but I can't find the relevant jar. Please help
> me to fix this.
>


Re: Re: Problem to write into HBase

2016-06-12 Thread Spico Florin
Hi!
  It seems to me that your HBase bolt is losing the connection to ZooKeeper
and tries over and over to reconnect, each time on a new thread.
 Please check your ZooKeeper health. Are your HBase cluster and Storm
cluster using the same ZooKeeper? How many HBase region servers and how
many Storm workers are you using?
Depending on these numbers and your workload, you can overwhelm your ZK
cluster.
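
If the bolt opens a new HBase connection per tuple (the HBaseAdmin and
CatalogTracker constructors in the stack trace below hint at that), each
call also opens a fresh ZooKeeper session and thread, which can exhaust
native threads. A minimal sketch of reusing one connection per bolt
instance instead, assuming the HBase 0.96-era client API; the bolt name,
table, column family, and tuple fields are illustrative:

// Assumed imports: backtype.storm.task.*, backtype.storm.topology.*,
// backtype.storm.tuple.Tuple, org.apache.hadoop.hbase.HBaseConfiguration,
// org.apache.hadoop.hbase.client.*, org.apache.hadoop.hbase.util.Bytes,
// java.io.IOException, java.util.Map
public class PooledHBaseBolt extends BaseRichBolt {
    private OutputCollector collector;
    private HConnection connection;    // heavyweight: one per bolt instance
    private HTableInterface table;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        try {
            // One HBase connection (and one ZooKeeper session) per bolt
            // instance, instead of one per execute() call.
            connection = HConnectionManager.createConnection(HBaseConfiguration.create());
            table = connection.getTable("my_table");             // hypothetical table
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void execute(Tuple tuple) {
        try {
            Put put = new Put(Bytes.toBytes(tuple.getString(0)));
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("val"),   // hypothetical column
                    Bytes.toBytes(tuple.getString(1)));
            table.put(put);
            collector.ack(tuple);
        } catch (IOException e) {
            collector.fail(tuple);
        }
    }

    @Override
    public void cleanup() {
        try { table.close(); connection.close(); } catch (IOException ignored) { }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}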
I hope that these help.
  Florin


On Sun, Jun 12, 2016 at 11:11 AM, Satish Duggana 
wrote:

> Hi,
> Your message says it is throwing OutOfMemory error. So, you should look
> into what is causing that. It may not be really because of storm but it may
> be because of application code also. You may want to use `
> -XX:+HeapDumpOnOutOfMemoryError` and `-XX:HeapDumpPath=/worker/dumps` to
> dump heap on outofmemory and analyze what is causing OutOfMemory.
>
> Thanks,
> Satish.
>
> On Sun, Jun 12, 2016 at 11:34 AM, fanxi...@travelsky.com <
> fanxi...@travelsky.com> wrote:
>
>> Hi Wenwei
>>
>> Actually, I did not create any new threads in my bolt. The error reported
>> just comes from Storm core itself.
>>
>> --
>>
>> It seems you created too many threads, which caused no thread resources
>> to be available.
>>
>> Sent from my iPhone
>>
>> On Jun 12, 2016, at 10:18, "fanxi...@travelsky.com" <
>> fanxi...@travelsky.com> wrote:
>>
>> *Hi user,*
>>
>> *I have a topology that writes into HBase. Every time I submitted the
>> topology, it ran well. But after a while, for example one or two days,
>> the topology always reports an exception like the one below:*
>>
>> java.lang.OutOfMemoryError: unable to create new native thread
>> at java.lang.Thread.start0(Native Method)
>> at java.lang.Thread.start(Thread.java:714)
>> at org.apache.zookeeper.ClientCnxn.start(ClientCnxn.java:406)
>> at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:450)
>> at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
>> at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:140)
>> at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.<init>(RecoverableZooKeeper.java:127)
>> at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:132)
>> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:165)
>> at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.<init>(ZooKeeperWatcher.java:134)
>> at org.apache.hadoop.hbase.catalog.CatalogTracker.<init>(CatalogTracker.java:179)
>> at org.apache.hadoop.hbase.catalog.CatalogTracker.<init>(CatalogTracker.java:153)
>> at org.apache.hadoop.hbase.catalog.CatalogTracker.<init>(CatalogTracker.java:135)
>> at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:234)
>> at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:306)
>> at com.travelsky.roc.hbase.utils.HBaseUtils.isTableAvailable(HBaseUtils.java:22)
>> at com.travelsky.roc.hbase.bolt.HBaseSinkBolt.execute(HBaseSinkBolt.java:279)
>> at backtype.storm.daemon.executor$fn__5641$tuple_action_fn__5643.invoke(executor.clj:631)
>> at backtype.storm.daemon.executor$mk_task_receiver$fn__5564.invoke(executor.clj:399)
>> at backtype.storm.disruptor$clojure_handler$reify__745.onEvent(disruptor.clj:58)
>> at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
>> at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
>> at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
>> at backtype.storm.daemon.executor$fn__5641$fn__5653$fn__5700.invoke(executor.clj:746)
>> at backtype.storm.util$async_loop$fn__457.invoke(util.clj:431)
>> at clojure.lang.AFn.run(AFn.java:24)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> *Then the topology runs very slowly.*
>> *I took a look at the log, it is full of the information like below:*
>> 2016-06-12 10:16:04 o.a.h.h.z.RecoverableZooKeeper [INFO] Process
>> identifier=catalogtracker-on-hconnection-0x5ade861c connecting to ZooKeeper
>> ensemble=r720m6-hdp:2181,r720m8-hdp:2181,r720n5-hdp:2181
>> 2016-06-12 10:16:04 o.a.z.ZooKeeper [INFO] Initiating client connection,
>> connectString=r720m6-hdp:2181,r720m8-hdp:2181,r720n5-hdp:2181
>> sessionTimeout=12 watcher=catalogtracker-on-hconnection-0x5ade861c,
>> quorum=r720m6-hdp:2181,r720m8-hdp:2181,r720n5-hdp:2181, baseZNode=/hbasenew
>> 2016-06-12 10:16:04 o.a.z.ClientCnxn [INFO] Opening socket connection to
>> server r720m8-hdp/10.6.116.3:2181. Will not attempt to authenticate
>> using SASL (unknown error)
>> 2016-06-12 10:16:04 o.a.z.ClientCnxn [INFO] Socket connection established
>> to r720m8-hdp/10.6.116.3:2181, initiating session
>> 2016-06-12 10:16:04 o.a.z.ClientCnxn [INFO] Session establishment
>> complete on server r720m8-hdp/10.6.116.3:2181, sessionid =
>> 0x15138b0b2df471f, negotiated timeout = 12
>> 2016-06-12 10:16:04 o.a.z.ZooKeeper [INFO] Session: 0x15138b0b2df471f
>> closed
>> 2016-06-12 10:16:04 o.a.z.ClientCnxn [INFO] 

Re: Usage of G1 Garbage collector

2016-06-03 Thread Spico Florin
Thank you very much for sharing these. I hope that others will contribute
as well.
Regards,
Florin

On Friday, June 3, 2016, Stephen Powis <spo...@salesforce.com> wrote:

> Here's the flags we're using:
>
> -Xms3G -Xmx3G -XX:MaxPermSize=100M  -Xloggc:gc-%ID%.log
> -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime -XX:+DisableExplicitGC 
> -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=50M -XX:+UseCompressedOops 
> -XX:+AlwaysPreTouch -XX:+UseG1GC
> -XX:MaxGCPauseMillis=20 -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=OOMDump-%ID%.log
>
> Here's a link to a GC log analysis (takes several minutes to load)
>  
> http://gceasy.io/my-gc-report?p=L2hvbWUvcmFtL3VwbG9hZC9pbnN0YW5jZTEvc2hhcmVkLzIwMTYtNi0zL2djbG9nLnppcC0wLTU3LTE3
>
> This is definitely outside the realm of my knowledge, so don't use this as 
> any kind of benchmark.
> I'm curious if anyone can comment on any suggested adjustments?  I feel like 
> we have a ton of churn
> in our young gen, but really don't have the experience tuning java's GC or 
> any examples to compare it with.
>
> Thanks!
>
>
> On Fri, Jun 3, 2016 at 3:41 PM, Otis Gospodnetić <
> otis.gospodne...@gmail.com> wrote:
>
>> Hi,
>>
>> +1 for G1 for large heaps where you are seeing big GC pauses.  Works well
>> for us.
>> See:
>> https://sematext.com/blog/2013/06/24/g1-cms-java-garbage-collector/
>>
>> Otis
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>> On Tue, May 31, 2016 at 5:21 AM, Spico Florin <spicoflo...@gmail.com> wrote:
>>
>>> Hello!
>>> I would like to ask the community the following:
>>> 1. Are you using the G1 garbage collector for your workers/supervisors
>>>  in production?
>>> 2. Have you observed any improvement added by adding this GC style?
>>> 3. What are the JVM options that you are using and are a good fit for
>>> you?
>>>
>>> Thank you in advance.
>>>  Regards,
>>>  Florin
>>>
>>>
>>
>


Fwd: Usage of G1 Garbage collector

2016-05-31 Thread Spico Florin
Hello!
 I would like to ask the community the following:
1. Are you using the G1 garbage collector for your workers/supervisors  in
production?
2. Have you observed any improvement added by adding this GC style?
3. What are the JVM options that you are using and are a good fit for you?

Thank you in advance.
 Regards,
 Florin


Re: Is Storm visualization enough for performance ?

2016-05-11 Thread Spico Florin
Hi!
  The Storm UI should give you a bird's-eye overview of the topology's
behavior on your cluster. For other tools and techniques for finding
performance issues and fine tuning, I recommend reading the chapters of the
book "Storm Applied" that cover this subject:
https://www.safaribooksonline.com/

They will help you to better understand Storm at its core and the
methodologies to use it at its best. This framework is not easy; it
requires time to digest.

I hope it helps.
 Florin



On Wed, May 11, 2016 at 1:42 AM, sam mohel  wrote:

> I am a researcher, and my goal was an algorithm in a project using Storm
> to make data more accurate. I finished the project and submitted the
> topology, but I need to make a comparison between the old version I have
> and the new one I made, to know whether I made a valuable change or not.
>
> Is the Storm visualization in Storm 0.9.6 enough, or do I need to use
> something like Ganglia or Sematext?
>


Re: How to let a topology know that it's time to stop?

2016-05-09 Thread Spico Florin
Hi!
 You're welcome, Navin. I'm also interested in the solution. Can you please
share your remarks (and some code :)) after the implementation?
Thanks.
Regards,
 Florin

On Mon, May 9, 2016 at 7:20 AM, Navin Ipe <navin@searchlighthealth.com>
wrote:

> @Matthias: That's genius! I didn't know streams and allGroupings could be
> used like that.
> In the way Storm introduced tick tuples, it'd have been nice if Storm had
> a native technique of doing all this, but the ideas you've come up with are
> extremely good. Am going to try implementing them right away.
> Thank you too Florin!
>
> On Mon, May 9, 2016 at 12:48 AM, Matthias J. Sax <mj...@apache.org> wrote:
>
>> To synchronize this, use an additional "shut down bolt" that used
>> parallelism of one. "shut down bolt" must be notified by all parallel
>> DbBolts after they performed the flush. If all notifications are
>> received, there are not in-flight message and thus "shut down bolt" can
>> kill the topology safely.
>>
>> -Matthias
>>
>>
>>
>> On 05/08/2016 07:27 PM, Spico Florin wrote:
>> > hi!
>> >   There is the solution of sending a "poison pill" message from the
>> > spout. One bolt will receive your poison pill and will kill the topology
>> > via the Storm Nimbus API. One potential issue with this approach, due to
>> > your topology structure (the parallelism of your bolts and the time they
>> > need to execute their business logic), is that the poison pill may be
>> > swallowed by the one bolt responsible for killing the topology before
>> > all the other in-flight messages have been processed. The consequence is
>> > that you cannot be sure that all the messages sent by the spout were
>> > processed. Also, sharing the total number of sent messages between the
>> > executors, in order to shut down when all messages were processed, could
>> > be error prone, since a tuple can be processed many times (depending on
>> > your message processing guarantee) or could be failed.
>> >   I could not find a solution for this. Storm is intended to run on
>> > unbounded data.
>> > I hope that these help.
>> > Regards,
>> > florin
>> >
>> >
>> > On Sunday, May 8, 2016, Matthias J. Sax <mj...@apache.org
>> > <mailto:mj...@apache.org>> wrote:
>> >
>> > You can get the number of bolt instances from TopologyContext that
>> is
>> > provided in Bolt.prepare()
>> >
>> > Furthermore, you could put a loop into your topology, ie, a bolt
>> reads
>> > it's own output; if you broadcast (ie, allGrouping) this
>> > feedback-loop-stream you can let bolt instances talk to each other.
>> >
>> > builder.setBolt("DbBolt", new MyDBBolt())
>> >.shuffleGrouping("spout")
>> >.allGrouping("flush-stream", "DbBolt");
>> >
>> > where "flush-stream" is a second output stream of MyDBBolt()
>> sending a
>> > notification tuple after it received the end-of-stream from spout;
>> > furthermore, if a bolt received the signal via "flush-stream" from
>> > **all** parallel bolt instances, it can flush to DB.
>> >
>> > Or something like this... Be creative! :)
>> >
>> >
>> > -Matthias
>> >
>> >
>> > On 05/08/2016 02:26 PM, Navin Ipe wrote:
>> > > @Matthias: I agree about the batch processor, but my superior
>> took the
>> > > decision to use Storm, and he visualizes more complexity later for
>> > which
>> > > he needs Storm.
>> > > I had considered the "end of stream" tuple earlier (my idea was to
>> > emit
>> > > 10 consecutive nulls), but then the question was how do I know how
>> > many
>> > > bolt instances have been created, and how do I notify all the
>> bolts?
>> > > Because it's only after the last bolt finishes writing to DB,
>> that I
>> > > have to shut down the topology.
>> > >
>> > > @Jason: Thanks. I had seen storm signals earlier (I think from
>> one of
>> > > your replies to someone else) and I had a look at the code too,
>> > but am a
>> > > bit wary because it's no longer being maintained and because of
>> the
>> > > issues: https://github.com/ptg

Re: Pull data from Redis every minute

2016-05-08 Thread Spico Florin
Hi!
  For me it looks like processing the data in a specific window. You could
achieve this by using a new feature in Storm 1.0, namely the window bolt.
Via the spout you get the data that you need, and in the window bolt you do
the sum. Be careful with the notion of time you are using: processing time
versus event time. If your data has a timestamp and you would like to use
this field in your logic, then you need event time and the notion of
watermarks. You can find more info about this at
https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-storm.html
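
A minimal sketch of such a tumbling-window sum with the Storm 1.0 windowed
bolt API; the spout id, the "value" field name, and the one-minute window
are illustrative assumptions:

// Assumed imports: org.apache.storm.topology.base.BaseWindowedBolt,
// org.apache.storm.windowing.TupleWindow, org.apache.storm.tuple.Tuple,
// org.apache.storm.topology.OutputFieldsDeclarer, java.util.concurrent.TimeUnit
public class SumWindowBolt extends BaseWindowedBolt {
    @Override
    public void execute(TupleWindow inputWindow) {
        long sum = 0;
        // All tuples that fell into the current one-minute window.
        for (Tuple tuple : inputWindow.get()) {
            sum += tuple.getLongByField("value");   // hypothetical field name
        }
        System.out.println("Sum for this window: " + sum);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Nothing emitted downstream in this sketch.
    }
}

// Wiring it up with a tumbling window of one minute:
builder.setBolt("sum-bolt",
        new SumWindowBolt().withTumblingWindow(
                new BaseWindowedBolt.Duration(60, TimeUnit.SECONDS)))
       .shuffleGrouping("redis-spout");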
I hope it helps.
Florin

On Saturday, May 7, 2016, Erik Weathers  wrote:

> If you need something to be done at a regular interval within Storm you
> might wanna leverage the "tick tuple" concept:
>
>- http://kitmenke.com/blog/2014/08/04/tick-tuples-within-storm/
>
>
> On Fri, May 6, 2016 at 11:53 PM, Daniela S wrote:
>
>> Hi,
>>
>> I would like to use Redis as a kind of cache for my active devices. So I
>> receive a message over Kafka with an active device and store it in Redis.
>> Now I would like to pull all active devices from Redis and to add a
>> specific value to each device and to build a sum afterwards.
>> How can I ensure that the spout fetches the list of active devices e.g.
>> every minute? Because the sum has to be built every minute from scratch, as
>> the additional specific value changes every minute for every device.
>>
>> Thank you in advance.
>>
>> Regards,
>> Daniela
>>
>
>


Re: How to let a topology know that it's time to stop?

2016-05-08 Thread Spico Florin
Hi!
  There is the solution of sending a "poison pill" message from the spout.
One bolt will receive your poison pill and will kill the topology via the
Storm Nimbus API. One potential issue with this approach, due to your
topology structure (the parallelism of your bolts and the time they need
to execute their business logic), is that the poison pill may be swallowed
by the one bolt responsible for killing the topology before all the other
in-flight messages have been processed. The consequence is that you cannot
be sure that all the messages sent by the spout were processed. Also,
sharing the total number of sent messages between the executors, in order
to shut down when all messages were processed, could be error prone, since
a tuple can be processed many times (depending on your message processing
guarantee) or could be failed.
  I could not find a solution for this. Storm is intended to run on
unbounded data.
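
A minimal sketch of the kill step itself, using the NimbusClient call that
Matthias mentions later in this thread; the topology name and the boolean
marker field are illustrative assumptions:

// Assumed imports: backtype.storm.task.*, backtype.storm.topology.*,
// backtype.storm.tuple.Tuple, backtype.storm.utils.NimbusClient, java.util.Map
public class KillerBolt extends BaseRichBolt {
    private Map stormConf;
    private OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.stormConf = stormConf;   // holds the nimbus host/port settings
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        try {
            if (tuple.getBooleanByField("poisonPill")) {           // hypothetical marker field
                // Ask Nimbus to kill this topology; in-flight tuples may be
                // lost, which is exactly the caveat described above.
                NimbusClient.getConfiguredClient(stormConf)
                        .getClient().killTopology("my-topology");  // hypothetical name
            }
            collector.ack(tuple);
        } catch (Exception e) {
            collector.fail(tuple);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { }
}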
I hope that these help.
Regards,
Florin


On Sunday, May 8, 2016, Matthias J. Sax  wrote:

> You can get the number of bolt instances from TopologyContext that is
> provided in Bolt.prepare()
>
> Furthermore, you could put a loop into your topology, ie, a bolt reads
> it's own output; if you broadcast (ie, allGrouping) this
> feedback-loop-stream you can let bolt instances talk to each other.
>
> builder.setBolt("DbBolt", new MyDBBolt())
>.shuffleGrouping("spout")
>.allGrouping("flush-stream", "DbBolt");
>
> where "flush-stream" is a second output stream of MyDBBolt() sending a
> notification tuple after it received the end-of-stream from spout;
> furthermore, if a bolt received the signal via "flush-stream" from
> **all** parallel bolt instances, it can flush to DB.
>
> Or something like this... Be creative! :)
>
>
> -Matthias
>
>
> On 05/08/2016 02:26 PM, Navin Ipe wrote:
> > @Matthias: I agree about the batch processor, but my superior took the
> > decision to use Storm, and he visualizes more complexity later for which
> > he needs Storm.
> > I had considered the "end of stream" tuple earlier (my idea was to emit
> > 10 consecutive nulls), but then the question was how do I know how many
> > bolt instances have been created, and how do I notify all the bolts?
> > Because it's only after the last bolt finishes writing to DB, that I
> > have to shut down the topology.
> >
> > @Jason: Thanks. I had seen storm signals earlier (I think from one of
> > your replies to someone else) and I had a look at the code too, but am a
> > bit wary because it's no longer being maintained and because of the
> > issues: https://github.com/ptgoetz/storm-signals/issues
> >
> > On Sun, May 8, 2016 at 5:40 AM, Jason Kusar wrote:
> >
> > You might want to check out Storm Signals.
> > https://github.com/ptgoetz/storm-signals
> >
> > It might give you what you're looking for.
> >
> >
> > On Sat, May 7, 2016, 11:59 AM Matthias J. Sax wrote:
> >
> > As you mentioned already: Storm is designed to run topologies
> > forever ;)
> > If you have finite data, why do you not use a batch processor???
> >
> > As a workaround, you can embed "control messages" in your stream
> > (or use
> > an additional stream for them).
> >
> > If you want a topology to shut down itself, you could use
> >
>  `NimbusClient.getConfiguredClient(conf).getClient().killTopology(name);`
> > in your spout/bolt code.
> >
> > Something like:
> >  - Spout emit all tuples
> >  - Spout emit special "end of stream" control tuple
> >  - Bolt1 processes everything
> >  - Bolt1 forward "end of stream" control tuple (when it received
> it)
> >  - Bolt2 processed everything
> >  - Bolt2 receives "end of stream" control tuple => flush to DB
> > => kill
> > topology
> >
> > But I guess, this is kinda weird pattern.
> >
> > -Matthias
> >
> > On 05/05/2016 06:13 AM, Navin Ipe wrote:
> > > Hi,
> > >
> > > I know Storm is designed to run forever. I also know about
> > Trident's
> > > technique of aggregation. But shouldn't Storm have a way to
> > let bolts
> > > know that a certain bunch of processing has been completed?
> > >
> > > Consider this topology:
> > > Spout-->Bolt-A-->Bolt-B
> > > |  |--->Bolt-B
> > > |  \--->Bolt-B
> > > |--->Bolt-A-->Bolt-B
> > > |  |--->Bolt-B
> > > |  \--->Bolt-B
> > > \--->Bolt-A-->Bolt-B
> > >|--->Bolt-B
> > >  

Re: Why tuples fail in spout

2016-05-06 Thread Spico Florin
Hi!
  Can you please write the parallelism hint that you are using for each
spout and bolt?
There is a formula relating the amount of data you can have unacked to the
amount of data that you can process in your bolts.
There is a good book, "Storm Applied", where this formula is explained,
along with other goodies. You can read the book after creating an account:
https://www.safaribooksonline.com
I cite from the book: if you have, for example, 2 spouts with 2 tasks each
and the max spout pending is 10, the number of possible unacked tuples is:

 2 spouts * 2 tasks * 10 max spout pending = 40 unacked tuples possible

If the number of possible unacked tuples is lower than the total
parallelism you have set for your topology, then it could be a bottleneck.
The total parallelism is the sum of all the numTasks values that you have
set up for your bolts or, in case you have used parallelism hints, the sum
of all parallelism hints.

Suppose you have bolt b1 with a parallelism hint of 10 and bolt b2 with a
hint of 12; then the total parallelism is 10 + 12 = 22. Considering the
max-spout-pending setup above, we are fine, because 40 is greater than 22,
so max spout pending is not an issue.
I really recommend the whole book "Storm Applied"; it is the best book that
also explains the Storm internals.
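
A sketch of how the numbers in that example map to topology code; the spout
and bolt classes are stand-ins, and the counts mirror the book's example
rather than your topology:

// Assumed imports: backtype.storm.Config, backtype.storm.topology.TopologyBuilder
Config conf = new Config();
conf.setMaxSpoutPending(10);                       // applies per spout task

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout-a", new MySpout(), 2);     // 2 tasks
builder.setSpout("spout-b", new MySpout(), 2);     // 2 tasks
builder.setBolt("b1", new B1Bolt(), 10).shuffleGrouping("spout-a");
builder.setBolt("b2", new B2Bolt(), 12).shuffleGrouping("b1");

// possible unacked tuples: 4 spout tasks * 10 max spout pending = 40
// total bolt parallelism:  10 (b1) + 12 (b2) = 22  ->  40 > 22, so max
// spout pending is not the bottleneck here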

I hope it helps.
Regards,
Florin


On Friday, May 6, 2016, Sai Dilip Reddy Kiralam <
dkira...@aadhya-analytics.com> wrote:

> Hi All,
>
>
> After incrementing the parallelism of the bolts, the spout is still
> failing. How can I know how much time my bolts are taking to complete the
> tuple processing? I'm using a Kafka spout with 10 partitions, so I kept my
> topology.max.spout.pending at 10. Do I need to increase the value beyond
> that? If yes, to what extent can I increase the number? As a stupid play I
> set the value to 100 with a 1-partition Kafka topic, and it gave me a
> metrics error.
>
>
>
> *Best regards,*
>
> *K.Sai Dilip Reddy.*
>
> On Thu, May 5, 2016 at 12:23 PM, Sai Dilip Reddy Kiralam <
> dkira...@aadhya-analytics.com> wrote:
>
>>
>>
>>
>> *Best regards,*
>>
>> *K.Sai Dilip Reddy.*
>>
>> -- Forwarded message --
>> From: Spico Florin <spicoflo...@gmail.com>
>> Date: Tue, May 3, 2016 at 1:02 PM
>> Subject: Re: Why tuples fail in spout
>> To: user@storm.apache.org
>>
>>
>> Hi!
>>   It could be that your processing time for those tuples is greater than
>> topology.message.timeout.secs (30 seconds by default). In that case, have
>> a look at your bolts' processing time, or check whether you generate data
>> from the spout faster than your bolts can cope with.
>> You can:
>> 1. Increase the level of parallelism for your bolts.
>> 2. Throttle the messages from the spouts by setting
>> topology.max.spout.pending.
>> Please have a look at these questions:
>>
>> http://stackoverflow.com/questions/32322682/apache-storm-what-happens-to-a-tuple-when-no-bolts-are-available-to-consume-it
>>
>>
>> http://stackoverflow.com/questions/26536525/what-is-the-point-of-timing-out-tuples
>>
>> I hope that these help.
>>  Regards,
>>  Florin
>>
>> On Tue, May 3, 2016 at 9:42 AM, Sai Dilip Reddy Kiralam <
>> dkira...@aadhya-analytics.com> wrote:
>>
>>> but no failed count is shown for bolts
>>>
>>>
>>>
>>> *Best regards,*
>>>
>>> *K.Sai Dilip Reddy.*
>>>
>>> On Tue, May 3, 2016 at 11:19 AM, John Fang <xiaojian@alibaba-inc.com> wrote:
>>>
>>>> Some tuples failed in the bolts. You can review the bolts' code. Maybe
>>>> your bolts' code trigger the fail() due to some reasons, Or the operation
>>>> of bolts need more time.
>>>>
>>>> --
>>>> From: Sai Dilip Reddy Kiralam <dkira...@aadhya-analytics.com>
>>>> Sent: Tuesday, May 3, 2016, 12:06
>>>> To: user <user@storm.apache.org>
>>>> Subject: Why tuples fail in spout
>>>>
>>>> Hi all,
>>>>
>>>>
>>>> I'm running a Storm topology on my local machine, and the Storm UI
>>>> shows that some tuples failed in the spout. As per my knowledge, spout
>>>> tuples are transferred to the bolts without any failure. Can any of you
>>>> help me find the reason for the tuple failures in the spout?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *Best regards,*
>>>>
>>>> *K.Sai Dilip Reddy.*
>>>>
>>>>
>>>>
>>>
>>
>>
>


Re: Removed storm logs

2016-05-06 Thread Spico Florin
Hi!
  Yes, of course. Try it and check.
Regards,
 Florin

On Fri, May 6, 2016 at 11:10 AM, Matthew Lowe 
wrote:

> If you delete a worker log file that is in use, will the log be recreated
> the next time a log write is called?
>
> Best Regards


Re: If tuples come in too fast for bolts to process, does Zookeeper keep it in a queue?

2016-05-06 Thread Spico Florin
Hi!
  You're welcome. nextTuple and the ack method are called on the same
thread by the framework. So if you have heavy computation in nextTuple,
your ack method will never be called, and the buffers that are responsible
for receiving the ack messages will not be emptied. nextTuple acts as a
producer for these buffers, while ack acts as a consumer.
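
A minimal sketch of the fix Navin describes below: emit at most one tuple
per nextTuple() call and return, instead of looping inside it. The queue is
a stand-in for whatever the real data source is:

// Assumed imports: backtype.storm.spout.SpoutOutputCollector,
// backtype.storm.task.TopologyContext, backtype.storm.topology.OutputFieldsDeclarer,
// backtype.storm.topology.base.BaseRichSpout, backtype.storm.tuple.Fields,
// backtype.storm.tuple.Values, backtype.storm.utils.Utils,
// java.util.LinkedList, java.util.Map, java.util.Queue
public class OneAtATimeSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private Queue<String> pending = new LinkedList<String>();  // stand-in data source

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // Emit at most one tuple, then return: returning quickly lets the
        // framework invoke ack()/fail() on this same thread.
        String next = pending.poll();
        if (next == null) {
            Utils.sleep(1);          // nothing to emit; back off briefly
            return;
        }
        collector.emit(new Values(next), next);   // second arg = message id
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("record"));
    }
}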
I hope that these help.
 Regards,
 Florin

On Fri, May 6, 2016 at 11:25 AM, Navin Ipe <navin@searchlighthealth.com>
wrote:

> Thanks Florin. It does indeed seem to be a memory problem. Turns out that
> there were no ack's happening either because I was emitting from a while
> loop in nextTuple() and it never left the nextTuple() function.
>
> On Fri, May 6, 2016 at 11:59 AM, Spico Florin <spicoflo...@gmail.com>
> wrote:
>
>> Hello!
>>   If you have a look at the last line of your log you can see:
>> java.lang.OutOfMemoryError: *GC overhead limit exceeded*
>>  So you don't have enough memory for your worker. This is the reason
>> that the connection of the worker to ZoooKeper dies. The worker sends
>> heartbeats to ZK. If worker dies then no heartbeat to ZK. Therefore you
>> have connection timeout.
>> You can increase the JVM memory by setting the Config property
>> topology.worker.childopts, e.g.
>> *conf.put("topology.worker.childopts", "-Xms1G -Xmx1G");*
>> This sets up your JVM heap memory.
>>
>>
>> To answer your question: *Does Zookeeper store a queue of unprocessed
>> tuples until the Bolts are ready to process them?*
>> *No.* Storm has internal queues to buffer the tuples. It uses LMAX
>> Disruptor queues to send/receive tuples between executors (spouts and
>> bolts) collocated in the same JVM (worker), and separate incoming/outgoing
>> queues for receiving/sending tuples from/to other workers (JVMs).
>> There is a detailed description here:
>> http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
>>
>> I hope that these help.
>>  Regards,
>>  Florin
>>
>>
>>
>>
>>
>>
>> On Fri, May 6, 2016 at 8:22 AM, Navin Ipe <
>> navin@searchlighthealth.com> wrote:
>>
>>> Hi,
>>>
>>> I have a topology where if a spout emits 1 tuple, a Bolt-A takes that
>>> tuple and emits 20 tuples. The next Bolt-B takes Bolt-A's tuples and emits
>>> 50 more tuples for each of Bolt-A's tuples. Tuples are always anchored.
>>>
>>> *Question:*
>>> When a light-weight spout emits a few tuples and Bolt-B has to process
>>> an exponential number of tuples, Bolt-A and B will receive tuples faster
>>> than they can process it. Does Zookeeper store a queue of unprocessed
>>> tuples until the Bolts are ready to process them?
>>>
>>> *Reason I'm asking:*
>>> Because I get a session timeout when I run a single instance of the
>>> bolts. When I increase the parallelism and tasks to 5, it runs longer
>>> before timing out. When I increase it to 15, it runs even longer before
>>> timing out.
>>>
>>> *The various error messages:*
>>> 587485 [main-SendThread(localhost:2001)] INFO  o.a.s.s.o.a.z.ClientCnxn
>>> - Client session timed out, have not heard from server in 13645ms for
>>> sessionid 0x154846bbee3, closing socket connection and attempting
>>> reconnect
>>>
>>> 599655 [main-EventThread] INFO  o.a.s.s.o.a.c.f.s.ConnectionStateManager
>>> - State change: SUSPENDED
>>>
>>> 614868 [main-SendThread(localhost:2001)] INFO  o.a.s.s.o.a.z.ClientCnxn
>>> - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2001. Will
>>> not attempt to authenticate using SASL (unknown error)
>>>
>>> 614869 [main-SendThread(localhost:2001)] INFO  o.a.s.s.o.a.z.ClientCnxn
>>> - Socket connection established to localhost/0:0:0:0:0:0:0:1:2001,
>>> initiating session
>>>
>>> 607952 [SessionTracker] INFO  o.a.s.s.o.a.z.s.ZooKeeperServer - Expiring
>>> session 0x154846bbee3, timeout of 2ms exceeded
>>>
>>> 621928 [main-EventThread] WARN  o.a.s.c.zookeeper-state-factory -
>>> Received event :disconnected::none: with disconnected Writer Zookeeper.
>>>
>>> 627967 [Curator-Framework-0] WARN  o.a.s.s.o.a.c.ConnectionState -
>>> Connection attempt unsuccessful after 25535 (greater than max timeout of
>>> 2). Resetting connection and trying again with a new connection.
>>> 627967 [timer] INFO  o.a.s.c.healthcheck - ()
>>>
>>> 631511 [ProcessThread(sid:0 cport:-1):] INFO

Re: Storm topology using all the Max connections of db

2016-05-05 Thread Spico Florin
Hi!
   Well, it seems that you have to contact the team that created the
storm-jdbc project; the issue is deeper than my knowledge.
I hope that my advice helped you. If you have a solution, please share it
here so that others can benefit from your experience.
Good luck!
  Regards,
   Florin

On Thu, May 5, 2016 at 1:57 PM, Sai Dilip Reddy Kiralam <
dkira...@aadhya-analytics.com> wrote:

> Hi,
>
> my supervisor logs shows the following error :
>
> 2016-05-05T15:45:15.365+0530 b.s.d.executor [INFO] Prepared bolt
> __acker:(67)
> 2016-05-05T15:45:15.369+0530 b.s.d.executor [INFO] Prepared bolt
> __system:(-1)
> 2016-05-05T15:45:18.481+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-0 is starting.
> 2016-05-05T15:45:18.486+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-1 is starting.
> 2016-05-05T15:45:18.486+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-3 is starting.
> 2016-05-05T15:45:18.488+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-2 is starting.
> 2016-05-05T15:45:18.488+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-3 is starting.
> 2016-05-05T15:45:18.489+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-4 is starting.
> 2016-05-05T15:45:18.489+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-5 is starting.
> 2016-05-05T15:45:18.490+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-6 is starting.
> 2016-05-05T15:45:18.490+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-7 is starting.
> 2016-05-05T15:45:18.490+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-8 is starting.
> 2016-05-05T15:45:18.490+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-9 is starting.
> 2016-05-05T15:45:18.491+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-10 is starting.
> 2016-05-05T15:45:18.491+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-11 is starting.
> 2016-05-05T15:45:18.492+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-12 is starting.
> 2016-05-05T15:45:18.493+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-13 is starting.
> 2016-05-05T15:45:18.493+0530 c.z.h.HikariDataSource [INFO] HikariCP pool
> HikariPool-4 is starting.
> 2016-05-05T15:45:19.310+0530 c.z.h.p.HikariPool [ERROR] JDBC4
> Connection.isValid() method not supported, connection test query must be
> configured
> 2016-05-05T15:45:19.311+0530 c.z.h.p.HikariPool [ERROR] JDBC4
> Connection.isValid() method not supported, connection test query must be
> configured
> 2016-05-05T15:45:19.312+0530 c.z.h.p.HikariPool [ERROR] JDBC4
> Connection.isValid() method not supported, connection test query must be
> configured
> 2016-05-05T15:45:19.312+0530 c.z.h.p.HikariPool [ERROR] JDBC4
> Connection.isValid() method not supported, connection test query must be
> configured
> 2016-05-05T15:45:19.314+0530 c.z.h.p.HikariPool [ERROR] JDBC4
> Connection.isValid() method not supported, connection test query must be
> configured
> 2016-05-05T15:45:19.314+0530 c.z.h.p.HikariPool [ERROR] JDBC4
> Connection.isValid() method not supported, connection test query must be
> configured
> 2016-05-05T15:45:19.314+0530 c.z.h.p.HikariPool [ERROR] JDBC4
> Connection.isValid() method not supported, connection test query must be
> configured
> 2016-05-05T15:45:19.313+0530 c.z.h.p.HikariPool [ERROR] JDBC4
> Connection.isValid() method not supported, connection test query must be
> configured
> 2016-05-05T15:45:19.313+0530 c.z.h.p.HikariPool [ERROR] JDBC4
> Connection.isValid() method not supported, connection test query must be
> configured
>
> Is Postgres using up all the connections because of this error?
>
>
>
> *Best regards,*
>
> *K.Sai Dilip Reddy.*
>
> On Thu, May 5, 2016 at 10:57 AM, Sai Dilip Reddy Kiralam <
> dkira...@aadhya-analytics.com> wrote:
>
>> Hi,
>>
>> I'm closing and re-opening the connections after my bolt executes each
>> insert query. Do I need to increase the db max connections to a level of
>> 500 to 1000? Here the supervisor log file tells me that the connection
>> made by the bolt is closed and a new connection is taken again. I think
>> that after closing, a connection goes idle and is not reused for new
>> connections.
>>
>> 2016-05-05T10:44:37.761+0530 STDIO [INFO] COMMAND TO CLOSE CONNECTION IS
>> DONE
>> 2016-05-05T10:44:37.761+0530 STDIO [INFO] DB connected
>>
>> If I'm right, what is the solution for those idle connections?
>>
>>
>>
>> *Best regards,*
>>
>> *K.Sai Dilip Reddy.*
>>
>> On Thu, May 5, 2016 at 1:01 AM, Spico Florin <spicoflo...@gmail.com>
>> wrote:
>&g

Re: Storm topology using all the Max connections of db

2016-05-04 Thread Spico Florin
Hi!
  You have 9 bolts, each with 50 max db connections, so for each bolt you
get a connection pool. Try to decrease this number, for example to 5, and
check whether your performance is still fine with your db.
Regards,
 Florin

On Wednesday, May 4, 2016, Sai Dilip Reddy Kiralam <
dkira...@aadhya-analytics.com> wrote:

> Sorry! My db was not yet started, so it gave me that error. But when I
> added that statement and ran the topology, it used more connections than
> the specified number.
>
>
>
> *Best regards,*
>
> *K.Sai Dilip Reddy.*
>
> On Wed, May 4, 2016 at 11:22 AM, Sai Dilip Reddy Kiralam <
> dkira...@aadhya-analytics.com> wrote:
>
>>
>> Hi,
>>
>> I just added the statement *config.setMaximumPoolSize(50);* to prepare():
>>
>> public synchronized void prepare() {
>>     if (dataSource == null) {
>>         Properties properties = new Properties();
>>         properties.putAll(configMap);
>>         HikariConfig config = new HikariConfig(properties);
>>         config.setMaximumPoolSize(50);
>>         this.dataSource = new HikariDataSource(config);
>>         this.dataSource.setAutoCommit(false);
>>     }
>> }
>>
>>
>> and I'm getting the following error
>>
>> java.lang.RuntimeException: Fail-fast during pool initialization
>> at com.zaxxer.hikari.pool.HikariPool.fillPool(HikariPool.java:475)
>> at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:159)
>> at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:112)
>> at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:78)
>> at com.aail.config.MConnectionProvider.prepare(MConnectionProvider.java:27)
>> at com.aail.storm.bolts.AbstractJdbcBolt.prepare(AbstractJdbcBolt.java:34)
>> at com.aail.storm.bolts.Inserts.prepare(Inserts.java:290)
>> at backtype.storm.daemon.executor$fn__6647$fn__6659.invoke(executor.clj:692)
>> at backtype.storm.util$async_loop$fn__459.invoke(util.clj:461)
>> at clojure.lang.AFn.run(AFn.java:24)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.postgresql.util.PSQLException: Connection attempt timed out.
>> at org.postgresql.Driver$ConnectThread.getResult(Driver.java:372)
>> at org.postgresql.Driver.connect(Driver.java:284)
>> at java.sql.DriverManager.getConnection(DriverManager.java:664)
>> at java.sql.DriverManager.getConnection(DriverManager.java:247)
>> at org.postgresql.ds.common.BaseDataSource.getConnection(BaseDataSource.java:99)
>> at org.postgresql.ds.common.BaseDataSource.getConnection(BaseDataSource.java:82)
>> at com.zaxxer.hikari.pool.HikariPool.addConnection(HikariPool.java:398)
>> at com.zaxxer.hikari.pool.HikariPool.fillPool(HikariPool.java:474)
>> ... 10 more
>>
>>
>>
>>
>> *Best regards,*
>>
>> *K.Sai Dilip Reddy.*
>>
>> On Wed, May 4, 2016 at 9:53 AM, Sai Dilip Reddy Kiralam <
>> dkira...@aadhya-analytics.com> wrote:
>>
>>> Hi,
>>>
>>> Thank you Spico Florin.
>>>
>>>
>>>
>>> *Best regards,*
>>>
>>> *K.Sai Dilip Reddy.*
>>>
>>> On Wed, May 4, 2016 at 1:18 AM, Spico Florin <spicoflo...@gmail.com> wrote:
>>>
>>>> hi!
>>>>   Please have a look at
>>>> https://github.com/apache/storm/tree/master/external/storm-jdbc, where
>>>> you have to implement the ConnectionProvider interface. You have to
>>>> find a third-party library that provides connection pooling for
>>>> PostgreSQL and use that library to implement ConnectionProvider. I hope
>>>> that other community members will give you some other ideas.
>>>>  I hope that these help.
>>>> regards,
>>>>  florin
>>>>
>>>> On Tuesday, May 3, 2016, Sai Dilip Reddy Kiralam <
>>>> dkira...@aadhya-analytics.com> wrote:
>>>>
>>>>> I'm using storm-jdbc for connecting to the db. How can I use
>>>>> connection pooling in the topology code? Please share information on
>>>>> connection pooling used in storm topologies.
>>>>>
>>>>>
>>>>>
>>>>> *Best regards,*
>>>>>
>>>>> *K.Sai Dilip Reddy.*
>>>>>
&g

Re: Storm topology using all the Max connections of db

2016-05-03 Thread Spico Florin
Hi!
  Please have a look at
https://github.com/apache/storm/tree/master/external/storm-jdbc, where you
have to implement the ConnectionProvider interface. You have to find a
third-party library that provides connection pooling for PostgreSQL and use
that library to implement ConnectionProvider; a sketch follows below. I
hope that other community members will give you some other ideas.
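
A minimal sketch of such an implementation, assuming storm-jdbc's
ConnectionProvider interface and HikariCP as the pooling library; the JDBC
URL, credentials, and pool size are illustrative:

// Assumed imports: org.apache.storm.jdbc.common.ConnectionProvider,
// com.zaxxer.hikari.HikariConfig, com.zaxxer.hikari.HikariDataSource,
// java.sql.Connection, java.sql.SQLException
public class PooledPostgresConnectionProvider implements ConnectionProvider {
    // transient: the provider is serialized with the topology, the pool is not
    private transient HikariDataSource dataSource;

    @Override
    public synchronized void prepare() {
        if (dataSource == null) {
            HikariConfig config = new HikariConfig();
            config.setJdbcUrl("jdbc:postgresql://localhost:5432/testdb");  // hypothetical
            config.setUsername("user");                                    // hypothetical
            config.setPassword("secret");                                  // hypothetical
            // Keep (bolts * pool size) under the db's max_connections limit.
            config.setMaximumPoolSize(5);
            dataSource = new HikariDataSource(config);
        }
    }

    @Override
    public Connection getConnection() {
        try {
            return dataSource.getConnection();
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void cleanup() {
        if (dataSource != null) {
            dataSource.close();
        }
    }
}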
 I hope that these help.
Regards,
 Florin

On Tuesday, May 3, 2016, Sai Dilip Reddy Kiralam <
dkira...@aadhya-analytics.com> wrote:

> I'm using storm-jdbc for connecting to the db. How can I use connection
> pooling in the topology code? Please share information on connection
> pooling used in storm topologies.
>
>
>
> *Best regards,*
>
> *K.Sai Dilip Reddy.*
>
> On Thu, Apr 28, 2016 at 12:28 PM, Sai Dilip Reddy Kiralam <
> dkira...@aadhya-analytics.com> wrote:
>
>> Hi Spico,
>>
>> I use 9 bolts with a parallelism of 1 and 1 task for each bolt (the
>> default), and I'm not using any connection pool for connecting to
>> Postgres, just the jdbc class examples; here is the source example:
>> http://storm.apache.org/releases/2.0.0-SNAPSHOT/storm-jdbc.html.
>> I will check by using the ConnectionPool.
>>
>> Thank you.
>>
>>
>>
>>
>> *Best regards,*
>>
>> *K.Sai Dilip Reddy.*
>>
>> On Thu, Apr 28, 2016 at 12:19 PM, Spico Florin <spicoflo...@gmail.com> wrote:
>>
>>> Hello!
>>> How many tasks do you have for inserting the data into your database?
>>> Are you using a connection pool for connecting to Postgres? If your
>>> number of tasks exceeds the number of max connections provided by the
>>> connection pool, then you have a problem.
>>> Please also check the number of max connections that your db accepts.
>>> I hope that these help.
>>> Regards,
>>>   Florin
>>>
>>> On Thu, Apr 28, 2016 at 7:09 AM, Sai Dilip Reddy Kiralam <
>>> dkira...@aadhya-analytics.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a topology that makes connections to a Postgres db and inserts
>>>> the fields into tables of a test db. My topology is working fine, but
>>>> when I submit the topology it establishes all the connections of the
>>>> db, and I don't know why it is taking all the max connections.
>>>>
>>>>
>>>> Attached below are the pics of pg_stat_activity.
>>>>
>>>>
>>>> *Best regards,*
>>>>
>>>> *K.Sai Dilip Reddy.*
>>>>
>>>
>>>
>>
>


Re: Why tuples fail in spout

2016-05-03 Thread Spico Florin
Hi!
  It could be that your processing time for those tuples is greater than
topology.message.timeout.secs (30 seconds by default). In that case, have a
look at your bolts' processing time, or check whether you generate data
from the spout faster than your bolts can cope with.
You can:
1. Increase the level of parallelism for your bolts.
2. Throttle the messages from the spouts by setting
topology.max.spout.pending.
Please have a look at these questions:
http://stackoverflow.com/questions/32322682/apache-storm-what-happens-to-a-tuple-when-no-bolts-are-available-to-consume-it

http://stackoverflow.com/questions/26536525/what-is-the-point-of-timing-out-tuples

I hope that these help.
 Regards,
 Florin

On Tue, May 3, 2016 at 9:42 AM, Sai Dilip Reddy Kiralam <
dkira...@aadhya-analytics.com> wrote:

> but no failed count is shown for bolts
>
>
>
> *Best regards,*
>
> *K.Sai Dilip Reddy.*
>
> On Tue, May 3, 2016 at 11:19 AM, John Fang 
> wrote:
>
>> Some tuples failed in the bolts. You can review the bolts' code. Maybe
>> your bolts' code triggers fail() for some reason, or the operations in
>> the bolts need more time.
>>
>> --
>> From: Sai Dilip Reddy Kiralam 
>> Sent: Tuesday, May 3, 2016, 12:06
>> To: user 
>> Subject: Why tuples fail in spout
>>
>> Hi all,
>>
>>
>> I'm running a Storm topology on my local machine, and the Storm UI shows
>> that some tuples failed in the spout. As per my knowledge, spout tuples
>> are transferred to the bolts without any failure. Can any of you help me
>> find the reason for the tuple failures in the spout?
>>
>>
>>
>>
>>
>>
>> *Best regards,*
>>
>> *K.Sai Dilip Reddy.*
>>
>>
>>
>


Re: Storm topology using all the Max connections of db

2016-04-28 Thread Spico Florin
Hello!
   How many tasks do you have for inserting the data into your database? Are
you using a ConnectionPool for connecting to Postgres? If your number of tasks
exceeds the number of max connections provided in the connection pool, then
you have a problem.
Please also check the number of max connections that your db accepts.
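In case it helps, here is a rough sketch of a pooled insert bolt. It is only an illustration: it assumes the HikariCP library, and the JDBC URL, credentials, and table are hypothetical. The pool is created in prepare(), so each executor shares one bounded pool instead of opening ad-hoc connections:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

public class PooledInsertBolt extends BaseRichBolt {
    private transient HikariDataSource pool; // transient: not serialized with the topology
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        HikariConfig cfg = new HikariConfig();
        cfg.setJdbcUrl("jdbc:postgresql://localhost:5432/test"); // hypothetical db
        cfg.setUsername("storm");
        cfg.setPassword("secret");
        cfg.setMaximumPoolSize(2); // keep executors * pool size below max_connections
        pool = new HikariDataSource(cfg);
    }

    @Override
    public void execute(Tuple tuple) {
        try (Connection con = pool.getConnection();
             PreparedStatement ps = con.prepareStatement("INSERT INTO events(v) VALUES (?)")) {
            ps.setString(1, tuple.getString(0));
            ps.executeUpdate();
            collector.ack(tuple);
        } catch (SQLException e) {
            collector.fail(tuple); // let the spout replay the tuple
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt: nothing to declare
    }
}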
I hope that these help.
Regards,
  Florin

On Thu, Apr 28, 2016 at 7:09 AM, Sai Dilip Reddy Kiralam <
dkira...@aadhya-analytics.com> wrote:

> Hi,
>
> I have a topology that makes a connection to postgresdb and inserts the
> fields into tables of a test db. My topology is working fine, but when I
> submit the topology it establishes all the connections of the db, and I
> don't know why it is taking all the max connections.
>
>
> below attached the pics of pg_stat_activity.
>
>
> *Best regards,*
>
> *K.Sai Dilip Reddy.*
>


Re: Need Help

2016-04-27 Thread Spico Florin
Hi!
In Storm UI, please have a look at the value that you get for the Capacity
(last 10m) for your bolt. If this value is close to 1, then the
bolt is “at capacity” and is a bottleneck in your topology. Address such
bottlenecks by increasing the parallelism of the “at-capacity” bolts. This
means you should increase the number of executors for this bolt, by providing
the parallelism hint in the bolt declarer or by setting numTasks for this bolt
to 16, and then rebalance your topology with
storm rebalance <topology-name> -e <bolt-id>=<parallelism> (max 16).
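A small sketch (the component names are hypothetical) of declaring the task headroom up front so that a later rebalance has room to grow:

TopologyBuilder builder = new TopologyBuilder();
builder.setBolt("insert-bolt", new InsertBolt(), 4) // 4 executors to start with
       .setNumTasks(16)                             // upper bound for rebalance
       .shuffleGrouping("parse-bolt");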
I hope that these help.
Regards,
 Florin

On Wed, Apr 27, 2016 at 10:54 AM, Sai Dilip Reddy Kiralam <
dkira...@aadhya-analytics.com> wrote:

>
> Hi All,
>
> I'm running a topology on my local machine. My bolts parse the json data and
> insert the parsed fields into a db. One of the bolts is turning red in
> color. What does it mean?
> Does it indicate that the flow of tuples is too high for that insert bolt? If
> yes, what should I do? Do I need to increase the parallelism of that
> particular bolt?
>
>
>
>
> *Best regards,*
>
> *K.Sai Dilip Reddy.*
>


Re: Storm 1.0.0 DRPC connection refused

2016-04-19 Thread Spico Florin
Hello!
  I also found a post with a similar error to yours. Perhaps you can get
some clues from it.
http://mail-archives.apache.org/mod_mbox/storm-user/201603.mbox/%3c0dd9aa99-8504-43c9-b3a8-6196def07...@viaplay.com%3E
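For reference, the LocalCluster/LocalDRPC suggestion from my message below, as a minimal sketch (exception handling omitted; the function name and MyResultBolt are hypothetical):

LocalDRPC drpc = new LocalDRPC();
LinearDRPCTopologyBuilder builder = new LinearDRPCTopologyBuilder("my-function");
builder.addBolt(new MyResultBolt(), 3);

LocalCluster cluster = new LocalCluster();
cluster.submitTopology("drpc-demo", new Config(), builder.createLocalTopology(drpc));
// if this prints a result, the DRPC wiring is fine and the problem is the cluster setup
System.out.println(drpc.execute("my-function", "some-args"));
cluster.shutdown();
drpc.shutdown();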


On Tue, Apr 19, 2016 at 2:25 PM, Spico Florin <spicoflo...@gmail.com> wrote:

> Hi!
>   I suggest running your topology with LocalCluster and LocalDRPC.
> To me it seems to be related to some security added in 1.0.0. Can
> you check the security configuration parameters that you have to set up in
> your storm.yaml?
>
> at
> org.apache.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:64)
> ~[storm-core-1.0.0.jar:1.0.0]
>
> at
> org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:56)
> ~[storm-core-1.0.0.jar:1.0.0]
>
>
>
> I hope that these help.
>
>  Regards,
>
>  Florin
>
>
> On Tue, Apr 19, 2016 at 11:04 AM, Victor Kovrizhkin <
> vik.kovrizh...@gmail.com> wrote:
>
>> Please help!
>>
>> From: Victor Kovrizhkin
>> Date: Monday, April 18, 2016 at 9:28 PM
>> To: "user@storm.apache.org"
>> Subject: Storm 1.0.0 DRPC connection refused
>>
>> Hi Good People!
>>
>> I’m trying to update my cluster running Storm 0.10.0 with DRPC to Storm
>> 1.0.0. I’ve updated all machines with the latest version of Storm, changed
>> the storm.yaml configuration (e.g. nimbus.host -> nimbus.seeds), and changed
>> the dependencies in the topology. When I start nimbus, supervisors, ui, drpc,
>> and logviewers, everything looks good: I’m able to see the UI, logs, and
>> configuration, and to submit topologies.
>>
>> But once I submit a DRPC topology, its logs are full of the following
>> exceptions:
>>
>> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
>> java.lang.RuntimeException:
>> org.apache.storm.thrift.transport.TTransportException:
>> java.net.ConnectException: Connection refused
>>
>> at org.apache.storm.drpc.DRPCSpout.checkFutures(DRPCSpout.java:129)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> at org.apache.storm.drpc.DRPCSpout.nextTuple(DRPCSpout.java:210)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> at
>> org.apache.storm.daemon.executor$fn__8158$fn__8173$fn__8204.invoke(executor.clj:647)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> at org.apache.storm.util$async_loop$fn__554.invoke(util.clj:484)
>> [storm-core-1.0.0.jar:1.0.0]
>>
>> at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
>>
>> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
>>
>> Caused by: java.util.concurrent.ExecutionException:
>> java.lang.RuntimeException:
>> org.apache.storm.thrift.transport.TTransportException:
>> java.net.ConnectException: Connection refused
>>
>> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>> ~[?:1.8.0_74]
>>
>> at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_74]
>>
>> at org.apache.storm.drpc.DRPCSpout.checkFutures(DRPCSpout.java:127)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> ... 5 more
>>
>> Caused by: java.lang.RuntimeException:
>> org.apache.storm.thrift.transport.TTransportException:
>> java.net.ConnectException: Connection refused
>>
>> at
>> org.apache.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:64)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> at
>> org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:56)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> at
>> org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:99)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> at
>> org.apache.storm.security.auth.ThriftClient.(ThriftClient.java:69)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> at
>> org.apache.storm.security.auth.ThriftClient.(ThriftClient.java:46)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> at
>> org.apache.storm.drpc.DRPCInvocationsClient.(DRPCInvocationsClient.java:40)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> at org.apache.storm.drpc.DRPCSpout$Adder.call(DRPCSpout.java:101)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> at org.apache.storm.drpc.DRPCSpout$Adder.call(DRPCSpout.java:88)
>> ~[storm-core-1.0.0.jar:1.0.0]
>>
>> at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_74]
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> ~[?:1.8.0_74]
>>
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> ~[?:1.8.0_74]
>>
>> ..

Re: DRPCExecutionException(msg:Request failed)

2016-04-18 Thread Spico Florin
Hi!
  Have you started the DRPC server?
Please have a look at:
http://stackoverflow.com/questions/23693871/storm-basicdrpc-client-execute

I hope that this helps.
 Florin

On Mon, Apr 18, 2016 at 2:18 AM, sam mohel  wrote:

> I got this error when I submitted the topology with localhost 127.0.0.1
>
> Exception in thread "main" DRPCExecutionException(msg:Request failed)
> at
> backtype.storm.generated.DistributedRPC$execute_result.read(DistributedRPC.java:904)
> at
> org.apache.thrift7.TServiceClient.receiveBase(TServiceClient.java:78)
> at
> backtype.storm.generated.DistributedRPC$Client.recv_execute(DistributedRPC.java:92)
> at
> backtype.storm.generated.DistributedRPC$Client.execute(DistributedRPC.java:78)
> at backtype.storm.utils.DRPCClient.execute(DRPCClient.java:71)
>
> Does that mean I should increase the timeout for drpc, like
> drpc.request.timeout.secs, or something else?
>
> Thanks for any help
>


Fwd: Executor inbound buffer queue(receive) and sender buffer queue when using collector.emit(streamId,tupleValues)

2016-04-17 Thread Spico Florin
Hello!
  I would like to know how Storm manages the internal buffer queues when
using collector.emit(streamId, valuesToEmit).
For example, consider the topology:
1. Spout->ProcessingBolt
2. spout.collector.emit(streamId1, tupleValues1)
spout.collector.emit(streamId2, tupleValues2)

Q. How does the send buffer queue look for the Spout?
Q. How does the receive buffer queue look for the ProcessingBolt?
Q. Does it internally use a Map (one queue per
stream) or a single LMAX disruptor queue that handles the input/output for
all the streams?
I look forward to your answers.
 Regards,
 Florin


Re: initState method not invoked in Storm 1.0

2016-04-15 Thread Spico Florin
Hi!
  Thank you for your answers. Yes, I will provide the parallelism hint.
Regarding the constraint that I'm facing: should I emit anchored tuples from
both the spout and the stateful bolt (that is what the exception was
complaining about)?
I look forward to your answers,
 Florin

On Fri, Apr 15, 2016 at 12:16 PM, Arun Mahadevan <ar...@apache.org> wrote:

> Ah, I see what you mean. The “setBolt” method without a parallelism hint is
> not overloaded for stateful bolts, so if the parallelism hint is not specified
> it ends up being a normal bolt. I will raise a JIRA for fixing this.
>
> Spico,
>
> For now, can you provide parallelism hint while you add stateful bolt to
> the topology ?
>
> Thanks,
> Arun
>
> From: Alexander T
> Reply-To: "user@storm.apache.org"
> Date: Friday, April 15, 2016 at 2:38 PM
>
> To: "user@storm.apache.org"
> Subject: Re: initState method not invoked in Storm 1.0
>
> Hi Arun,
>
> I meant that it's very easy to use the wrong setBolt method overload by
> mistake, since stateful bolts are supertypes of stateless ones.
>
> Regards
> Alex
> On Apr 15, 2016 10:54 AM, "Arun Mahadevan" <ar...@apache.org> wrote:
>
> Its the same method (builder.setBolt) that adds stateful bolts to a
> topology. Heres an example -
> https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/storm/starter/StatefulTopology.java
>
> Spico,
>
> Do you see any errors in the logs? You might want to turn on debug logs
> and see what's happening. Can you also try running the StatefulTopology in
> the storm-starter and check if you see the same behavior?
>
> Thanks,
> Arun
>
> From: Alexander T
> Reply-To: "user@storm.apache.org"
> Date: Friday, April 15, 2016 at 2:06 PM
> To: "user@storm.apache.org"
> Subject: Re: initState method not invoked in Storm 1.0
>
> Hi Spico,
>
> Are you adding your bolt to the topology with the special methods for
> stateful bolts? It's quite easy to use the regular addBolt method and it
> will in that case be treated as a stateless one.
>
> Cheers
> Alex
> On Apr 15, 2016 10:33 AM, "Spico Florin" <spicoflo...@gmail.com> wrote:
>
>> Hello!
>>   I'm running a topology in LocalCluster that has a stasteful Bolt. Wile
>> debugging, I have observed that the initState method is not invoked at all.
>> The documentation said:
>> "The initState method is invoked by the framework during the bolt
>> initialization with the previously saved state of the bolt. This is invoked
>> after prepare but before the bolt starts processing any tuples".
>>
>> Due to this, the state field remains null and I get an NPE when I populate
>> it with state.put(...).
>> Any idea why the initState is not invoked?
>> Regards,
>>  Florin
>>
>> Here is my code:
>>
>> public class TimeSeriesStatefulBolt extends
>> BaseStatefulBolt<KeyValueState<Long, Map<String, Float>>> {
>>
>> private KeyValueState<Long, Map<String, Float>> state;
>> @Override
>> public void initState(KeyValueState<Long, Map<String, Float>> state) {
>> this.state = state;
>> }
>>
>


Re: initState method not invoked in Storm 1.0

2016-04-15 Thread Spico Florin
Hi, Alex!
  There is no special addBolt method in the TopologyBuilder API. I
have used topologyBuilder.setBolt("myId", new TimeSeriesStatefulBolt())
without a parallelism_hint declared, and that overload dismisses the
statefulness. I did NOT get any errors in the logs.

 What I have found is that topologyBuilder.setBolt(String id, IStatefulBolt
bolt, Number parallelism_hint) does consider the statefulness of the bolt,
but now I'm facing a different issue that seems to be related to some
constraints of the stateful bolts:

java.lang.RuntimeException: java.lang.UnsupportedOperationException: Bolts
in a stateful topology must emit anchored tuples.
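For reference, a minimal sketch of the wiring that picks up the statefulness (TimeSeriesSpout is hypothetical); the three-argument overload is the key:

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("ts-spout", new TimeSeriesSpout());
builder.setBolt("ts-stateful", new TimeSeriesStatefulBolt(), 1) // stateful overload
       .shuffleGrouping("ts-spout");

// and in every bolt upstream of the stateful bolt, emit anchored to the
// input tuple and ack it, to satisfy the constraint from the exception:
collector.emit(input, new Values(input.getValue(0)));
collector.ack(input);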

Thank you for your support.
 Florin

On Fri, Apr 15, 2016 at 11:36 AM, Alexander T <mittspamko...@gmail.com>
wrote:

> Hi Spico,
>
> Are you adding your bolt to the topology with the special methods for
> stateful bolts? It's quite easy to use the regular addBolt method and it
> will in that case be treated as a stateless one.
>
> Cheers
> Alex
> On Apr 15, 2016 10:33 AM, "Spico Florin" <spicoflo...@gmail.com> wrote:
>
>> Hello!
>>   I'm running a topology in LocalCluster that has a stateful bolt. While
>> debugging, I have observed that the initState method is not invoked at all.
>> The documentation said:
>> "The initState method is invoked by the framework during the bolt
>> initialization with the previously saved state of the bolt. This is invoked
>> after prepare but before the bolt starts processing any tuples".
>>
>> Due to this, the state field remains null and I get an NPE when I populate
>> it with state.put(...).
>> Any idea why the initState is not invoked?
>> Regards,
>>  Florin
>>
>> Here is my code:
>>
>> public class TimeSeriesStatefulBolt extends
>> BaseStatefulBolt<KeyValueState<Long, Map<String, Float>>> {
>>
>> private KeyValueState<Long, Map<String, Float>> state;
>> @Override
>> public void initState(KeyValueState<Long, Map<String, Float>> state) {
>> this.state = state;
>> }
>>
>


Re: Need for capacity planning suggestions for setting up KAFKA - STORM cluster in AWS

2016-04-07 Thread Spico Florin
Hi!
 Well, it is not only about memory. It is also about availability, failover,
whether your processing is CPU intensive, and also the velocity and the volume
of data that you ingest.
Florin

On Mon, Apr 4, 2016 at 7:26 AM, researcher cs 
wrote:

> I have the same question. What if the data is around 12 GB?
>
>
> On Monday, April 4, 2016, Sai Dilip Reddy Kiralam <
> dkira...@aadhya-analytics.com> wrote:
>
>> That depends on the amount of data you're working with.
>>
>>
>>
>> *Best regards,*
>>
>> *K.Sai Dilip Reddy,*
>>
>>
>> *Software Engineer - Hadoop Trainee,2-39, Old SBI Road, Sri Nagar
>> Colony, Gannavaram - 521101.*
>> *www.aadhya-analytics.com .*
>> *[image: Inline image 1]*
>>
>> On Sat, Mar 26, 2016 at 1:10 PM, sujitha chinnu <
>> chinnusujith...@gmail.com> wrote:
>>
>>> Hai All,
>>>
>>>   I'm collecting twitter data on my local machine in a single-node
>>> (linux-ubuntu) cluster using a storm topology. Now I want to put this into
>>> production by buying AWS servers. So I need suggestions on capacity
>>> planning for setting up a KAFKA-STORM cluster.
>>>
>>> Can anyone suggest the memory allocation for the following:
>>>
>>> 1. How much memory space should I allocate to ZOOKEEPER cluster ?
>>>
>>> 2. How much memory space should I allocate to SUPERVISOR & NIMBUS nodes ?
>>>
>>> 3. How much memory space should I allocate to KAFKA cluster ?
>>>
>>
>>


Re: Storm 1.0.0 Windowing by id

2016-04-02 Thread Spico Florin
Hi!
   It depends on what you mean by id and whether that id is part of the
tuple. If you use, for example, a different stream id for each of your
clients, the answer to this post

https://community.hortonworks.com/questions/24068/storm-window-support-for-streams.html

 suggests that you get different windows for each streamId; a sketch follows
below.
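A sketch of that idea (all names hypothetical; Storm 1.0 API): the spout declares one stream per client, and a separate windowed bolt subscribes to each stream, so each client gets its own windows:

// in the spout, declare one stream per client:
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declareStream("client-A", new Fields("ts", "value"));
    declarer.declareStream("client-B", new Fields("ts", "value"));
}

// in the topology, subscribe a separate windowed bolt to each stream:
builder.setBolt("win-A", new MyWindowedBolt()
            .withTumblingWindow(new BaseWindowedBolt.Duration(1, TimeUnit.MINUTES)))
       .shuffleGrouping("events-spout", "client-A");
builder.setBolt("win-B", new MyWindowedBolt()
            .withTumblingWindow(new BaseWindowedBolt.Duration(1, TimeUnit.MINUTES)))
       .shuffleGrouping("events-spout", "client-B");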
  I hope that it helps.
  Regards,
 Florin



On Fri, Apr 1, 2016 at 8:17 PM, Filipa Duarte 
wrote:

> Hi!
>
>
>
> With regard to the new windowing features of Storm 1.0.0, is it possible
> to group the windows by  id, so that the events coming from different
> clients are processed by different windows?
>
> Thank you.
>
>
> Regards,
>
> Filipa
>
>
>


Re: Storm KafkaSpout Integration

2016-03-30 Thread Spico Florin
hi,
I think the problem is that you have set up one partition per
topic, but you are trying to consume with 10 kafka spout tasks.
Check this line: builder.setSpout("words", new KafkaSpout(kafkaConfig), 10);
10 represents the task parallelism for the spout, which in the case
of kafka should be the same number as the partitions you have set up for the
kafka topic.
You use more than one kafka partition when you would like to consume the data
from the topic in parallel. Please check the very good documentation
on kafka partitions on the Confluent site.
In my opinion, setting your parallelism hint to 1 would solve the problem
(see the sketch below). The max spout pending setting has a different meaning.
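As a sketch, using the names from your code:

// one spout task per kafka partition; with a single partition use 1:
builder.setSpout("words", new KafkaSpout(kafkaConfig), 1);
// if the topic had, say, 4 partitions, 4 tasks would consume in parallel:
// builder.setSpout("words", new KafkaSpout(kafkaConfig), 4);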
regards,
florin

On Wednesday, March 30, 2016, david kavanagh  wrote:
> I am only creating one partition in code here:
>  GlobalPartitionInformation hostsAndPartitions = new
GlobalPartitionInformation();
>  hostsAndPartitions.addPartition(0, new Broker("127.0.0.1", 9092));
>  BrokerHosts brokerHosts = new StaticHosts(hostsAndPartitions);
> I hope that answered your question. I am new to both Storm and Kafka so I
am not sure exactly how it works.
> If I am understanding you correctly, the line you told me to add in the
first email should work because I am only creating one partition?
> config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);
> Thanks again for the help :-)
> David
> 
> Date: Wed, 30 Mar 2016 15:36:19 +0530
> Subject: Re: Storm KafkaSpout Integration
> From: dkira...@aadhya-analytics.com
> To: user@storm.apache.org
>
>
> Hi david,
>
> Can I know how many partitions you have?
> The statement I have given to you is the default. If you are running with a
number of partitions, make sure you give the same number, e.g. if you are
running with two partitions, change the number to 2 in the statement:
> config.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2);
>
> Best regards,
> K.Sai Dilip Reddy.
> On Wed, Mar 30, 2016 at 3:00 PM, david kavanagh 
wrote:
>
> Thanks for the reply!
> I added the line as you suggested but there is still no difference
unfortunately.
> I am just guessing at this stage but judging by the output below it, it
seems like it is something to do with the partitioning or the offset.
> The warnings start by stating that there are more tasks than partitions.
> Task 1 is assigned the partition that is created in the code (highlighted
in green), then the rest of the tasks are not assigned any partitions.
> Eventually it states 'Read partition information from:
/twitter/twitter-topic-id/partition_0  --> null'
> So it seems like it is not reading data from Kafka at all. I really don't
understand what is going on here.
> Any ideas?
>
> Kind Regards
> David
> --
> Storm Output:
> Thread-9-print] INFO  backtype.storm.daemon.executor - Prepared bolt
print:(2)
> 32644 [Thread-11-words] INFO
 org.apache.curator.framework.imps.CuratorFrameworkImpl - Starting
> 32685 [Thread-19-words] WARN  storm.kafka.KafkaUtils - there are more
tasks than partitions (tasks: 10; partitions: 1), some tasks will be idle
> 32686 [Thread-19-words] WARN  storm.kafka.KafkaUtils - Task [5/10] no
partitions assigned
> 32686 [Thread-17-words] WARN  storm.kafka.KafkaUtils - there are more
tasks than partitions (tasks: 10; partitions: 1), some tasks will be idle
> 32686 [Thread-15-words] WARN  storm.kafka.KafkaUtils - there are more
tasks than partitions (tasks: 10; partitions: 1), some tasks will be idle
> 32686 [Thread-17-words] WARN  storm.kafka.KafkaUtils - Task [4/10] no
partitions assigned
> 32686 [Thread-15-words] WARN  storm.kafka.KafkaUtils - Task [3/10] no
partitions assigned
> 32686 [Thread-11-words] WARN  storm.kafka.KafkaUtils - there are more
tasks than partitions (tasks: 10; partitions: 1), some tasks will be idle
> 32686 [Thread-11-words] INFO  storm.kafka.KafkaUtils - Task [1/10]
assigned [Partition{host=127.0.0.1:9092, partition=0}]
> 32687 [Thread-29-words] WARN  storm.kafka.KafkaUtils - there are more
tasks than partitions (tasks: 10; partitions: 1), some tasks will be idle
> 32697 [Thread-19-words-EventThread] INFO
 org.apache.curator.framework.state.ConnectionStateManager - State change:
CONNECTED
> 32697 [Thread-25-words-EventThread] INFO
 org.apache.curator.framework.state.ConnectionStateManager - State change:
CONNECTED
> 32697 [Thread-29-words-EventThread] INFO
 org.apache.curator.framework.state.ConnectionStateManager - State change:
CONNECTED
> 32697 [Thread-13-words-EventThread] INFO
 org.apache.curator.framework.state.ConnectionStateManager - State change:
CONNECTED
> 32697 [Thread-27-words-EventThread] INFO
 org.apache.curator.framework.state.ConnectionStateManager - State change:
CONNECTED
> 32697 [Thread-15-words-EventThread] INFO
 org.apache.curator.framework.state.ConnectionStateManager - State change:
CONNECTED
> 32689 [Thread-19-words] INFO  backtype.storm.daemon.executor - Opened
spout words:(7)
> 32689 [Thread-25-words] WARN  

Re: Combining group by and time window

2016-03-30 Thread Spico Florin
hello!
From the storm perspective, regarding window functionality, storm 1.0 adds
an implementation of windowed bolts. There is a very good article on
Hortonworks about what kind of functionality is provided; please have a look at
https://community.hortonworks.com/articles/14171/windowing-and-state-checkpointing-in-apache-storm.html
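For the concrete combination, here is a minimal sketch (Storm 1.0 windowing API; all names are hypothetical): fieldsGrouping on the region field routes every tuple of one region to the same task, and the windowed bolt sums the last minute of values per region. Persisting the sums to Redis could then happen where they are emitted, or in a downstream bolt.

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseWindowedBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.windowing.TupleWindow;

public class RegionSumBolt extends BaseWindowedBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext ctx, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(TupleWindow window) {
        // each task only sees the regions routed to it by fieldsGrouping
        Map<String, Double> sums = new HashMap<String, Double>();
        for (Tuple t : window.get()) {
            String region = t.getStringByField("region");
            Double old = sums.get(region);
            sums.put(region, (old == null ? 0d : old) + t.getDoubleByField("value"));
        }
        for (Map.Entry<String, Double> e : sums.entrySet()) {
            collector.emit(new Values(e.getKey(), e.getValue()));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("region", "sum"));
    }
}

// wiring: group by region first, then window per task:
builder.setBolt("sum-per-region",
        new RegionSumBolt().withWindow(new BaseWindowedBolt.Duration(1, TimeUnit.MINUTES),
                                       new BaseWindowedBolt.Duration(10, TimeUnit.SECONDS)),
        4)
       .fieldsGrouping("regions-spout", new Fields("region"));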
I hope that it helps.
regards, florin
On Wednesday, March 30, 2016, Maria Musterfrau  wrote:
> Does anyone have an idea?
>
> Thank you in advance.
>
> Regards,
> Daniela
>
> Sent: Monday, March 28, 2016, at 21:06
> From: "Maria Musterfrau" 
> To: user@storm.apache.org
> Subject: Combining group by and time window
> Hi,
>
> I have a stream with time series data from different regions. I would
like to group the stream by the different regions and to add up the values
of the last minute (time window) per region. The sums should be persisted
to Redis or something like this.
>
> I already found out that Storm Trident provides a group by function to
split the stream. I think this could be useful.
> Storm core provides time windows, so I could use it for the aggregation.
>
> But how can I combine these two components? Or is this not possible?
>
> Would it be useful to do the grouping already in Kafka (with different
topics) or is it better to do it in Storm?
>
> Thank you in advance.
>
> Regards,
> Daniela


Executor inbound buffer queue(receive) and sender buffer queue when using collector.emit(streamId,tupleValues)

2016-03-28 Thread Spico Florin
Hello!
  I would like to know how Storm manages the internal buffer queues when
using collector.emit(streamId, valuesToEmit).
For example, consider the topology:
1. Spout->ProcessingBolt
2. spout.collector.emit(streamId1, tupleValues1)
spout.collector.emit(streamId2, tupleValues2)

Q. How does the send buffer queue look for the Spout?
Q. How does the receive buffer queue look for the ProcessingBolt?
Q. Does it internally use a Map (one queue per
stream) or a single LMAX disruptor queue that handles the input/output for
all the streams?

I look forward to your answers.
 Regards,
 Florin


Usage of G1 Garbage collector

2016-03-15 Thread Spico Florin
Hello!
 I would like to ask the community the following:
1. Are you using the G1 garbage collector for your workers/supervisors in
production?
2. Have you observed any improvement from adopting this GC?
3. What JVM options are you using that are a good fit for you?

Thank you in advance.
 Regards,
 Florin


Best practices for running Storm, HBase, Kafka (regading Zookeeper cluster)

2016-03-02 Thread Spico Florin
Hello!
 I would like to know how it is best to run the three systems with regard to
Zookeeper cluster usage:
 1. a separate cluster per system (ZK cluster/Storm, ZK cluster/HBase,
ZK cluster/Kafka)
2. a single cluster for all of them (Storm, HBase, Kafka -> single ZK cluster)

In my opinion the first one is the best option (since each system uses
ZK differently and we could otherwise have performance and availability
issues).

I look forward to your advice.
  Regards,
 Florin


order of execution topologies

2016-02-29 Thread Spico Florin
hello!
When all the free slots are occupied and you are still submitting
topologies, what will be the order in which these held topologies get
scheduled once the existing ones free up their slots?


How to get/see the thrift counterpart of a topology

2016-02-29 Thread Spico Florin
Hello!
 I would like to know how I can get/see how a topology structure is packed
 for the Thrift protocol.
More specifically, I would like to see the content of ComponentObject and
ComponentCommon, and whatever information is sent to nimbus.

As far as I know (please correct me if I'm wrong) there are two
counterparts of the topology that are sent to nimbus:
 - the fat jar that contains the classes and their dependencies (sent via
Thrift??)
- the topology structure as Thrift structures (also sent via Thrift).

As I said, I'm interested in the second point.
I look forward to your answers.
  Regards,
  Florin


Storm-Kafka data locality

2016-02-23 Thread Spico Florin
Hello!

My use case is to send 100 MB of raw data to Kafka, consuming it from
 the StormKafkaSpout.
I would like to ask you if co-locating the StormKafkaSpout (for consuming)
or the StormKafkaProducer (for producing) with the Kafka partitions where the
data resides is a good practice when designing a Storm-Kafka scenario?

I look forward to your answers.
 Regards,
 Florin


Re: localOrShuffleGrouping load balanced tuple distribution

2016-02-16 Thread Spico Florin
Hi, Nathan!
  Thank you for your answer. You are right: the only bolts that are used
will be the ones located near the spout. The other two will be unused. If
we increase the number of workers to 5, then it will behave as
shuffleGrouping (distributing evenly among the bolts).
I got it! The scenario as code, for reference:
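// a minimal sketch; the spout/bolt classes are hypothetical
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new MySpout(), 1);
builder.setBolt("bolt", new MyBolt(), 4)
       .localOrShuffleGrouping("spout"); // prefer bolt executors in the spout's worker

Config conf = new Config();
conf.setNumWorkers(2); // only the two bolt executors in the spout's worker get tuples
// with conf.setNumWorkers(5) no bolt shares the spout's worker, so it shuffles evenly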
 Regards,
 Florin

On Mon, Feb 15, 2016 at 3:56 PM, Nathan Leung <ncle...@gmail.com> wrote:

> It will use the local bolts.
>
> On Mon, Feb 15, 2016 at 8:32 AM, Spico Florin <spicoflo...@gmail.com>
> wrote:
>
>>
>> Hello!
>>Suppose that I have the following scenario
>> 1. One spout
>> 2. One bolt with hintParallelism set to 4
>> 3. Bolt connected with the spout with localOrShuffleGrouping
>> 4. 2 slots available
>> 5. We use the default scheduler (round-robin)
>>
>> Given the above scenario, the distribution over the workers will be:
>> worker 1: spout, bolt, bolt
>> worker 2: bolt, bolt
>>
>> Here are my questions:
>> 1. Will the spout use the 4 bolts?
>> 2. Will the data be distributed evenly between these bolts?
>> 3. Due to the localOrShuffleGrouping method, will only the local bolts be
>> used (the ones from worker 1) or all the workers?
>>
>> I look forward to your answers.
>> Thank you.
>>  Florin
>>
>>
>>
>>
>>
>


localOrShuffleGrouping load balanced tuple distribution

2016-02-15 Thread Spico Florin
Hello!
   Suppose that I have the following scenario
1. One spout
2. One bolt with hintParallelism set to 4
3. Bolt connected with the spout with localOrShuffleGrouping
4. 2 slots available
5. We use the default scheduler (round-robin)

Given the above scenario, the distribution over the workers will be:
worker 1: spout, bolt, bolt
worker 2: bolt, bolt

Here are my questions:
1. Will the spout use the 4 bolts?
2. Will the data be distributed evenly between these bolts?
3. Due to the localOrShuffleGrouping method, will only the local bolts be used
(the ones from worker 1) or all the workers?

I look forward to your answers.
Thank you.
 Florin


Re: The best way to unit test a Bolt with dependency?

2016-02-09 Thread Spico Florin
Hello!
  Make RabbitMQUtils a transient field in order to get rid of the
serialization error. See if you can mock your RabbitMQ utils (using Mockito)
and inject it via setMQ, along these lines:
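// a rough sketch: RabbitMQUtils and setMQ come from your code, the test
// assumes Mockito, and the publish method is hypothetical

// in the bolt:
private transient RabbitMQUtils rabbitMQUtils; // transient: skipped by serialization

void setMQ(RabbitMQUtils utils) {              // injection point for tests
    this.rabbitMQUtils = utils;
}

@Override
public void prepare(Map map, TopologyContext topologyContext, OutputCollector outputCollector) {
    this.outputCollector = outputCollector;
    this.gson = new Gson();
    if (rabbitMQUtils == null) {               // keep the production behavior
        this.rabbitMQUtils = new RabbitMQUtils(this.config);
    }
}

// in the unit test:
RabbitMQUtils mq = Mockito.mock(RabbitMQUtils.class);
bolt.setMQ(mq);                                // inject the mock before prepare()
bolt.prepare(conf, context, collector);
bolt.execute(tuple);
Mockito.verify(mq).publish(Mockito.anyString()); // hypothetical publish method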
Hope that these help.
 Regards,
 Florin

On Tue, Jan 19, 2016 at 7:16 PM, Noppanit Charassinvichai <
noppani...@gmail.com> wrote:

> I'm trying to unit test my Bolt, which has a dependency on RabbitMQ.
>
> This is what I put in my prepare method.
>
> @Override
> public void prepare(Map map, TopologyContext topologyContext,
> OutputCollector outputCollector) {
> this.outputCollector = outputCollector;
> this.gson = new Gson();
> this.rabbitMQUtils = new RabbitMQUtils(this.config);
> }
>
> If I change that to inject RabbitMQUtils I get an Exception telling me
> that some classes in RabbitMQ API cannot be serialized.
>
> What do people do to unit test something like this?
>
> Thanks.
>


Re: running multiple topologies in same cluster

2016-02-04 Thread Spico Florin
Hello!
Thank you all for your answers! I guess I'll wait for the added support for
resource-aware scheduling in a multi-tenant standalone storm cluster.
Regards,
 Florin

On Tue, Feb 2, 2016 at 1:00 AM, Erik Weathers <eweath...@groupon.com> wrote:

> hi Spico,
>
> As Bobby said, native Storm is going to have better support for this soon.
>
> FWIW, there is also the storm-on-mesos project, which we've been running
> on for almost 2 years at Groupon.
>
>- https://github.com/mesos/storm
>
> Caveats:
>
>- storm's logviewer is unsupported
>- scheduling can be suboptimal, causing topologies of different "size"
>(resource requirements) to starve each other
>   - side effect of needing to dynamically calculate storm "slots"
>   from mesos resource offers, my team has a framework change we are 
> testing
>   that will improve that behavior.
>- it's relatively complex to operate since there are so many different
>moving parts / components
>   - mesos-master
>   - mesos-slave/agent (it's being renamed)
>   - storm nimbus (MesosNimbus -> the mesos scheduler)
>   - storm supervisor (MesosSupervisor -> the mesos executor)
>   - storm worker (the mesos task)
>   - ZooKeeper
>- it probably won't work nicely with all the fancy security stuff that
>has been added to Storm in 0.10.0+
>
> - Erik
>
> On Mon, Feb 1, 2016 at 12:14 PM, Bobby Evans <ev...@yahoo-inc.com> wrote:
>
>> We are currently adding in support for resource aware scheduling in a
>> multi-tenant stand alone storm cluster.  It is still alpha quality but we
>> plan on getting it into production at Yahoo this quarter.  If you can wait
>> that would be the preferred way I see to support your use case.
>>
>> - Bobby
>>
>>
>> On Monday, February 1, 2016 12:16 PM, Spico Florin <spicoflo...@gmail.com>
>> wrote:
>>
>>
>> Hello!
>> I have an use case where we have to deploy many tpologies in a storm
>> cluster.
>> 1.we would like to know if running these topologies in combination with
>> apache slider over yarn would bring us some benefits in terms of resource
>> consumption?
>> 2. in such cases (running many topolgies aprox 60) what are the best
>> practices on how to run them over a cluster with a smart load balanced
>> hardware resources consumption (cpu, ram)?
>> I look forward for your answers.
>> Regards,
>> Florin
>>
>>
>>
>>
>


Big fat jar

2016-02-04 Thread Spico Florin
Hello!
  After building my project that contains the topology, I have a big fat
jar of 75MB. I have dependencies on HBase, OpenTSDB and Kafka. I would like
to reduce the size of the jar due to the fact that we can have a lot of
instances of the topology running (approx. 100).

I have read in two posts:
http://qnalist.com/questions/4712134/is-there-a-way-to-add-a-custom-jar-or-directory-of-jars-to-the-storm-classpath-without-copying-the-jar-s-to-storm-lib-folder
and
http://programmers.stackexchange.com/questions/238711/why-does-storm-not-supply-a-mechanism-for-supplying-topology-necessary-dependent

I found two solutions.
One is to put the common libraries in Storm_install_dir/lib, and the
second one is to put them in the USER_CONF_DIR folder defined in storm.yaml. In
both cases we should take care to keep a single version of the dependent
libraries. This could be a potential issue, due to the fact that there
might be common libraries that interfere with storm libraries (such as
zookeeper or apache-commons).
 Also, I'm running Storm from Ambari.

In the above scenario:
- what could be the best solution to keep the common dependent libraries?
- If I configure the USER_CONF_DIR in Ambari will it be propagated to all
cluster machines?

I look forward to your answers.
Thanks.
 Florin


running multiple topologies in same cluster

2016-02-01 Thread Spico Florin
Hello!
I have a use case where we have to deploy many topologies in a storm
cluster.
1. We would like to know if running these topologies in combination with
apache slider over yarn would bring us some benefits in terms of resource
consumption?
2. In such cases (running many topologies, approx. 60), what are the best
practices for running them on a cluster with smart, load-balanced
hardware resource consumption (cpu, ram)?
I look forward to your answers.
Regards,
Florin


Status of REST API in 0.10.x versions

2015-11-26 Thread Spico Florin
Hello!
  I would like to ask you what the status of the REST API in 0.10.x
versions is for the following:
1. submitting a topology
2. killing a topology
3. listing a topology
I have read something about these features in
https://github.com/apache/storm/pull/464 and
https://issues.apache.org/jira/browse/STORM-615, but it is not quite clear if
they are already provided or not.
If they are ready can you provide some examples on how to use them?
Thank you,
  Best regards,
  Florin


Storm worker issue: .daemon.supervisor still hasn't started when using apostrophe char in launcher command

2015-07-21 Thread Spico Florin



Using Storm on Windows OS cluster

2015-07-15 Thread Spico Florin
Hello!
 I would like to ask you the following:
1. Is anyone using Storm deployed on a Windows OS cluster (multi-node
Windows OS based machines)?
2. If yes, is it only for testing purposes or also in production mode?
I found a discussion about using Storm on a Windows cluster here:
http://ptgoetz.github.io/blog/2013/12/18/running-apache-storm-on-windows/
but unfortunately I could not tell whether Storm supports this
multi-node approach (the example is just for one single node).

Also, another question is related to the Zookeeper cluster. If you have used
Storm on Windows, did you also use Zookeeper on Windows? It is known that
for Windows-like OSes Zookeeper is supported only for development purposes,
not for production:
http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html#sc_supportedPlatforms

I look forward to your answers.
Thanks.
 Florin


Re: Best Spout implementation for Reading input Data From File

2015-06-08 Thread Spico Florin
Hello!
  You can also have a look at this post:
http://stackoverflow.com/questions/24413088/storm-max-spout-pending. It
might be helpful.
Regards,
 Florin

On Sun, Jun 7, 2015 at 4:17 PM, Nathan Leung ncle...@gmail.com wrote:

 You should emit with a message id, which will prevent too many messages
 from being in flight simultaneously, which will alleviate your out of
 memory conditions.
 On Jun 7, 2015 5:05 AM, Michail Toutoudakis mix...@gmail.com wrote:

 What is the best spout implementation for reading input data from a file? I
 have implemented a spout for reading input data from a file using a scanner,
 which seems to perform better than a buffered file reader.
 However, I still lose some values, not many this time, about 1%, but the
 problem is that after a few minutes of running I get a java out of memory
 exception, and I believe it has to do with values buffering.
 My spout implementation is:

 package tuc.LSH.storm.spouts;

 import backtype.storm.spout.SpoutOutputCollector;
 import backtype.storm.task.TopologyContext;
 import backtype.storm.topology.OutputFieldsDeclarer;
 import backtype.storm.topology.base.BaseRichSpout;
 import backtype.storm.tuple.Fields;
 import backtype.storm.tuple.Values;
 import backtype.storm.utils.Utils;
 import tuc.LSH.conf.Consts;

 import java.io.File;
 import java.io.FileNotFoundException;
 import java.util.Map;
 import java.util.Scanner;

 /**
  * Created by mixtou on 15/5/15.
  */
 public class FileReaderSpout extends BaseRichSpout {

     private SpoutOutputCollector collector;
     private Scanner scanner;
     private boolean completed;
     private TopologyContext context;
     private int spout_idx;
     private int spout_id;
     private Map config;
     private int noOfFailedWords;
     private int noOfAckedWords;

     @Override
     public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
         outputFieldsDeclarer.declareStream("data", new Fields("streamId", "timestamp", "value"));
     }

     @Override
     public void open(Map config, TopologyContext topologyContext, SpoutOutputCollector spoutOutputCollector) {
         this.context = topologyContext;
         this.spout_idx = context.getThisTaskIndex();
         this.spout_id = context.getThisTaskId();
         this.collector = spoutOutputCollector;
         this.config = config;
         this.completed = false;
         this.noOfFailedWords = 0;
         this.noOfAckedWords = 0;

         try {
             this.scanner = new Scanner(new File(config.get(file_to_read()).toString()));
             System.err.println("Scanner Reading File: " + config.get(file_to_read()).toString()
                     + " Spout index: " + spout_idx);
         } catch (FileNotFoundException e) {
             e.printStackTrace();
         }
     }

     @Override
     public void nextTuple() {
         if (!completed) {
             if (scanner.hasNextLine()) {
                 // fields: 0-id, 2-timestamp, 3-value
                 String[] temp = scanner.nextLine().split(",");
                 // emit with temp[0] as the message id, so the tuple is tracked
                 // by the acker and can be acked/failed (guaranteed delivery)
                 collector.emit("data", new Values(temp[0], temp[2], temp[3]), temp[0]);
                 Utils.sleep(1);
             } else {
                 System.err.println("End of File, Closing Reader");
                 scanner.close();
                 completed = true;
             }
         }
     }

     private String file_to_read() {
         if (Consts.NO_OF_SPOUTS > 1) {
             int file_no = spout_idx % Consts.NO_OF_SPOUTS;
             return "data" + file_no;
         } else {
             return "data";
         }
     }

     @Override
     public void ack(Object msgId) {
         super.ack(msgId);
         noOfAckedWords++;
     }

     @Override
     public void fail(Object msgId) {
         super.fail(msgId);
         noOfFailedWords++;
         System.err.println("ERROR: " + context.getThisComponentId() + " " + msgId
                 + " no of words failed " + noOfFailedWords);
     }
 }





Re: Configure scheduler for host affinity

2015-06-04 Thread Spico Florin
Hi!
 I had the same case that you mentioned. What I did:
1. Create a scheduler class (see the attached file)
2. On the Nimbus node, in $STORM_HOME/conf/storm.yaml, add the following
lines:
storm.scheduler: NetworkScheduler
supervisor.scheduler.meta:
  name: special-supervisor
3. On the nimbus node, put the jar file that contains the scheduler class
in
$STORM_HOME/conf/lib
4. Restart nimbus
5. Pay attention to the code

TopologyDetails topology = topologies.getByName("network");

Only the topology having the name "network" will be scheduled as you want.
This is case sensitive.

I hope that these help.
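Since the attached NetworkScheduler.java shows up as binary data in the archive, here is a minimal sketch of what such a scheduler can look like (the slot-assignment strategy, everything onto the first free slot of the tagged supervisor, is my assumption, not necessarily what the attachment does):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import backtype.storm.scheduler.Cluster;
import backtype.storm.scheduler.EvenScheduler;
import backtype.storm.scheduler.ExecutorDetails;
import backtype.storm.scheduler.IScheduler;
import backtype.storm.scheduler.SupervisorDetails;
import backtype.storm.scheduler.Topologies;
import backtype.storm.scheduler.TopologyDetails;
import backtype.storm.scheduler.WorkerSlot;

public class NetworkScheduler implements IScheduler {

    @Override
    public void prepare(Map conf) {
        // nothing to configure in this sketch
    }

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
        // only the topology named "network" gets special placement (case sensitive)
        TopologyDetails topology = topologies.getByName("network");
        if (topology != null && cluster.needsScheduling(topology)) {
            for (SupervisorDetails supervisor : cluster.getSupervisors().values()) {
                Map meta = (Map) supervisor.getSchedulerMeta();
                if (meta == null || !"special-supervisor".equals(meta.get("name"))) {
                    continue; // not the supervisor tagged in storm.yaml
                }
                List<WorkerSlot> slots = cluster.getAvailableSlots(supervisor);
                Map<String, List<ExecutorDetails>> pending =
                        cluster.getNeedsSchedulingComponentToExecutors(topology);
                if (!slots.isEmpty() && !pending.isEmpty()) {
                    // naive choice: all unassigned executors onto the first free slot
                    List<ExecutorDetails> executors = new ArrayList<ExecutorDetails>();
                    for (List<ExecutorDetails> e : pending.values()) {
                        executors.addAll(e);
                    }
                    cluster.assign(slots.get(0), topology.getId(), executors);
                }
            }
        }
        // everything else (and anything left over) goes to the default scheduler
        new EvenScheduler().schedule(topologies, cluster);
    }
}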


On Tue, Jun 2, 2015 at 8:10 PM, B. Candler b.cand...@pobox.com wrote:

 On 02/06/2015 17:26, B. Candler wrote:

 However I couldn't find any authoritative documentation on the scheduler
 API by browsing
 https://storm.apache.org/doc-index.html
 (but I might have missed the right link)

 BTW after further digging I did find a couple of lines here:

 https://storm.apache.org/2012/08/02/storm080-released.html#pluggable-scheduler

 and that lead me to

 https://github.com/apache/storm/blob/master/storm-core/src/jvm/backtype/storm/scheduler/IScheduler.java
 (after fixing the link)

 Regards,

 Brian.




NetworkScheduler.java
Description: Binary data


Status of running storm on yarn (the yahoo project)

2015-05-27 Thread Spico Florin
Hello!
I'm interested in running storm topologies on yarn.
I was looking at the yahoo project https://github.com/yahoo/storm-yarn, and
I observed that there has been no activity for the past 7 months. Also, the
issues and pull request lists are not updated.
Therefore I have some questions:
1. Is there any plan to evolve this project?
2. Is there any plan to integrate this project in the main branch?
3. Is anyone using this approach in production?

I look forward to your answers.
 Regards,
 Florin


Streaming data from HDFS to Storm

2015-05-14 Thread Spico Florin
Hello!
   I would like to know if there is any spout implementation for streaming
data from HDFS into Storm (something similar to Spark Streaming from HDFS). I
know that there are bolt implementations for writing data into HDFS (
https://github.com/ptgoetz/storm-hdfs and
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.1.3/bk_user-guide/content/ch_storm-using-hdfs-connector.html),
but for the other way around I could not find anything.
  I appreciate any suggestions and hints.
Thanks.
 Florin


Using Storm with Kundera to write into Cassandra/HBase

2015-04-16 Thread Spico Florin
Hello!
   Has anyone used Kundera (
https://github.com/impetus-opensource/Kundera/wiki/Getting-Started-in-5-minutes)
to read/write data from/to Cassandra/HBase?
  Any suggestions or github examples will be appreciated.
Thanks.
 Florin


Storm Production usage (case studies)

2014-12-12 Thread Spico Florin
Hello!
  I would like to know whether, besides the companies mentioned in the
documentation
(http://storm.apache.org/documentation/Powered-By.html), there are any
companies that have deployed Storm in production and what their case
studies were (in the way they are also described in the documentation).
  I'm evaluating Storm, and this information will be very helpful for knowing
whether our case studies might fit in.
   I look forward to your responses.
 Thanks,
 Best regards,
  Florin


Automatic scaling workers in IaaS (EC2) cloud

2014-11-25 Thread Spico Florin
Hello!
  I would like to ask you if some of you have a scenario similar to mine:
1. Start with a cluster of n worker nodes (virtual machines, VMs)
2. At some point in time, the nodes are overwhelmed due to the increasing
data for processing (aka cloud bursting)
3. You have a monitor that detects this overload and alerts another system
to react by starting new worker nodes (new VMs). The cluster size will
grow to n+ nodes
4. After the spike passes, the cluster should go back to its initial state
of n nodes

If you have this kind of scenario, can you please tell me what it involves in
terms of configuration of the new nodes and also in terms of starting the
new VMs (for example, starting from snapshots)?

I look forward to your answers and suggestions.

Thank you.

 Best regards,
 Florin


Re: Storm Error while submitting topology Failed to get local hostname java.net.UnknownHostException: xx-xxx-xxx-xxx: xxx-xxx-xx-xx

2014-10-07 Thread Spico Florin
Hello!
  I have found the issue. The wrong IP was set up in /etc/hostname. On
Ubuntu, after changing it with the hostnamectl command, the problem was
gone. I hope this helps others who face the same issue.
Regards,
  Florin

On Tue, Oct 7, 2014 at 11:17 AM, Spico Florin spicoflo...@gmail.com wrote:

 Hello!
  My question is what triggers that IP to be encoded with the - char instead
 of .? On the client side I'm getting the mentioned error in the StormSubmitter
 class:
  java.net.UnknownHostException: xx-xxx-xxx-xxx at
 backtype.storm.StormSubmitter.clinit(StormSubmitter.java:48)
 at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
 at  at
 java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
 at  at
 java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
 at  at java.net.InetAddress.getLocalHost(InetAddress.java:1469)

 and in the worker in the DisruptorQueue.java:128
 Caused by: java.net.UnknownHostException: xxx-xxx-xxx-xxx: xxx-xxx-xxx-xxx
 Name or service not known
 at java.net.InetAddress.getLocalHost(InetAddress.java:1473)
 ~[na:1.7.0_65]
 at
 backtype.storm.daemon.executor$metrics_tick.invoke(executor.clj:280)
 ~[storm-core-0.9.2-incubating.jar:0.9.2-incubating]
 at
 backtype.storm.daemon.executor$fn__5641$tuple_action_fn__5643.invoke(executor.

 I look forward to your suggestions.
  Regards,
  Florin

 On Tue, Oct 7, 2014 at 10:55 AM, Spico Florin spicoflo...@gmail.com
 wrote:

 Hello!
   I'm encountering the following strange case: while submitting the
 topology to the storm cluster I'm getting the error
 *Failed to get local hostname java.net.UnknownHostException:
 xx-xxx-xxx-xxx: xxx-xxx-xx-xx*
 *where the ** xxx-xxx-xx-xx is the IP of the nimbus (encoded here for
 security reasons. The IP is a valid one and is properly configured with .
 as separator in the storm.yaml file )*
 The error is somehow bypassed on the client submitter, and the topology is
 submitted to the storm cluster and deployed. But I'm getting the same error
 on the worker machines where the spout and bolts are running. The topology
 runs for a while and is then restarted after encountering the same error
 with the corresponding IP of the workers.

 Can you please help me understand what triggers this peculiar behavior
 and how to properly solve it?
 I look forward to your answers.
 Thanks.
   Regards,
  Florin