Storm is not your bottleneck. Check your Storm code to ensure 1) you're
parallelizing your writes and 2) you're batching writes to your external
resources where possible. Some quick napkin math shows you're only doing 110
writes/s, which seems awfully low.
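For instance, a minimal sketch of a batching writer (BatchingWriterBolt and
writeBatch are illustrative names, not from your code):

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class BatchingWriterBolt extends BaseRichBolt {
    private static final int BATCH_SIZE = 100;
    private OutputCollector collector;
    private List<Tuple> buffer;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.buffer = new ArrayList<Tuple>();
    }

    @Override
    public void execute(Tuple tuple) {
        buffer.add(tuple);
        if (buffer.size() >= BATCH_SIZE) {
            writeBatch(buffer); // one bulk write instead of BATCH_SIZE single writes
            for (Tuple t : buffer) {
                collector.ack(t); // ack only after the batch is durably written
            }
            buffer.clear();
        }
    }

    private void writeBatch(List<Tuple> batch) {
        // bulk insert into the external resource goes here
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal writer: no output streams
    }
}

In production you'd also flush on a tick tuple or timer so a partial batch
doesn't sit unacked past the tuple timeout.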
Michael Rose (@Xorlev https://twitter.com/xorlev)
Hi Kushan,
Depending on the Kafka spout you're using, it could do different
things when it fails. However, if it's running reliably, the Cassandra
insertion failures would have forced a replay from the spout until the
inserts completed.
Michael Rose (@Xorlev https://twitter.com/xorlev)
…Michael Rose mich...@fullcontact.com wrote:
It's another case of a streaming join. I've done this before; there
aren't too many gotchas, other than that you need a data structure which
purges stale unresolved joins beyond the tuple timeout time (I used a Guava
cache for this).
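Roughly what that looks like (joinKey and the 25s window are illustrative;
keep the expiry below topology.message.timeout.secs, which defaults to 30):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;
import backtype.storm.tuple.Tuple;

// in prepare(): a cache that drops unresolved join halves before the tuple timeout
Cache<String, Tuple> pendingJoins = CacheBuilder.newBuilder()
        .expireAfterWrite(25, TimeUnit.SECONDS)
        .build();

// in execute(): pair up, or park the tuple until its counterpart arrives
Tuple other = pendingJoins.getIfPresent(joinKey);
if (other != null) {
    pendingJoins.invalidate(joinKey);
    // emit the joined result, anchored to both tuples, then ack both
} else {
    pendingJoins.put(joinKey, tuple);
}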
Michael Rose (@Xorlev https://twitter.com/xorlev)
Run your producer code in another thread to fill a LinkedBlockingQueue (LBQ),
and poll that from nextTuple instead.
You should never block inside a spout.
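A sketch of the pattern (fetchNextRecord stands in for your real producer call):

import java.util.Map;
import java.util.concurrent.LinkedBlockingQueue;
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class QueueBackedSpout extends BaseRichSpout {
    private final LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<String>(10000);
    private SpoutOutputCollector collector;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        Thread producer = new Thread(new Runnable() {
            public void run() {
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        queue.put(fetchNextRecord()); // blocking is fine here: not the spout thread
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    @Override
    public void nextTuple() {
        String record = queue.poll(); // never blocks the spout thread
        if (record != null) {
            collector.emit(new Values(record));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("record"));
    }

    private String fetchNextRecord() {
        return "..."; // hypothetical blocking read from the real source
    }
}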
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
…https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/backtype/storm/spout/NothingEmptyEmitStrategy.java
if you do see an impact on your throughput, but I've never needed this.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com
Worth clarifying for anyone else in this thread that an LBQ separating
production from consumption is not a default thing in Storm; it's something
we cooked up to prefetch elements from slow/batching resources.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact
…then having bolt b0
round-robin the received tuples between two streams, and having b1 and b2
shuffle over those streams instead.
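Sketched out (stream and component names are made up):

import java.util.Map;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class RoundRobinBolt extends BaseRichBolt {
    private OutputCollector collector;
    private boolean flip;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        // alternate emits between the two declared streams, anchored for reliability
        collector.emit(flip ? "s1" : "s2", tuple, new Values(tuple.getValue(0)));
        flip = !flip;
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("s1", new Fields("value"));
        declarer.declareStream("s2", new Fields("value"));
    }
}

// wiring: b1 and b2 each shuffle over their own stream
// builder.setBolt("b1", new B1(), 4).shuffleGrouping("b0", "s1");
// builder.setBolt("b2", new B2(), 4).shuffleGrouping("b0", "s2");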
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
…will receive an executor, but the others will not.
It sounds like for your case, shuffle+parallelism is more than sufficient.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
You need only run the existing releases on JDK 7 or 8.
On Jul 14, 2014 7:15 AM, Haralds Ulmanis hara...@evilezh.net wrote:
Actually, I've now customized Storm a bit and recompiled it, as I needed
some changes.
But initially I just downloaded and ran it.
…the existing schedulers
are in Clojure. It's not impossible to do, for sure, but as Andrew said, it
might well be easier to have separate Storm clusters that share a ZK cluster.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com
In a single worker, you don't incur serialization or network overhead.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
What kind of issues does Metrics have that lead you to recommend
HdrHistogram?
On Jun 16, 2014 6:57 PM, Dan dcies...@hotmail.com wrote:
Be careful when using Coda Hale's Metrics package to measure latency.
Consider using Gil Tene's High Dynamic Range Histogram (HdrHistogram)
instead.
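A minimal sketch of using it (the wrapper class is illustrative):

import java.util.concurrent.TimeUnit;
import org.HdrHistogram.Histogram;

public class LatencyRecorder {
    // track up to 1 minute of latency at 3 significant digits of precision
    private final Histogram histogram = new Histogram(TimeUnit.MINUTES.toNanos(1), 3);

    public void record(long latencyNanos) {
        histogram.recordValue(latencyNanos);
    }

    public long p999Nanos() {
        // exact percentile over all recorded values, no reservoir sampling bias
        return histogram.getValueAtPercentile(99.9);
    }
}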
A topology can run in as many workers as you assign at launch time, DRPC or
not.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
Out of curiosity, what kind of changes have you been making?
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
You can have a loop on a different stream. It's not always the best thing
to do (there are deadlock possibilities from full buffers), but we have a
production topology with that kind of pattern. In our case, one bolt acts as
a coordinator for a recursive search.
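Wiring-wise, the loop looks something like this (names hypothetical):

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("queries", new QuerySpout());
builder.setBolt("search", new SearchBolt(), 8)
       .shuffleGrouping("queries")
       .shuffleGrouping("search", "recurse"); // the loop: subscribe to our own stream

// SearchBolt must declareStream("recurse", ...) in declareOutputFields;
// inside SearchBolt.execute(), unfinished work goes back around:
//   collector.emit("recurse", tuple, new Values(nextDepthQuery));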
Michael Rose (@Xorlev https://twitter.com/xorlev)
…for init code (e.g. properties / Guice). Check out BaseTaskHook; it's easily
extendable and can be included pretty easily too:
stormConfig.put(Config.TOPOLOGY_AUTO_TASK_HOOKS,
Lists.newArrayList(MyTaskHook.class.getName()));
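A minimal hook to pair with that config line might look like this (the body
is illustrative):

import java.util.Map;
import backtype.storm.hooks.BaseTaskHook;
import backtype.storm.task.TopologyContext;

public class MyTaskHook extends BaseTaskHook {
    @Override
    public void prepare(Map conf, TopologyContext context) {
        // one-time, worker-side init: load properties, build a Guice injector, etc.
    }
}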
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
private static volatile boolean initialized = false;

@Override
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
    synchronized (MyBolt.class) { // MyBolt is a placeholder class name
        if (!initialized) {
            // do stuff
            initialized = true;
        }
    }
}
Until there's a set of lifecycle hooks, that's about as good as I've cared
to make it.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
We use upstart. Supervisord would also work. Just anything to keep an eye
on it and restart it if it dies (a very rare occurrence).
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
No reason why you couldn't do it, but as far as I know it hasn't been done
before. You can send any kind of serializable data into a topology. You'd
probably need to emit frames from the spout.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
…For development, there shouldn't be any issue foregoing supervision.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
In AWS, we're fans of c1.xlarges, m3.xlarges, and c3.2xlarges, but we've
seen Storm run successfully on cheaper hardware.
Our Nimbus server is usually bored on an m1.large.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
We don't use /etc/hosts mapping; we only use hostnames/IPs.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
…of tuples.
You could really abuse Storm if you wanted and use it as a distributed
application container with threadpools; I've done it. But you'll really
see a better experience out of a webservice if these are live-mode
requests.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
Yes. Ultimately, that runs the main method of MyTopology, so just like any
other main method you get String[] args.
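For example (buildTopology is a hypothetical helper; the args come straight
from the storm jar command line):

import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.StormTopology;
import backtype.storm.topology.TopologyBuilder;

public class MyTopology {
    public static void main(String[] args) throws Exception {
        // args are whatever followed the class name on the storm jar command line
        String name = args.length > 0 ? args[0] : "my-topology";
        StormSubmitter.submitTopology(name, new Config(), buildTopology());
    }

    private static StormTopology buildTopology() {
        return new TopologyBuilder().createTopology(); // stand-in; real wiring goes here
    }
}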
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
messageId is any unique identifier for the message, such that when ack is
called on your spout, you're handed back that identifier and can mark the
work as complete in the source, in the case that it supports replay.
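In sketch form (source and Message are hypothetical stand-ins for a client
to a replayable queue):

// inside a reliable spout
@Override
public void nextTuple() {
    Message m = source.poll();
    if (m != null) {
        // the second argument is the messageId Storm hands back to ack()/fail()
        collector.emit(new Values(m.getBody()), m.getId());
    }
}

@Override
public void ack(Object msgId) {
    source.markComplete(msgId); // done: safe to delete/checkpoint in the source
}

@Override
public void fail(Object msgId) {
    source.requeue(msgId); // schedule a replay
}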
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
…(and meets your criteria). Storm will handle rate-limiting the
spouts with sleeps.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
Ooyala did, customized
for our specific needs; it's an excellent pattern.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
What kind of comparisons are you looking for? How they functionally work?
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
+1, localOrShuffle will be a winner, as long as it's evenly distributing
work. If one tuple could produce, say, a variable 1-100 resultant tuples (and
those results were expensive enough to process, e.g. I/O-bound), it might
well be worth shuffling vs. local-or-shuffling.
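For reference, the wiring difference (component names made up):

// prefers tasks in the same worker when available, avoiding serialization/network
builder.setBolt("b1", new MyBolt(), 8).localOrShuffleGrouping("b0");

// always distributes evenly across all tasks, at network cost
builder.setBolt("b1", new MyBolt(), 8).shuffleGrouping("b0");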
Michael Rose (@Xorlev https://twitter.com/xorlev)
…(as storm: cd storm/daemon; supervise .) and seeing what
kind of errors you get.
Are your disks perhaps full?
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
I'd recommend just using one Zookeeper instance if they're on the same
physical host. There's no reason why a development ZK ensemble needs 3
nodes.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
The fact that the process is being killed constantly is a red flag. Also,
why are you running it as a client VM?
Check your nimbus.log to see why it's restarting.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
…In either case, it's pushing the management, verification, and
reestablishment of broken connections into the pool (which is also why we
have 1 extra conn: for when a conn is tied up running a validation query or
is being reestablished).
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
…instances.
supervisor.worker.start.timeout.secs: 120
supervisor.worker.timeout.secs: 60
I'd try tuning your worker start timeout here. Try setting it up to 300s
and (again) ensuring your prepare method only initializes expensive
resources once, then shares them between instances in the JVM.
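That is, in storm.yaml on the supervisor nodes (only the start timeout
changes; these are the values from this thread):

supervisor.worker.start.timeout.secs: 300
supervisor.worker.timeout.secs: 60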
Michael Rose (@Xorlev https://twitter.com/xorlev)
We've done this with SLF4J and Guava as well, without issues.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
What version of ZeroMQ are you running?
You should be running 2.1.7 with Nathan's provided fork of JZMQ.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
…'to distribute load': the ShuffleGrouping will
partition work across tasks on an even basis.
3) Yes, in storm.yaml: supervisor.slots.ports. By default it'll run with 4
slots per machine. See
https://github.com/nathanmarz/storm/blob/master/conf/defaults.yaml#L77
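For example, the 4 default slots look like this in storm.yaml (6700-6703
are the stock defaults):

supervisor.slots.ports:
    - 6700
    - 6701
    - 6702
    - 6703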
Michael Rose (@Xorlev https://twitter.com/xorlev)
Generally speaking, I don't know of many services that work exceedingly
well over a WAN.
Can you not do the processing at each location and forward it on with a
queue that isn't averse to WAN links?
…missing somewhere along the line, fail() will be called
after a timeout. If you kill the acker that tuple was tracked with, it's
then up to the message queue or other implementation to be able to replay
that message.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact
Make a base Spring bolt and inject the members in your prepare method.
That's the best I've come up with, as prepare happens server-side, whereas
topology config and static initializers happen at deploy time on the client
side.
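A minimal sketch of that base bolt, assuming XML-based Spring config (all
names illustrative):

import java.util.Map;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.base.BaseRichBolt;

public abstract class SpringBolt extends BaseRichBolt {
    private static volatile ApplicationContext context;

    @Override
    public void prepare(Map conf, TopologyContext topologyContext, OutputCollector collector) {
        // boot the context lazily, once per worker JVM, on the server side
        synchronized (SpringBolt.class) {
            if (context == null) {
                context = new ClassPathXmlApplicationContext("applicationContext.xml");
            }
        }
        // inject @Autowired members of the concrete bolt
        context.getAutowireCapableBeanFactory().autowireBean(this);
    }
}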
On Dec 25, 2013 7:51 AM, Michal Singer mic...@leadspace.com wrote:
Hi, I am…
…will initialize the spring context.
This way, the bolts will call other Spring beans which are not bolts and
are initialized in Spring. But of course this is a very limited solution.
The statistics are sampled, but in general should be within +/- 20 tuples
of where they should be.
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com
https://gist.github.com/Xorlev/8058947
This is the...gist...of it. :) Hope this helps!
Michael Rose (@Xorlev https://twitter.com/xorlev)
Senior Platform Engineer, FullContact http://www.fullcontact.com/
mich...@fullcontact.com