of operations, then there will be a lot of shuffle data. So you need
to check the worker logs and see what happened (for example, whether a disk filled up).
We have streaming pipelines that have been running for weeks without any issues.
Thanks
Best Regards
On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com wrote:
Guys,
We have a project that is built on Spark Streaming.
We use Kafka as the input stream, and create 5 receivers.
After this application had run for around 90 hours, all 5 receivers failed
for some unknown reason.
As I understand it, there is no guarantee that a Spark Streaming receiver
will
On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com wrote:
Guys,
We have a project that is built on Spark Streaming.
We use Kafka as the input stream, and create 5 receivers.
After this application had run for around 90 hours, all 5 receivers failed
for some unknown reason
spawn another receiver on another machine or on the same machine.
Thanks
Best Regards
On Mon, Mar 16, 2015 at 1:08 PM, Jun Yang yangjun...@gmail.com wrote:
Dibyendu,
Thanks for the reply.
I am reading your project homepage now.
One quick question I care about is:
If the receivers
Guys,
I have a question regarding the Spark 1.1 broadcast implementation.
In our pipeline, we have a large multi-class LR model, which is about 1 GiB
in size.
To take advantage of Spark's parallelism, the natural approach is to
broadcast this model file to the worker nodes.
However, it looks like
Guys,
As for the pre-processing question, you could just migrate your logic to
Spark before running K-means.
I have only used Scala on Spark and haven't used the Python bindings, but I
think the basic steps should be the same.
BTW, if your data set is big, with a huge sparse feature dimension
Guys,
Recently we are migrating our backend pipeline from to Spark.
In our pipeline, we have an MPI-based HAC implementation. To keep the
results consistent across the migration, we also want to migrate this
MPI-implemented code to Spark.
However, during the migration process, I found that there are