might be one potential cause; it would be better to increase the VM resources
and try again, just to verify your assumption.
On Fri, Dec 25, 2015 at 4:28 PM, donhoff_h <165612...@qq.com> wrote:
> Hi, Saisai Shao
>
> Many thanks for your reply. I used Spark v1.3. Unfortunately I cannot
> chang
Yes, basically from the current implementation it should be.
On Mon, Dec 21, 2015 at 6:39 PM, Arun Patel <arunp.bigd...@gmail.com> wrote:
> So, does that mean only one RDD is created by all receivers?
>
>
>
> On Sun, Dec 20, 2015 at 10:23 PM, Saisai Shao <sai.sai
Hi Siva,
How did you know that --executor-cores is ignored and where did you see
that only 1 Vcore is allocated?
Thanks
Saisai
On Tue, Dec 22, 2015 at 9:08 AM, Siva wrote:
> Hi Everyone,
>
> Observing a strange problem while submitting spark streaming job in
>
on web UI.
>
> Thanks,
> Sivakumar Bhavanari.
>
> On Mon, Dec 21, 2015 at 5:21 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> Hi Siva,
>>
>> How did you know that --executor-cores is ignored and where did you see
>> that only 1 Vcore is alloc
Normally there will be one RDD in each batch.
You could refer to the implementation of DStream#getOrCompute.
On Mon, Dec 21, 2015 at 11:04 AM, Arun Patel
wrote:
> It may be a simple question... But I am struggling to understand this
>
> DStream is a sequence of RDDs
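(A minimal sketch of the point above, assuming an existing DStream named
`stream`: each batch interval yields exactly one RDD, which foreachRDD
exposes.)

stream.foreachRDD { (rdd, time) =>
  // one RDD per batch interval
  println(s"Batch at $time has ${rdd.partitions.length} partitions")
}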
Please check the Yarn AM log to see why the AM failed to start. That's the
reason why using `sc` gets such a complaint.
On Fri, Dec 18, 2015 at 4:25 AM, Eran Witkon wrote:
> Hi,
> I am trying to install Spark 1.5.2 on Apache Hadoop 2.6 with Hive and YARN
>
> spark-env.sh
>
I think this is the right JIRA to fix this issue (
https://issues.apache.org/jira/browse/SPARK-7111). It should be in Spark
1.4.
On Thu, Dec 10, 2015 at 12:32 AM, Cody Koeninger wrote:
> Looks like probably
>
> https://issues.apache.org/jira/browse/SPARK-8701
>
> so 1.5.0
>
Please make sure the spark-shell script you're running points to
/bin/spark-shell.
Just following the instructions to correctly configure your Spark 1.4.1 and
executing the correct script is enough.
On Wed, Dec 9, 2015 at 11:28 AM, Divya Gehlot
wrote:
> Hi,
> As per
oadcast 0 from
>>> broadcast at DAGScheduler.scala:861
>>>
>>> 15/11/24 16:16:30 INFO scheduler.DAGScheduler: Submitting 200 missing tasks
>>> from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:32)
>>>
>>> 15/11/24 16:16:30 INFO cl
" equals 50, but in
> http://spark.apache.org/docs/latest/configuration.html#dynamic-allocation
> it says that
>
> "spark.dynamicAllocation.initialExecutors" equals "
> spark.dynamicAllocation.minExecutors". So, I think something was wrong,
> didn't it?
>
> Tha
Hi Tingwen,
Would you mind sharing your changes in
ExecutorAllocationManager#addExecutors()?
From my understanding and testing, dynamic allocation can work even when you
set the min and max number of executors to the same number.
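(For reference, a minimal sketch of the settings under discussion; the values
are illustrative only.)

import org.apache.spark.SparkConf

// min == max effectively pins the executor count while keeping
// dynamic allocation enabled.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true") // required on Yarn
  .set("spark.dynamicAllocation.minExecutors", "50")
  .set("spark.dynamicAllocation.maxExecutors", "50")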
Please check your Spark and Yarn logs to make sure the executors
at the Spark application did
> not request resources from it.
>
> Is this a bug? Should I create a JIRA for this problem?
>
> 2015-11-24 12:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>
>> OK, so this looks like your Yarn cluster does not allocate containers
>>
r: Not adding executors
> because our current target total is already 50 (limit 50)".
> Thanks
> Weber
>
> 2015-11-23 21:00 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>
>> Hi Tingwen,
>>
>> Would you mind sharing your changes in
>> ExecutorAllocatio
Node labels for the AM are not yet supported in Spark; currently only the
executor side is supported.
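(A minimal sketch of the executor-side support; the label value "spark" is
hypothetical.)

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.yarn.executor.nodeLabelExpression", "spark") // executors only, no AM equivalent yet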
On Tue, Nov 17, 2015 at 7:57 AM, Ted Yu wrote:
> Wangda, YARN committer, told me that support for selecting which nodes the
> application master is running on is integrated to the
It should work. I tested in my local environment with "curl
http://localhost:4040/metrics/json/" and metrics were dumped. For cluster
metrics, you have to change your base URL to point to the cluster manager.
Thanks
Jerry
On Mon, Nov 16, 2015 at 5:42 PM, ihavethepotential <
I think for receiver-less streaming connectors, like the direct Kafka input
stream or the HDFS connector, dynamic allocation could work, unlike
the receiver-based streaming connectors, since for receiver-less
connectors the behavior of the streaming app is more like a normal Spark app,
so dynamic
e micro-batch jobs are likely to use all the
> executors all the time, and no executor will remain idle for long. That is
> why the heuristic doesn't work that well.
>
>
> On Wed, Nov 11, 2015 at 6:32 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> I think for r
Hi Darren,
Functionality like messageHandler is missing in the Python API; it is still
not included in version 1.5.1.
Thanks
Jerry
On Wed, Nov 11, 2015 at 7:37 AM, Darren Govoni wrote:
> Hi,
> I read on this page
>
From my understanding, it depends on whether you enabled CGroups isolation
or not in Yarn. By default it is not enabled, which means you could allocate
one core but spawn a lot of threads in your task to occupy the CPU resources;
this is just a logical limitation. For Yarn CPU isolation you may refer to this
Hi Swetha,
Would you mind elaborating your usage scenario of DStream unpersisting?
From my understanding:
1. Spark Streaming will automatically unpersist outdated data (you already
mentioned the configurations).
2. If the streaming job is started, I think you may lose control of the
job,
I think it will work unless you use some new APIs that only exist in the
1.5.1 release (mostly this will not happen). You could give it a try to
see whether it runs or not.
On Tue, Nov 3, 2015 at 10:11 AM, pnpritchard <
nicholas.pritch...@falkonry.com> wrote:
> The title gives the gist of
What Spark version are you using? Also, a small code snippet of how you use
Spark Streaming would be greatly helpful.
On Fri, Oct 30, 2015 at 3:57 PM, Ramkumar V wrote:
> I am able to read and print a few lines. After that I'm getting this
> exception. Any idea about this?
Set
>);
>
> messages.foreachRDD(new Function<JavaPairRDD, Void>() {
>   public Void call(JavaPairRDD tuple) {
>     JavaRDD rdd = tuple.values();
>     rdd.saveAsTextFile("hdfs://myuser:8020/user/hdfs/output");
>     re
unstable.
On Fri, Oct 30, 2015 at 5:13 PM, Ramkumar V <ramkumar.c...@gmail.com> wrote:
> No, i dont have any special settings. if i keep only reading line in my
> code, it's throwing NPE.
>
> *Thanks*,
> <https://in.linkedin.com/in/ramkumarcs31>
>
>
> On Fri,
ing.api.java.JavaStreamingContext.start(JavaStreamingContext.scala:622)
>
>
> *Thanks*,
> <https://in.linkedin.com/in/ramkumarcs31>
>
>
> On Fri, Oct 30, 2015 at 3:25 PM, Saisai Shao <sai.sai.s...@gmail.com
> <javascript:_e(%7B%7D,'cvml','sai.sai.s...@gmail.com');>> w
ks*,
> <https://in.linkedin.com/in/ramkumarcs31>
>
>
> On Fri, Oct 30, 2015 at 2:50 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> I just did a local test with your code, seems everything is fine, the
>> only difference is that I use the master branch, but I don'
Looks like currently there's no way for Spark Streaming to restart
automatically in yarn-client mode, because in yarn-client mode the AM and the
driver are two separate processes; Yarn only controls the restart of the AM,
not the driver, so this is not supported in yarn-client mode.
You can write some scripts to monitor
.main(HistoryServer.scala)
>
>
> I went to the lib folder and noticed that
> "spark-assembly-1.5.1-hadoop2.6.0.jar" is missing that class. I was able to
> get the spark history server started with 1.3.1 but not 1.5.1. Any inputs
> on this?
>
> Really appreciate your help
of time, do I have to manually go and copy
> spark-1.5.1 tarball to all the nodes or is there any alternative so that I
> can get it upgraded through Ambari UI ? If possible can anyone point me to
> a documentation online? Thank you.
>
> Regards,
> Ajay
>
>
> On Wednesday, October
http://www.meruvian.org
>
> "We grow because we share the same belief."
>
>
> On Thu, Oct 22, 2015 at 8:56 AM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
> > How you start history server, do you still use the history server of
> 1.3.1,
> > or y
Hi Frans,
You could download the Spark 1.5.1-hadoop 2.6 pre-built tarball and copy it
into the HDP 2.3 sandbox or master node. Then copy all the conf files from
/usr/hdp/current/spark-client/ to your Spark conf directory, or you could
refer to this tech preview (
You could check the code of KafkaRDD: the locality (host) is obtained from
Kafka's partition and set in KafkaRDD; this is a hint for Spark to
schedule the task on the preferred location.

override def getPreferredLocations(thePart: Partition): Seq[String] = {
  val part = thePart.asInstanceOf[KafkaRDDPartition]
  Seq(part.host) // the Kafka leader host recorded for this partition
}
This preferred locality is a hint for Spark to schedule Kafka tasks on the
preferred nodes; if Kafka and Spark are two separate clusters, obviously
this locality hint takes no effect, and Spark will schedule tasks following
the node-local -> rack-local -> any pattern, like any other Spark tasks.
On
Maybe you could try "localCheckpoint" instead.
On Wednesday, October 14, 2015, 张仪yf1 <zhangyi...@hikvision.com> wrote:
> Thank you for your reply. It helped a lot. But when the data became
> bigger, the action cost more, is there any optimizer
>
>
>
> *From:* Saisai S
By configuring "spark.ui.port" to a port you can bind.
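(A minimal sketch; port 8080 is just an illustration of a port you can bind.)

import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.ui.port", "8080")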
On Tue, Oct 13, 2015 at 8:47 PM, Langston, Jim
wrote:
> Hi all,
>
> Is there anyway to change the default port 4040 for the localhost webUI,
> unfortunately, that port is blocked and I have no control of
You have to call checkpoint regularly on rdd0 to cut the dependency
chain; otherwise you will meet such problems as you mentioned, or even a stack
overflow eventually. This is a classic problem for highly iterative jobs; you
could google it for the fix.
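(A minimal sketch of the pattern, assuming a SparkContext `sc`; the checkpoint
directory and interval are illustrative.)

sc.setCheckpointDir("hdfs:///tmp/checkpoints")
var rdd = sc.parallelize(1 to 1000000)
for (i <- 1 to 100) {
  rdd = rdd.map(_ + 1)
  if (i % 10 == 0) {
    rdd.checkpoint() // cut the dependency chain here
    rdd.count()      // materialize so the checkpoint actually happens
  }
}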
On Tue, Oct 13, 2015 at 7:09 PM, 张仪yf1
As I remember, you don't need to upload the application jar manually; Spark
will do it for you when you use spark-submit. Would you mind posting
your spark-submit command?
On Wed, Sep 30, 2015 at 3:13 PM, Christophe Schmitz
wrote:
> Hi there,
>
> I am trying to use the
>
> Christophe
>
>
> On Wed, Sep 30, 2015 at 5:19 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> As I remember, you don't need to upload the application jar manually; Spark
>> will do it for you when you use spark-submit. Would you mind posting
>> yo
k-streaming-kafka-assembly_2.10-1.5.0'
> >>>
>
>
> So I launched pyspark with --jars with the assembly jar. Now it is
> working.
>
> THANK YOU for help.
>
> Curiosity: why did adding it to SPARK_CLASSPATH not work?
>
> Best
> Ayan
>
> On Wed, Se
I think you're using the wrong version of the Kafka assembly jar. The
Python API for the direct Kafka stream is not supported in Spark 1.3.0; you'd
better change to version 1.5.0. It looks like you're using Spark 1.5.0, so why
did you choose the Kafka assembly 1.3.0?
I think you need to increase the memory size of the executors through the
command argument "--executor-memory", or the configuration
"spark.executor.memory". Also increase yarn.scheduler.maximum-allocation-mb on
the Yarn side if necessary.
Thanks
Saisai
On Mon, Sep 21, 2015 at 5:13 PM, Alexander Pivovarov
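(A minimal sketch of the programmatic equivalent; the 4g value is only an
illustration.)

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "4g") // keep below yarn.scheduler.maximum-allocation-mb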
Hi Swetha,
The problem with the stack overflow is that, when recovering from checkpoint
data, Java will use a recursive way to deserialize the call stack; if you have
a large call stack, this recursion can easily lead to stack overflow.
This is caused by the Java deserialization mechanism, and you need to
Is your "KafkaGenericEvent" serializable? Since you call rdd.collect() to
fetch the data to the local driver, this KafkaGenericEvent needs to be
serialized and deserialized through the Java or Kryo (depending on your
configuration) serializer; I'm not sure if that is why you always get a
default object.
A task set is a set of tasks within one stage.
An executor will be killed when it is idle for a period of time (60s by
default). The problem you mentioned is a bug: the scheduler should not
allocate tasks to these to-be-killed executors. I think it is fixed in 1.5.
Thanks
Saisai
On Thu, Sep 17, 2015 at
Hi Qianhao,
I think you could sort the data by yourself if you want to achieve the same
result as MR, like rdd.reduceByKey(...).mapPartitions(// sort within each
partition). Do not call sortByKey again, since it will introduce another
shuffle (that's the reason why it is slower than MR).
The
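(A minimal sketch of that pattern, assuming an RDD[(String, Int)] named `rdd`.)

val result = rdd
  .reduceByKey(_ + _)
  .mapPartitions(iter => iter.toArray.sortBy(_._1).iterator,
    preservesPartitioning = true) // sorted within each partition, no extra shuffle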
Yes, not the offset ranges; the real data will be shuffled when you
use repartition().
Thanks
Saisai
On Fri, Sep 4, 2015 at 12:42 PM, Shushant Arora
wrote:
> 1. Does repartitioning on a direct Kafka stream shuffle only the offsets or
> the exact Kafka messages across
Here is the code in which NewHadoopRDD registers a close handler that is
called when the task is completed (
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L136
).
From my understanding, possibly the reason is that this `foreach` code in
your
Here is the REST-related part in Spark (
https://github.com/apache/spark/tree/master/core/src/main/scala/org/apache/spark/deploy/rest
); currently I don't think there's a document addressing this part. Also, this
REST API is only used by SparkSubmit at the moment, not a public API as I know.
Thanks
Jerry
Hi,
A lot of streaming internal status is exposed through StreamingListener, the
same as what you see from the web UI, so you could write your own
StreamingListener and register it in the StreamingContext to get the internal
information of Spark Streaming and write it to a CSV file.
You could check the source code
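(A minimal sketch of such a listener; the class name and CSV format are
hypothetical, and a real implementation would append to a file instead of
printing.)

import org.apache.spark.streaming.scheduler._

class CsvStatsListener extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    val info = batch.batchInfo
    // batch time, record count, processing delay (ms)
    println(s"${info.batchTime},${info.numRecords},${info.processingDelay.getOrElse(-1L)}")
  }
}

// register it on an existing StreamingContext `ssc`:
// ssc.addStreamingListener(new CsvStatsListener)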
Hi,
What Spark version do you use? It looks like a problem of configuration
recovery; I'm not sure whether it is a Twitter-streaming-specific problem. I
tried Kafka streaming with checkpointing enabled on my local machine and saw
no such issue. Did you try to set these configurations somewhere?
Thanks
Saisai
Hi Muler,
Shuffle data will be written to disk, no matter how large a memory you have;
a large memory can alleviate shuffle spill, where temporary files are
generated when memory is not enough.
Yes, each node writes its shuffle data to files, and they are pulled from disk
in the reduce stage through the network framework.
) then shuffle (in-memory shuffle) spill doesn't happen
(per node), but shuffle data still has to be ultimately written to disk so
that the reduce stage pulls it across the network?
On Wed, Aug 5, 2015 at 4:40 PM, Saisai Shao sai.sai.s...@gmail.com
wrote:
Hi Muler,
Shuffle data will be written to disk
I think you have to use 604800 instead of 7 * 24 * 3600; obviously
SparkConf will not do the multiplication for you.
The exception is explicit: Caused by: java.lang.NumberFormatException:
For input string: 3 * 24 * 3600
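(A minimal sketch of the fix; the exact property name is cut off in the
excerpt, so spark.cleaner.ttl below is only an illustration.)

import org.apache.spark.SparkConf

// SparkConf stores plain strings, so do the arithmetic in code:
val conf = new SparkConf()
  .set("spark.cleaner.ttl", (7 * 24 * 3600).toString) // "604800"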
2015-06-16 14:52 GMT+08:00 luohui20...@sina.com:
Hi guys:
I
Using sc.textFile will read the file from HDFS line by line through an
iterator; it doesn't need to fit entirely into memory. Even if you have a
small amount of memory, it still works.
2015-06-12 13:19 GMT+08:00 SLiZn Liu sliznmail...@gmail.com:
Hmm, you have a good point. So should I load the file
a...@yelp.com:
Thanks, Jerry. That's what I suspected based on the code I looked at. Any
pointers on what is needed to build in this support would be great. This is
critical to the project we are currently working on.
Thanks!
On Thu, Jun 11, 2015 at 10:54 PM, Saisai Shao sai.sai.s
.
Spark does not support any state persistence across deployments so this is
something we need to handle on our own.
Hope that helps. Let me know if not.
Thanks!
Amit
On Thu, Jun 11, 2015 at 10:02 PM, Saisai Shao sai.sai.s...@gmail.com
wrote:
Hi,
What is your meaning of getting
Hi,
What do you mean by getting the offsets from the RDD? From my
understanding, the offsetRange is a parameter you provide to KafkaRDD, so why
do you still want to get back the one you previously set?
Thanks
Jerry
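(If the goal is to read the offsets back from a direct stream, a minimal
sketch of the usual Scala pattern, assuming a direct stream named
`directStream`:)

import org.apache.spark.streaming.kafka.HasOffsetRanges

directStream.foreachRDD { rdd =>
  // each batch RDD of a direct stream carries the offset ranges it covers
  val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach(r => println(s"${r.topic} ${r.partition} ${r.fromOffset} ${r.untilOffset}"))
}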
2015-06-12 12:36 GMT+08:00 Amit Ramesh a...@yelp.com:
Congratulations on the
I think you could check the Yarn NodeManager log or the other Spark executor
logs to see the details. The exception stacks you listed above are
just the symptom, not the cause. Normally there are several situations
which can lead to executor loss:
1. Killed by Yarn because of memory
It depends on how you use Spark. If you use Spark with Yarn and enable
dynamic allocation, the number of executors is not fixed; it will change
dynamically according to the load.
Thanks
Jerry
2015-05-27 14:44 GMT+08:00 canan chen ccn...@gmail.com:
It seems the executor number is fixed for the
works? I mean, is it related
to the parallelism of my RDD, and how does the driver know how many executors
it needs?
On Wed, May 27, 2015 at 2:49 PM, Saisai Shao sai.sai.s...@gmail.com
wrote:
It depends on how you use Spark, if you use Spark with Yarn and enable
dynamic allocation, the number
a way to do
it in Yarn.
On Wednesday, May 27, 2015, Saisai Shao sai.sai.s...@gmail.com wrote:
The driver has a heuristic mechanism to decide the number of executors at
run time according to the pending tasks. You could enable it with
configuration; you could refer to the Spark documentation to find
I think here is the PR you could refer to:
https://github.com/apache/spark/pull/2994
2015-05-20 13:41 GMT+08:00 twinkle sachdeva twinkle.sachd...@gmail.com:
Hi,
As Spark Streaming is being nicely integrated with consuming messages from
Kafka, I thought of asking the forum: is there
Hi Bill,
You don't need to match the number of threads to the number of partitions in
a specific topic. For example, if you have 3 partitions in topic1 but you
only set 2 threads, ideally 1 thread will receive 2 partitions and the other
thread the remaining partition; it depends on the scheduling
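(A minimal sketch of the receiver-based API being described, assuming a
StreamingContext `ssc`; the ZooKeeper address and group id are hypothetical.
The map value is the number of consumer threads, not the number of
partitions.)

import org.apache.spark.streaming.kafka.KafkaUtils

val stream = KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", Map("topic1" -> 2))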
Also Kafka has a Hadoop consumer API for doing such things, please refer to
http://kafka.apache.org/081/documentation.html#kafkahadoopconsumerapi
2015-05-06 12:22 GMT+08:00 MrAsanjar . afsan...@gmail.com:
why not try https://github.com/linkedin/camus - Camus is a Kafka to HDFS
pipeline
On
to make a *skew* data/executor
distribution?
Best,
Yifan LI
On 06 May 2015, at 15:13, Saisai Shao sai.sai.s...@gmail.com wrote:
I think it depends on your workload and executor distribution: if your
workload is evenly distributed without any big data skew, and the executors
are evenly
I think you could configure multiple disks through spark.local.dir; the
default is /tmp. Anyway, if your intermediate data is larger than the
available disk space, you will still meet this issue.
spark.local.dir (default: /tmp): Directory to use for scratch space in Spark,
including map output files and RDDs that get
mentioned.
2015-05-06 21:09 GMT+08:00 Yifan LI iamyifa...@gmail.com:
Thanks, Shao. :-)
I am wondering if Spark will rebalance the storage overhead at
runtime… since there is still some available space on other nodes.
Best,
Yifan LI
On 06 May 2015, at 14:57, Saisai Shao sai.sai.s
IMHO if your data or your algorithm is prone to data skew, I think you have
to fix this at the application level; Spark itself cannot overcome this
problem (if one key has a large number of values). You may change your
algorithm to choose another shuffle key, something like this, to avoid
shuffling on
talk more about shuffle key or point me to APIs that allow me to
change shuffle key. I will try with different keys and see the performance.
What is the shuffle key by default?
On Mon, May 4, 2015 at 2:37 PM, Saisai Shao sai.sai.s...@gmail.com
wrote:
IMHO If your data or your algorithm
, Saisai Shao sai.sai.s...@gmail.com
wrote:
The shuffle key depends on your implementation. I'm not sure if you
are familiar with MapReduce: the mapper output is a key-value pair, where
the key is the shuffle key used for shuffling; Spark is the same.
2015-05-04 17:31 GMT+08:00 ÐΞ€ρ@Ҝ
Here is the pull request, you may refer to this:
https://github.com/apache/spark/pull/2994
Thanks
Jerry
2015-05-01 14:38 GMT+08:00 Pavan Sudheendra pavan0...@gmail.com:
Link to the question:
http://stackoverflow.com/questions/29974017/spark-kafka-producer-not-serializable-exception
From the chart you pasted, I guess you only have one receiver with a storage
level of two copies, so mostly your tasks are located on two executors. You
could use repartition to redistribute the data more evenly across the
executors. Adding more receivers is another solution.
2015-04-30 14:38
, Apr 29, 2015 at 8:33 AM, bit1...@163.com bit1...@163.com wrote:
For the SparkContext#textFile, if a directory is given as the path
parameter ,then it will pick up the files in the directory, so the same
thing will occur.
--
bit1...@163.com
*From:* Saisai Shao
I think currently there's no API in Spark Streaming you can use to get the
file names for file input streams. Actually it is not trivial to support
this; maybe you could file a JIRA with the wishes you want the community to
support, so anyone who is interested can take a crack at it.
Thanks
Jerry
in RDD
to figure out the file information where the data in RDD is from
--
bit1...@163.com
*From:* Saisai Shao sai.sai.s...@gmail.com
*Date:* 2015-04-29 10:10
*To:* lokeshkumar lok...@dataken.net
*CC:* spark users user@spark.apache.org
*Subject:* Re: Spark
Hi Marco,
As I know, the current combineByKey() does not expose the related argument
where you could set keyOrdering on the ShuffledRDD. Since ShuffledRDD is
package private, if you can get the ShuffledRDD through reflection or some
other way, the keyOrdering you set will be pushed down to the shuffle. If you
Would you please share your code snippet, so we can identify whether there
is anything wrong in your code?
Besides, would you please grep your driver's debug log to see if there's any
debug log like "Stream xxx received block xxx"? This means that Spark
Streaming keeps receiving data from
Also, you could use a Producer singleton to improve performance: since
right now you have to create a Producer for each partition in each batch
duration, you could create a singleton object and reuse it (the Producer is
thread safe as far as I know).
-Jerry
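(A minimal sketch of the singleton idea, using the Kafka 0.8 producer API of
that era; the broker address is hypothetical.)

import java.util.Properties
import kafka.producer.{Producer, ProducerConfig}

object ProducerSingleton {
  // created once per JVM (executor) and reused across batches
  lazy val producer: Producer[String, String] = {
    val props = new Properties()
    props.put("metadata.broker.list", "broker1:9092")
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    new Producer[String, String](new ProducerConfig(props))
  }
}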
2015-03-30 15:13 GMT+08:00 Saisai Shao sai.sai.s
()
}
}
Thanks & Best regards!
罗辉 San.Luo
----- Original Message -----
From: Saisai Shao sai.sai.s...@gmail.com
To: luohui20...@sina.com
Cc: user user@spark.apache.org
Subject: Re: How SparkStreaming output messages to Kafka?
Date: 2015-03-30 14:03
Hi Hui,
Did you try the direct Kafka
?
Thanks & Best regards!
罗辉 San.Luo
----- Original Message -----
From: luohui20...@sina.com
To: Saisai Shao sai.sai.s...@gmail.com
Cc: user user@spark.apache.org
Subject: Reply: Re: Re: How SparkStreaming output messages to Kafka?
Date: 2015-03-30 16:46
Hi Saisai,
following your advice, I
Shuffle write will finally spill the data to the file system as a bunch of
files. If you want to avoid disk writes, you can mount a ramdisk and
configure spark.local.dir to point to this ramdisk, so the shuffle output will
be written to a memory-based FS and will not introduce disk IO.
Thanks
Jerry
2015-03-30 17:15
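(A minimal sketch; the /mnt/ramdisk mount point is hypothetical.)

import org.apache.spark.SparkConf

val conf = new SparkConf().set("spark.local.dir", "/mnt/ramdisk")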
Hi,
I think for local mode the number N (N number of threads) basically equals
the number of available cores in ONE executor (worker), not N workers. You
could imagine local[N] as having one worker with N cores. I'm not sure you
could set the memory usage for each thread; for Spark the memory is
Hi,
Did you run the word count example in Spark local mode or another mode? In
local mode you have to set local[n], where n >= 2. For other modes, make sure
the number of available cores is larger than 1, because the receiver inside
Spark Streaming is wrapped as a long-running task, which will occupy at least
one core.
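(A minimal sketch: local[2] leaves one core for the receiver and one for
processing the received data.)

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))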
Looks like you have to build Spark against the related Hadoop version;
otherwise you will meet the exception as mentioned. You could follow this doc:
http://spark.apache.org/docs/latest/building-spark.html
2015-03-25 15:22 GMT+08:00 sandeep vura sandeepv...@gmail.com:
Hi Sparkers,
I am trying to load
Would you mind running it again to see if this exception can be reproduced,
since exceptions in MapOutputTracker seldom occur? Maybe some other
exception led to this error.
Thanks
Jerry
2015-03-26 10:55 GMT+08:00 李铖 lidali...@gmail.com:
One more exception. How to fix it? Anybody help?
Probably the cleanup work, like cleaning shuffle files and tmp files, costs
too much CPU; if we run Spark Streaming for a long time, lots of files
will be generated, so cleaning up these files before the app exits could be
time-consuming.
Thanks
Jerry
2015-03-11 10:43 GMT+08:00 Tathagata Das
Would you mind explaining your problem a little more specifically, like the
exceptions you met or anything else, so someone who has experience with it
could give advice?
Thanks
Jerry
2015-02-19 1:08 GMT+08:00 athing goingon athinggoin...@gmail.com:
Hi, I have a job that fails on a shuffle during a
Hi Tim,
I think this code will still introduce a shuffle even when you call
repartition on each input stream. Actually this style of implementation
will generate more jobs (one job per input stream) than unioning into one
stream via DStream.union(), and union normally has no special
overhead, as
partitions (some will probably sit idle).
But do away with dStream partitioning altogether.
Right?
Thanks,
- Tim
On Thu, Feb 12, 2015 at 11:03 PM, Saisai Shao sai.sai.s...@gmail.com
wrote:
Hi Tim,
I think maybe you can try this way:
create a Receiver per executor and specify the thread
at 9:45 PM, Saisai Shao sai.sai.s...@gmail.com
wrote:
Hi Tim,
I think this code will still introduce a shuffle even when you call
repartition on each input stream. Actually this style of implementation
will generate more jobs (one job per input stream) than unioning into one
stream as called