Hey Rachana, could you provide the full jstack outputs? Maybe it's the same as
https://issues.apache.org/jira/browse/SPARK-11104
Best Regards,
Shixiong Zhu
2016-01-04 12:56 GMT-08:00 Rachana Srivastava <
rachana.srivast...@markmonitor.com>:
> Hello All,
>
>
>
> I am running my
Just replace `localhost` with a host name that can be accessed by Yarn
containers.
Best Regards,
Shixiong Zhu
2015-12-22 0:11 GMT-08:00 prasadreddy <alle.re...@gmail.com>:
> How do we achieve this on yarn-cluster mode
>
> Please advice.
>
> Thanks
> Prasad
>
Looks like you need to add a "driver" option to your code, such as
sqlContext.read.format("jdbc").options(
Map("url" -> "jdbc:oracle:thin:@:1521:xxx",
"driver" -> "oracle.jdbc.driver.OracleDriver",
"dbtable&q
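For reference, a complete call of this shape might look like the sketch below; the connection string, host, and table name are illustrative placeholders, not values from the original message:

```scala
// Hedged sketch: reading an Oracle table over JDBC with an explicit driver class.
// All connection details here are placeholders.
val df = sqlContext.read.format("jdbc").options(Map(
  "url"     -> "jdbc:oracle:thin:@dbhost:1521:orcl",
  "driver"  -> "oracle.jdbc.driver.OracleDriver",
  "dbtable" -> "MY_TABLE"
)).load()
```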
Looks like you have a reference to some Akka class. Could you post your code?
Best Regards,
Shixiong Zhu
2015-12-17 23:43 GMT-08:00 Pankaj Narang <pankajnaran...@gmail.com>:
> I am encountering below error. Can somebody guide ?
>
> Something similar is one this link
> https://
You are right. "checkpointInterval" is only for data checkpointing.
"metadata checkpoint" is done for each batch. Feel free to send a PR to add
the missing doc.
Best Regards,
Shixiong Zhu
2015-12-18 8:26 GMT-08:00 Lan Jiang <ljia...@gmail.com>:
> Need some clarific
Best Regards,
Shixiong Zhu
2015-12-17 4:39 GMT-08:00 Bartłomiej Alberski <albers...@gmail.com>:
> I prepared simple example helping in reproducing problem:
>
> https://github.com/alberskib/spark-streaming-broadcast-issue
>
> I think that in that way it will be easier for you
What's the Scala version of your Spark? Is it 2.10?
Best Regards,
Shixiong Zhu
2015-12-17 10:10 GMT-08:00 Christos Mantas <cman...@cslab.ece.ntua.gr>:
> Hello,
>
> I am trying to set up a simple example with Spark Streaming (Python) and
> Kafka on a single machine deployment.
It doesn't guarantee that. E.g.,
scala> sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0), 2).filter(_ >
2.0).zipWithUniqueId().collect().foreach(println)
(3.0,1)
(4.0,3)
It only guarantees "unique".
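If consecutive indices are needed instead, `zipWithIndex` is an option; it runs an extra Spark job to count the partition sizes. A minimal sketch using the same data:

```scala
// zipWithIndex assigns consecutive 0-based indices (at the cost of an extra job),
// unlike zipWithUniqueId, which only guarantees uniqueness.
sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0), 2).filter(_ > 2.0)
  .zipWithIndex().collect().foreach(println)
// (3.0,0)
// (4.0,1)
```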
Best Regards,
Shixiong Zhu
2015-12-13 10:18 GMT-08:00 Sourav Mazumder <sourav.m
Could you send a PR to fix it? Thanks!
Best Regards,
Shixiong Zhu
2015-12-08 13:31 GMT-08:00 Richard Marscher <rmarsc...@localytics.com>:
> Alright I was able to work through the problem.
>
> So the owning thread was one from the executor task launch worker, which
> at least
Which version are you using? Could you post these thread names here?
Best Regards,
Shixiong Zhu
2015-12-07 14:30 GMT-08:00 Richard Marscher <rmarsc...@localytics.com>:
> Hi,
>
> I've been running benchmarks against Spark in local mode in a long running
> process. I'm seeing th
Hey Eyal, I just checked the couchbase spark connector jar. The target
version of some of the classes is Java 8 (52.0). You can create a ticket at
https://issues.couchbase.com/projects/SPARKC
Best Regards,
Shixiong Zhu
2015-11-26 9:03 GMT-08:00 Ted Yu <yuzhih...@gmail.com>:
> StoreMod
In addition, if you have more than two text files, you can just put them
into a Seq and use "reduce(_ ++ _)".
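A minimal sketch of that pattern, assuming `sc` is the SparkContext and the file names are placeholders:

```scala
// Combine several text files into a single RDD by unioning them pairwise.
val paths = Seq("a.txt", "b.txt", "c.txt") // placeholder paths
val combined = paths.map(sc.textFile(_)).reduce(_ ++ _)
```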
Best Regards,
Shixiong Zhu
2015-11-11 10:21 GMT-08:00 Jakob Odersky <joder...@gmail.com>:
> Hey Jeff,
> Do you mean reading from multiple text files? In that c
to find similar
issues in the PR build.
Best Regards,
Shixiong Zhu
2015-11-09 18:47 GMT-08:00 Ted Yu <yuzhih...@gmail.com>:
> Created https://github.com/apache/spark/pull/9585
>
> Cheers
>
> On Mon, Nov 9, 2015 at 6:39 PM, Josh Rosen <joshro...@databricks.com>
> wrote
You should use `SparkConf.set` rather than `SparkConf.setExecutorEnv`. For
driver configurations, you need to set them before starting your
application, e.g., by passing `--conf` arguments to `spark-submit`.
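A sketch of the programmatic side (the configuration key and value are illustrative):

```scala
// Set a Spark configuration with set(), not setExecutorEnv(), and do it
// before the SparkContext is created.
val conf = new SparkConf()
  .setAppName("example")
  .set("spark.executor.memory", "2g")
```

Driver-side settings can instead be passed on the command line, e.g. `spark-submit --conf spark.driver.memory=2g ...`.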
Best Regards,
Shixiong Zhu
2015-11-04 15:55 GMT-08:00 William Li
"trackStateByKey" is about to be added in 1.6 to resolve the performance
issue of "updateStateByKey". You can take a look at
https://issues.apache.org/jira/browse/SPARK-2629 and
https://github.com/apache/spark/pull/9256
Thanks for reporting it Terry. I submitted a PR to fix it:
https://github.com/apache/spark/pull/9132
Best Regards,
Shixiong Zhu
2015-10-15 2:39 GMT+08:00 Reynold Xin <r...@databricks.com>:
> +dev list
>
> On Wed, Oct 14, 2015 at 1:07 AM, Terry Hoo <hujie.ea...@gmail.co
Scala 2.10 REPL javap doesn't support Java7 or Java8. It was fixed in Scala
2.11. See https://issues.scala-lang.org/browse/SI-4936
Best Regards,
Shixiong Zhu
2015-10-15 4:19 GMT+08:00 Robert Dodier <robert.dod...@gmail.com>:
> Hi,
>
> I am working with Spark 1.5.1 (o
Which mode are you using? For standalone, it's
org.apache.spark.deploy.worker.Worker. For Yarn and Mesos, Spark just
submits its request to them and they will schedule processes for Spark.
Best Regards,
Shixiong Zhu
2015-10-12 20:12 GMT+08:00 Muhammad Haseeb Javed <11besemja...@seecs.edu
In addition, you cannot turn off JobListener and SQLListener now...
Best Regards,
Shixiong Zhu
2015-10-13 11:59 GMT+08:00 Shixiong Zhu <zsxw...@gmail.com>:
> Is your query very complicated? Could you provide the output of `explain`
> your query that consumes an excessive amou
Could you show how you set the configurations? You need to set them
before creating SparkContext and SQLContext.
Moreover, the history server doesn't support the SQL UI, so
"spark.eventLog.enabled=true" doesn't work for it now.
Best Regards,
Shixiong Zhu
2015-10-13 2:01
You don't need to care about this sleep. It runs in a separate thread and
usually won't affect the performance of your application.
Best Regards,
Shixiong Zhu
2015-10-09 6:03 GMT+08:00 yael aharon <yael.aharo...@gmail.com>:
> Hello,
> I am working on improving the performance
Is your query very complicated? Could you provide the output of `explain`
your query that consumes an excessive amount of memory? If this is a small
query, there may be a bug that leaks memory in SQLListener.
Best Regards,
Shixiong Zhu
2015-10-13 11:44 GMT+08:00 Nicholas Pritchard
Each ReceiverInputDStream will create one Receiver. If you only use
one ReceiverInputDStream, there will be only one Receiver in the cluster.
But if you create multiple ReceiverInputDStreams, there will be multiple
Receivers.
Best Regards,
Shixiong Zhu
2015-10-12 23:47 GMT+08:00 Something
Could you print the content of RDD to check if there are multiple values
for a key in a batch?
Best Regards,
Shixiong Zhu
2015-10-12 18:25 GMT+08:00 Sathiskumar <sathish.palaniap...@gmail.com>:
> I'm running a Spark Streaming application for every 10 seconds, its job is
> to
> co
Do you have the full stack trace? Could you check if it's same as
https://issues.apache.org/jira/browse/SPARK-10422
Best Regards,
Shixiong Zhu
2015-10-01 17:05 GMT+08:00 Eyad Sibai <eyad.alsi...@gmail.com>:
> Hi
>
> I am trying to call .persist() on a dataframe but once I e
Right, you can use SparkContext and SQLContext in multiple threads. They
are thread safe.
Best Regards,
Shixiong Zhu
2015-10-01 4:57 GMT+08:00 <saif.a.ell...@wellsfargo.com>:
> Hi all,
>
> I have a process where I do some calculations on each one of the columns
> of a datafram
Do you have the log? It looks like some exception in your code caused the
SparkContext to stop.
Best Regards,
Shixiong Zhu
2015-09-30 17:30 GMT+08:00 tranan <tra...@gmail.com>:
> Hello All,
>
> I have several Spark Streaming applications running on Standalone mode in
> Spark 1.5.
Do you have the log file? It may be because of wrong settings.
Best Regards,
Shixiong Zhu
2015-10-01 7:32 GMT+08:00 markluk <m...@juicero.com>:
> I setup a new Spark cluster. My worker node is dying with the following
> exception.
>
> Caused by: java.util.concurrent.Timeout
I mean JavaSparkContext.setLogLevel. You can use it like this:
JavaStreamingContext jssc = new JavaStreamingContext(sparkConf,
Durations.seconds(2));
jssc.sparkContext().setLogLevel(...);
Best Regards,
Shixiong Zhu
2015-09-29 22:07 GMT+08:00 Ashish Soni <asoni.le...@gmail.com>:
> I
You can use JavaSparkContext.setLogLevel to set the log level in your code.
Best Regards,
Shixiong Zhu
2015-09-28 22:55 GMT+08:00 Ashish Soni <asoni.le...@gmail.com>:
> I am not running it using spark submit , i am running locally inside
> Eclipse IDE , how i set this usi
Which version are you using? Could you take a look at the new Streaming UI
in 1.4.0?
Best Regards,
Shixiong Zhu
2015-09-29 7:52 GMT+08:00 Siva <sbhavan...@gmail.com>:
> Hi,
>
> Could someone recommend the monitoring tools for spark streaming?
>
> By extending Streamin
enough space.
Best Regards,
Shixiong Zhu
2015-09-29 1:04 GMT+08:00 swetha <swethakasire...@gmail.com>:
>
> Hi,
>
> I see a lot of data getting filled locally as shown below from my streaming
> job. I have my checkpoint set to hdfs. But, I still see the following data
> fi
"count" Spark jobs will run in parallel.
Moreover, "spark.streaming.concurrentJobs" is an internal configuration and
it may be changed in the future.
Best Regards,
Shixiong Zhu
2015-09-26 3:34 GMT+08:00 Atul Kulkarni <atulskulka...@gmail.com>:
> Can someone please he
You can change "spark.sql.broadcastTimeout" to increase the timeout. The
default value is 300 seconds.
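For example (the value here is illustrative):

```scala
// Raise the broadcast join timeout from the default 300 seconds.
// This must be set before the join runs.
sqlContext.setConf("spark.sql.broadcastTimeout", "1200")
```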
Best Regards,
Shixiong Zhu
2015-09-24 15:16 GMT+08:00 Eyad Sibai <eyad.alsi...@gmail.com>:
> I am trying to join two tables using dataframes using python 3.4 and I am
>
Looks like you have an incompatible hbase-default.xml in some place. You
can use the following code to find the location of "hbase-default.xml"
println(Thread.currentThread().getContextClassLoader().getResource("hbase-default.xml"))
Best Regards,
Shixiong Zhu
2015-09-21
. RDD.compute: this will run in the executor and the location is not
guaranteed. E.g.,
DStream.foreachRDD(rdd => rdd.foreach { v =>
println(v)
})
"println(v)" is called in the executor.
Best Regards,
Shixiong Zhu
2015-09-17 3:47 GMT+08:00 Renyi Xiong <renyixio...@gmail.com>:
Looks like you returned a "Some(null)" in "compute". If you don't want to
create an RDD, it should return None. If you want to return an empty RDD, it
should return "Some(sc.emptyRDD)".
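A sketch of that pattern inside a custom DStream's `compute` (the `hasDataFor` check is hypothetical):

```scala
// Return None to skip the batch entirely; return an empty RDD (never
// Some(null)) when the batch should exist but carries no data.
override def compute(validTime: Time): Option[RDD[String]] = {
  if (!hasDataFor(validTime)) None
  else Some(ssc.sparkContext.emptyRDD[String])
}
```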
Best Regards,
Shixiong Zhu
2015-09-15 2:51 GMT+08:00 Juan Rodríguez Hortalá <
The folder is in "/tmp" by default. Could you use "df -h" to check the free
space of /tmp?
Best Regards,
Shixiong Zhu
2015-09-05 9:50 GMT+08:00 shenyan zhen <shenya...@gmail.com>:
> Has anyone seen this error? Not sure which dir the program was trying to
> write
(i)
i1.readObject()
Could you provide the "explain" output? It would be helpful to find the
circular references.
Best Regards,
Shixiong Zhu
2015-09-05 0:26 GMT+08:00 Jeff Jones <jjo...@adaptivebiotech.com>:
> We are using Scala 2.11 for a driver program that is running
That's two jobs. `SparkPlan.executeTake` will call `runJob` twice in this
case.
Best Regards,
Shixiong Zhu
2015-08-25 14:01 GMT+08:00 Cheng, Hao hao.ch...@intel.com:
Oh, sorry, I missed your reply!
I know the minimum number of tasks will be 2 for scanning, but Jeff is
talking about 2 jobs
Hao,
I can reproduce it using the master branch. I'm curious why you cannot
reproduce it. Did you check if the input HadoopRDD did have two partitions?
My test code is
val df = sqlContext.read.json("examples/src/main/resources/people.json")
df.show()
Best Regards,
Shixiong Zhu
2015-08-25 13:01
/org/apache/spark/sql/execution/SparkPlan.scala#L185
Best Regards,
Shixiong Zhu
2015-08-25 8:11 GMT+08:00 Jeff Zhang zjf...@gmail.com:
Hi Cheng,
I know that sqlContext.read will trigger one spark job to infer the
schema. What I mean is DataFrame#show cost 2 spark jobs. So overall it
would cost
file. Could you convert your
data to String using map and use saveAsTextFile or other save methods?
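A minimal sketch, with a placeholder output path:

```scala
// Convert each record to its string form, then save as plain text.
myRdd.map(_.toString).saveAsTextFile("hdfs:///tmp/output") // placeholder path
```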
Best Regards,
Shixiong Zhu
2015-08-14 11:02 GMT+08:00 kale 805654...@qq.com:
Oh, I see. That's the total time of executing a query in Spark. Then the
difference is reasonable, considering Spark has much more work to do, e.g.,
launching tasks in executors.
Best Regards,
Shixiong Zhu
2015-07-26 16:16 GMT+08:00 Louis Hust louis.h...@gmail.com:
Look at the given url
Could you clarify how you measure the Spark time cost? Is it the total time
of running the query? If so, it's possible because the overhead of
Spark dominates for small queries.
Best Regards,
Shixiong Zhu
2015-07-26 15:56 GMT+08:00 Jerrick Hoang jerrickho...@gmail.com:
how big is the dataset
MemoryStore.ensureFreeSpace for details.
Best Regards,
Shixiong Zhu
2015-07-09 19:17 GMT+08:00 Dibyendu Bhattacharya
dibyendu.bhattach...@gmail.com:
Hi ,
Just would like to clarify few doubts I have how BlockManager behaves .
This is mostly in regards to Spark Streaming Context .
There are two
val r1 = context.wholeTextFiles(...)
val r2 = r1.flatMap(s => ...)
r2.persist(StorageLevel.MEMORY_ONLY)
val r3 = r2.filter(...)...
r3.saveAsTextFile(...)
val r4 = r2.map(...)...
r4.saveAsTextFile(...)
See
http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence
Best Regards,
Shixiong Zhu
DStream must be Serializable because of metadata checkpointing. But you can
use KryoSerializer for data checkpointing: data checkpointing uses
RDD.checkpoint, and its serializer can be set via spark.serializer.
Best Regards,
Shixiong Zhu
2015-07-08 3:43 GMT+08:00 Chen Song chen.song...@gmail.com:
In Spark
Before running your script, could you confirm that
/data/software/spark-1.3.1-bin-2.4.0/applications/pss.am.core-1.0-SNAPSHOT-shaded.jar
exists? You might forget to build this jar.
Best Regards,
Shixiong Zhu
2015-07-06 18:14 GMT+08:00 bit1...@163.com bit1...@163.com:
Hi,
I have following
You can set spark.ui.enabled to false to disable the Web UI.
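For example:

```scala
// Disable the web UI before creating the SparkContext.
val conf = new SparkConf().set("spark.ui.enabled", "false")
val sc = new SparkContext(conf)
```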
Best Regards,
Shixiong Zhu
2015-07-06 17:05 GMT+08:00 luohui20...@sina.com:
Hello there,
I heard that there is some way to shutdown Spark WEB UI, is there a
configuration to support this?
Thank you
the communication
between driver and executors? Because this is an ongoing work, there is no
blog now. But you can find more details in this umbrella JIRA:
https://issues.apache.org/jira/browse/SPARK-5293
Best Regards,
Shixiong Zhu
2015-06-10 20:33 GMT+08:00 huangzheng 1106944...@qq.com:
Hi all
You should not call `jssc.stop(true);` in a StreamingListener. It will
cause a dead-lock: `jssc.stop` won't return until `listenerBus` exits. But
since `jssc.stop` blocks `StreamingListener`, `listenerBus` cannot exit.
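One way around this is to hand the stop off to a separate thread so the listener callback can return; a hedged sketch:

```scala
// Inside the StreamingListener callback: don't call jssc.stop() directly.
// Trigger it from another thread and let the callback return immediately.
new Thread("streaming-stopper") {
  override def run(): Unit = jssc.stop(true)
}.start()
```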
Best Regards,
Shixiong Zhu
2015-06-04 0:39 GMT+08:00 dgoldenberg dgoldenberg
Cleaner
java.lang.NoClassDefFoundError: 0
at
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:149)
Best Regards,
Shixiong Zhu
2015-06-03 0:08 GMT+08:00 Ryan Williams ryan.blake.willi...@gmail.com:
I think
How about other jobs? Is it an executor log, or a driver log? Could you
post other logs near this error, please? Thank you.
Best Regards,
Shixiong Zhu
2015-06-02 17:11 GMT+08:00 Anders Arpteg arp...@spotify.com:
Just compiled Spark 1.4.0-rc3 for Yarn 2.2 and tried running a job that
worked
`ssc.stop` as the shutdown hook. But stopGracefully
should be false.
Best Regards,
Shixiong Zhu
2015-05-20 21:59 GMT-07:00 Dibyendu Bhattacharya
dibyendu.bhattach...@gmail.com:
Thanks Tathagata for making this change..
Dibyendu
On Thu, May 21, 2015 at 8:24 AM, Tathagata Das t
Could you provide the full driver log? Looks like a bug. Thank you!
Best Regards,
Shixiong Zhu
2015-05-13 14:02 GMT-07:00 Giovanni Paolo Gibilisco gibb...@gmail.com:
Hi,
I'm trying to run an application that uses a Hive context to perform some
queries over JSON files.
The code
The history server may need several hours to start if you have a lot of
event logs. Is it stuck, or still replaying logs?
Best Regards,
Shixiong Zhu
2015-05-07 11:03 GMT-07:00 Marcelo Vanzin van...@cloudera.com:
Can you get a jstack for the process? Maybe it's stuck somewhere.
On Thu, May 7
SPARK-5522 is really cool. Didn't notice it.
Best Regards,
Shixiong Zhu
2015-05-07 11:36 GMT-07:00 Marcelo Vanzin van...@cloudera.com:
That shouldn't be true in 1.3 (see SPARK-5522).
On Thu, May 7, 2015 at 11:33 AM, Shixiong Zhu zsxw...@gmail.com wrote:
The history server may need several
You are using Scala 2.11 with 2.10 libraries. You can change
"org.apache.spark" % "spark-streaming_2.10" % "1.3.1"
to
"org.apache.spark" %% "spark-streaming" % "1.3.1"
and sbt will use the corresponding libraries according to your Scala
version.
Best Regards,
Shixiong Zhu
2015-05-06 16:21 GMT-07:00
://spark.apache.org/docs/latest/running-on-yarn.html
Best Regards,
Shixiong Zhu
2015-04-30 1:00 GMT-07:00 xiaohe lan zombiexco...@gmail.com:
Hi Madhvi,
If I only install spark on one node, and use spark-submit to run an
application, which are the Worker nodes? Any where are the executors ?
Thanks,
Xiaohe
spark.history.fs.logDirectory is for the history server. For Spark
applications, they should use spark.eventLog.dir. Since you commented out
spark.eventLog.dir, it defaults to /tmp/spark-events, and this folder does
not exist.
Best Regards,
Shixiong Zhu
2015-04-29 23:22 GMT-07:00 James King jakwebin
The configuration key should be spark.akka.askTimeout for this timeout.
The time unit is seconds.
Best Regards,
Shixiong(Ryan) Zhu
2015-04-26 15:15 GMT-07:00 Deepak Gopalakrishnan dgk...@gmail.com:
Hello,
Just to add a bit more context :
I have done that in the code, but I cannot see it
it from Eclipse on local[*].
On Sun, Apr 19, 2015 at 7:57 PM, Praveen Balaji
secondorderpolynom...@gmail.com wrote:
Thanks Shixiong. I'll try this.
On Sun, Apr 19, 2015, 7:36 PM Shixiong Zhu zsxw...@gmail.com wrote:
The problem is the code you use to test:
sc.parallelize(List(1, 2, 3
The problem is the code you use to test:
sc.parallelize(List(1, 2, 3)).map(throw new SparkException("test")).collect();
is like the following example:
def foo: Int => Nothing = {
  throw new SparkException("test")
}
sc.parallelize(List(1, 2, 3)).map(foo).collect();
So actually the Spark jobs do not
I just checked the code that creates OutputCommitCoordinator. Could you
reproduce this issue? If so, could you provide details about how to
reproduce it?
Best Regards,
Shixiong(Ryan) Zhu
2015-04-16 13:27 GMT+08:00 Canoe canoe...@gmail.com:
13119 Exception in thread main
: 142905487 ms strings gets printed on console.
No output is getting printed.
And timeinterval between two strings of form ( time:ms)is very less
than Streaming Duration set in program.
On Wed, Apr 15, 2015 at 5:11 AM, Shixiong Zhu zsxw...@gmail.com wrote:
Could you see something like
Could you see something like this in the console?
---
Time: 142905487 ms
---
Best Regards,
Shixiong(Ryan) Zhu
2015-04-15 2:11 GMT+08:00 Shushant Arora shushantaror...@gmail.com:
Hi
I am running a spark
Thanks for the log. It's really helpful. I created a JIRA to explain why it
happens: https://issues.apache.org/jira/browse/SPARK-6640
However, does this error always happen in your environment?
Best Regards,
Shixiong Zhu
2015-03-31 22:36 GMT+08:00 sparkdi shopaddr1...@dubna.us
Could you paste the whole stack trace here?
Best Regards,
Shixiong Zhu
2015-03-31 2:26 GMT+08:00 sparkdi shopaddr1...@dubna.us:
I have the same problem, i.e. exception with the same call stack when I
start
either pyspark or spark-shell. I use spark-1.3.0-bin-hadoop2.4 on ubuntu
14.10.
bin
LGTM. Could you open a JIRA and send a PR? Thanks.
Best Regards,
Shixiong Zhu
2015-03-28 7:14 GMT+08:00 Manoj Samel manojsamelt...@gmail.com:
I looked @ the 1.3.0 code and figured where this can be added
In org.apache.spark.deploy.yarn ApplicationMaster.scala:282 is
actorSystem
There is no configuration for it now.
Best Regards,
Shixiong Zhu
2015-03-26 7:13 GMT+08:00 Manoj Samel manojsamelt...@gmail.com:
There may be firewall rules limiting the ports between host running spark
and the hadoop cluster. In that case, not all ports are allowed.
Can it be a range
It's a random port to avoid port conflicts, since multiple AMs can run in
the same machine. Why do you need a fixed port?
Best Regards,
Shixiong Zhu
2015-03-26 6:49 GMT+08:00 Manoj Samel manojsamelt...@gmail.com:
Spark 1.3, Hadoop 2.5, Kerbeors
When running spark-shell in yarn client mode
cases are the second one, we set
spark.scheduler.executorTaskBlacklistTime to 30000 to solve such "No
space left on device" errors. So if a task runs unsuccessfully in some
executor, it won't be scheduled to the same executor within 30 seconds.
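A sketch of that setting (the value is in milliseconds):

```scala
// Keep a failed task off the same executor for 30 seconds.
val conf = new SparkConf()
  .set("spark.scheduler.executorTaskBlacklistTime", "30000")
```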
Best Regards,
Shixiong Zhu
2015-03-16 17:40 GMT+08:00 Jianshi
Best Regards,
Shixiong Zhu
2015-03-13 9:37 GMT+08:00 Soila Pertet Kavulya skavu...@gmail.com:
Does Spark support skewed joins similar to Pig which distributes large
keys over multiple partitions? I tried using the RangePartitioner but
I am still experiencing failures because some keys are too
RDD is not thread-safe. You should not use it in multiple threads.
Best Regards,
Shixiong Zhu
2015-02-27 23:14 GMT+08:00 rok rokros...@gmail.com:
I'm seeing this java.util.NoSuchElementException: key not found: exception
pop up sometimes when I run operations on an RDD from multiple threads
Rdd.foreach runs in the executors. You should use `collect` to fetch data
to the driver. E.g.,
myRdd.collect().foreach { node =>
  mp(node) = 1
}
Best Regards,
Shixiong Zhu
2015-02-25 4:00 GMT+08:00 Vijayasarathy Kannan kvi...@vt.edu:
Thanks, but it still doesn't seem
The unit of spark.akka.frameSize is MB. The max value is 2047.
Best Regards,
Shixiong Zhu
2015-02-05 1:16 GMT+08:00 sahanbull sa...@skimlinks.com:
I am trying to run a spark application with
-Dspark.executor.memory=30g -Dspark.kryoserializer.buffer.max.mb=2000
-Dspark.akka.frameSize=1
Could you clarify why you need a 10G akka frame size?
Best Regards,
Shixiong Zhu
2015-02-05 9:20 GMT+08:00 Shixiong Zhu zsxw...@gmail.com:
The unit of spark.akka.frameSize is MB. The max value is 2047.
Best Regards,
Shixiong Zhu
2015-02-05 1:16 GMT+08:00 sahanbull sa...@skimlinks.com:
I
It's a bug that has been fixed in https://github.com/apache/spark/pull/4258
but has not been merged yet.
Best Regards,
Shixiong Zhu
2015-02-02 10:08 GMT+08:00 Arun Lists lists.a...@gmail.com:
Here is the relevant snippet of code in my main program
It's because you submitted the job from Windows to a Hadoop cluster running
on Linux. Spark doesn't support that yet. See
https://issues.apache.org/jira/browse/SPARK-1825
Best Regards,
Shixiong Zhu
2015-01-28 17:35 GMT+08:00 Marco marco@gmail.com:
I've created a spark app, which runs fine
`--jars` accepts a comma-separated list of jars. See the usage about
`--jars`
--jars JARS Comma-separated list of local jars to include on the driver and
executor classpaths.
Best Regards,
Shixiong Zhu
2015-01-08 19:23 GMT+08:00 Guillermo Ortiz konstt2...@gmail.com:
I'm trying to execute
. For
me, I will add -Dhbase.profile=hadoop2 to the build instruction so that
the examples project will use a hadoop2-compatible hbase.
Best Regards,
Shixiong Zhu
2015-01-08 0:30 GMT+08:00 Antony Mayi antonym...@yahoo.com.invalid:
thanks, I found the issue, I was including
/usr/lib/spark/lib
call `map(_.toList)` to convert `CompactBuffer` to `List`
Best Regards,
Shixiong Zhu
2015-01-04 12:08 GMT+08:00 Sanjay Subramanian
sanjaysubraman...@yahoo.com.invalid:
hi
Take a look at the code here I wrote
https://raw.githubusercontent.com/sanjaysubramanian/msfx_scala/master/src/main
The Iterable from cogroup is CompactBuffer, which is already materialized.
It's not a lazy Iterable. So for now Spark cannot handle skewed data where
some key has too many values to fit into memory.
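The `map(_.toList)` conversion mentioned earlier can be sketched as follows (the RDD names are placeholders); note it makes the types explicit but does not change the memory behavior, since the buffers are already materialized:

```scala
// cogroup yields fully materialized CompactBuffers; convert them to Lists
// to get a plain collection type.
val grouped = rdd1.cogroup(rdd2).mapValues {
  case (lefts, rights) => (lefts.toList, rights.toList)
}
```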
I encountered the following issue when enabling dynamicAllocation. You may
want to take a look at it.
https://issues.apache.org/jira/browse/SPARK-4951
Best Regards,
Shixiong Zhu
2014-12-28 2:07 GMT+08:00 Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com:
Hi Anders,
I faced the same issue as you
Congrats!
A little question about this release: which commit is this release based
on? v1.2.0 and v1.2.0-rc2 point to different commits in
https://github.com/apache/spark/releases
Best Regards,
Shixiong Zhu
2014-12-19 16:52 GMT+08:00 Patrick Wendell pwend...@gmail.com:
I'm happy
@Rui do you mean the spark-core jar in the Maven central repo
is incompatible with the same version of the official pre-built Spark
binary? That's really weird. I thought they should have been built from the
same code.
Best Regards,
Shixiong Zhu
2014-12-18 17:22 GMT+08:00 Sean Owen so...@cloudera.com
Could you post the stack trace?
Best Regards,
Shixiong Zhu
2014-12-16 23:21 GMT+08:00 richiesgr richie...@gmail.com:
Hi
This time I need expert.
On 1.1.1 and only in cluster (standalone or EC2)
when I use this code :
countersPublishers.foreachRDD(rdd => {
rdd.foreachPartition
Just pointing out a bug in your code. You should not use `mapPartitions`
like that. For details, I recommend the "setup() and cleanup()" section in
Sean Owen's post:
http://blog.cloudera.com/blog/2014/09/how-to-translate-from-mapreduce-to-apache-spark/
Best Regards,
Shixiong Zhu
2014-12-14 16:35 GMT+08
,
Shixiong Zhu
2014-12-10 20:13 GMT+08:00 Johannes Simon johannes.si...@mail.de:
Hi!
I have been using spark a lot recently and it's been running really well
and fast, but now when I increase the data size, it's starting to run into
problems:
I have an RDD in the form of (String, Iterable[String
Good catch. `Join` should use `Iterator`, too. I opened a JIRA here:
https://issues.apache.org/jira/browse/SPARK-4824
Best Regards,
Shixiong Zhu
2014-12-10 21:35 GMT+08:00 Johannes Simon johannes.si...@mail.de:
Hi!
Using an iterator solved the problem! I've been chewing on this for days,
so
What's the status of this application in the yarn web UI?
Best Regards,
Shixiong Zhu
2014-12-05 17:22 GMT+08:00 LinQili lin_q...@outlook.com:
I tried anather test code:
def main(args: Array[String]) {
if (args.length != 1) {
Util.printLog(ERROR, Args error - arg1: BASE_DIR
not send it back to the
client.
spark-submit will return 1 when Yarn reports the ApplicationMaster failed.
Best Regards,
Shixiong Zhu
2014-12-06 1:59 GMT+08:00 LinQili lin_q...@outlook.com:
You mean the localhost:4040 or the application master web ui?
Sent from my iPhone
On Dec 5, 2014, at 17:26
Don't set `spark.akka.frameSize` to 1. The max value of
`spark.akka.frameSize` is 2047. The unit is MB.
Best Regards,
Shixiong Zhu
2014-12-01 0:51 GMT+08:00 Yanbo yanboha...@gmail.com:
Try to use spark-shell --conf spark.akka.frameSize=1
在 2014年12月1日,上午12:25,Brian Dolan buddha_
4096MB is greater than Int.MaxValue bytes and it will overflow in Spark.
Please set it to less than 4096.
Best Regards,
Shixiong Zhu
2014-12-01 13:14 GMT+08:00 Ke Wang jkx...@gmail.com:
I meet the same problem, did you solve it ?
--
View this message in context:
http://apache-spark-user-list
Sorry, it should be less than 2048; 2047 is the greatest value.
Best Regards,
Shixiong Zhu
2014-12-01 13:20 GMT+08:00 Shixiong Zhu zsxw...@gmail.com:
4096MB is greater than Int.MaxValue bytes and it will overflow in Spark.
Please set it to less than 4096.
Best Regards,
Shixiong Zhu
2014-12
Created a JIRA to track it: https://issues.apache.org/jira/browse/SPARK-4664
Best Regards,
Shixiong Zhu
2014-12-01 13:22 GMT+08:00 Shixiong Zhu zsxw...@gmail.com:
Sorry, it should be less than 2048; 2047 is the greatest value.
Best Regards,
Shixiong Zhu
2014-12-01 13:20 GMT+08:00
: scala.math.BigInt = 100
Best Regards,
Shixiong Zhu
2014-11-25 10:31 GMT+08:00 Peter Thai thai.pe...@gmail.com:
Hello!
Does anyone know why I may be receiving negative final accumulator values?
Thanks!
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com
to create some
big enough tasks. Of course, you can reduce `spark.locality.wait`, but it
may not be efficient because it still creates many tiny tasks.
Best Regards,
Shixiong Zhu
2014-11-22 17:17 GMT+08:00 Akhil Das ak...@sigmoidanalytics.com:
What is your cluster setup? are you running a worker
Could you provide the code of hbaseQuery? Maybe it doesn't support
parallel execution.
Best Regards,
Shixiong Zhu
2014-11-12 14:32 GMT+08:00 qiaou qiaou8...@gmail.com:
Hi:
I got a problem with using the union method of RDD
things like this
I get a function like
def hbaseQuery