/WrappedArraySerializer.scala
Cheers
On Sat, Feb 14, 2015 at 6:36 PM, Tao Xiao xiaotao.cs@gmail.com
wrote:
I'm using Spark 1.1.0 and find that *ImmutableBytesWritable* can be
serialized by Kryo but *Array[ImmutableBytesWritable]* can't be serialized
even when I registered both of them in Kryo.
The code is as follows:
val conf = new SparkConf()
  .setAppName("Hello Spark")
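For reference, a minimal sketch of how both classes could be registered through a custom KryoRegistrator on Spark 1.1; the registrator name and the SparkConf wiring below are illustrative assumptions, not the original poster's code:

import com.esotericsoftware.kryo.Kryo
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical registrator: register both the class and its array form
class MyKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[ImmutableBytesWritable])
    kryo.register(classOf[Array[ImmutableBytesWritable]])
  }
}

// Wire the registrator into the SparkConf (use its fully-qualified name if it lives in a package)
val conf = new SparkConf()
  .setAppName("Hello Spark")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyKryoRegistrator")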
I'm using CDH 5.1.0 and Spark 1.0.0, and I'd like to write out data
as snappy-compressed files, but encountered a problem.
My code is as follows:
val InputTextFilePath = "hdfs://ec2.hadoop.com:8020/xt/text/new.txt"
val OutputTextFilePath = "hdfs://ec2.hadoop.com:8020/xt/compressedText/"
val
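One way to get Snappy-compressed output, assuming the Hadoop native Snappy libraries are available on the cluster, is to pass a codec class to saveAsTextFile. A rough sketch reusing the path names above:

import org.apache.hadoop.io.compress.SnappyCodec

// Read the input, then write it back out Snappy-compressed
val lines = sc.textFile(InputTextFilePath)
lines.saveAsTextFile(OutputTextFilePath, classOf[SnappyCodec])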
Hi all,
I tested the *partitionBy* feature in a wordcount application, and I'm
puzzled by a phenomenon. In this application, I created an RDD from some
text files in HDFS (about 100GB in size), each of which has lines composed
of words separated by the character #. I wanted to count the occurrence for
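For context, a rough sketch of the kind of wordcount-with-partitionBy job described above; the paths, partition count and counting logic are assumptions, not the original code:

import org.apache.spark.HashPartitioner

// Lines look like "word1#word2#word3"; split on '#' and emit (word, 1) pairs
val words = sc.textFile("hdfs:///path/to/input")
  .flatMap(line => line.split("#"))
  .map(word => (word, 1))

// Pre-partition the pair RDD, then sum the counts per word
val counts = words
  .partitionBy(new HashPartitioner(100))
  .reduceByKey(_ + _)

counts.saveAsTextFile("hdfs:///path/to/output")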
a kill link. You can try using that.
Best Regards,
Sonal
Founder, Nube Technologies http://www.nubetech.co
http://in.linkedin.com/in/sonalgoyal
On Tue, Nov 11, 2014 at 7:28 PM, Tao Xiao xiaotao.cs@gmail.com
wrote:
I'm using Spark 1.0.0 and I'd like to kill a job running in cluster mode,
which means the driver is not running on local node.
So how can I kill such a job? Is there a command like hadoop job -kill
job-id, which kills a running MapReduce job?
Thanks
could just sleep a few seconds before running the
job. Or there are some related patches providing other ways to sync
executor status before running applications, but I haven't tracked the
related status for a while.
Raymond
On Oct 20, 2014, at 11:22 AM, Tao Xiao xiaotao.cs@gmail.com wrote:
Hi all,
I have a Spark-0.9 cluster, which has 16 nodes.
I wrote a Spark application to read data from an HBase table, which has 86
regions spread across 20 RegionServers.
I submitted the Spark app in Spark standalone mode and found that there
were 86 executors running on just 3 nodes and it
...@sigmoidanalytics.com:
Adding your application jar to the sparkContext will resolve this issue.
Eg:
sparkContext.addJar("./target/scala-2.10/myTestApp_2.10-1.0.jar")
Thanks
Best Regards
On Mon, Oct 13, 2014 at 8:42 AM, Tao Xiao xiaotao.cs@gmail.com
wrote:
In the beginning I tried to read HBase
Hi all,
I'm using CDH 5.0.1 (Spark 0.9) and submitting a job in Spark Standalone
Cluster mode.
The job is quite simple as follows:
object HBaseApp {
  def main(args: Array[String]) {
    testHBase("student", "/test/xt/saveRDD")
  }

  def testHBase(tableName: String, outFile: String) {
did not change the object name and file name.
2014-10-13 0:00 GMT+08:00 Ted Yu yuzhih...@gmail.com:
Your app is named scala.HBaseApp
Does it read / write to HBase ?
Just curious.
On Sun, Oct 12, 2014 at 8:00 AM, Tao Xiao xiaotao.cs@gmail.com
wrote:
Hi all,
I'm using CDH 5.0.1
to submit my job can be seen in my second post. Please refer
to that.
2014-10-08 13:44 GMT+08:00 Sean Owen so...@cloudera.com:
How did you run your program? I don't see from your earlier post that
you ever asked for more executors.
On Wed, Oct 8, 2014 at 4:29 AM, Tao Xiao xiaotao.cs
just read splits from HBase, so it would not make sense to determine it by
something that the first line of the program does.
On Oct 8, 2014 8:00 AM, Tao Xiao xiaotao.cs@gmail.com wrote:
Hi Sean,
Do I need to specify the number of executors when submitting the job?
I suppose the number
, Oct 1, 2014 at 8:17 AM, Tao Xiao xiaotao.cs@gmail.com
wrote:
I can submit a MapReduce job reading that table, although its
processing rate is also a little slower than I expected, but not as slow
as Spark.
?
This would show whether the slowdown is in HBase code or somewhere else.
Cheers
On Mon, Sep 29, 2014 at 11:40 PM, Tao Xiao xiaotao.cs@gmail.com
wrote:
I checked HBase UI. Well, this table is not completely evenly spread
across the nodes, but I think to some extent it can be seen
cannot achieve a high level of parallelism unless you have at least 5-10
regions per RS. What does that mean? You probably have too few regions. You
can verify that in the HBase Web UI.
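As a quick check from the Spark side, the number of partitions of an RDD created with newAPIHadoopRDD over TableInputFormat matches the number of regions scanned, so a low partition count confirms the too-few-regions diagnosis. A tiny sketch, with hbaseRdd standing in for such an RDD:

// Each HBase region becomes one Spark partition, i.e. one task
println("partitions = " + hbaseRdd.partitions.length)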
-Vladimir Rodionov
On Mon, Sep 29, 2014 at 7:21 PM, Tao Xiao xiaotao.cs@gmail.com
wrote:
I submitted a job in Yarn-Client mode, which simply reads from an HBase
table containing tens of millions of records and then does a *count* action.
The job runs for a much longer time than I expected, so I wonder whether that
was because there was too much data to read. Actually, there are 20 nodes in
val sparkConf = new SparkConf()
  .setAppName("Reading HBase")
val sc = new SparkContext(sparkConf)
val rdd = sc.newAPIHadoopRDD(hbaseConf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
println(rdd.count)
}
}
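The hbaseConf used above is not shown in the snippet; a typical setup might look like the following sketch (the table name is a placeholder):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

// Build an HBase configuration and point TableInputFormat at the table to scan
val hbaseConf = HBaseConfiguration.create()
hbaseConf.set(TableInputFormat.INPUT_TABLE, "some_table")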
2014-09-30 10:21 GMT+08:00 Tao Xiao xiaotao.cs
Hi,
I have the following RDD:
val conf = new SparkConf()
  .setAppName("Testing Sorting")
val sc = new SparkContext(conf)
val L = List(
  (new Student("XiaoTao", 80, 29), "I'm Xiaotao"),
  (new Student("CCC", 100, 24), "I'm CCC"),
  (new Student("Jack", 90, 25), "I'm Jack"),
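The Student class itself is not shown in the thread; a minimal sketch of a version that sortByKey can work with, assuming the constructor arguments above are name, score and age, is to make it Ordered:

// Hypothetical Student: sortByKey needs an ordering for the key type,
// which extending Ordered provides on both old and new Spark releases.
class Student(val name: String, val score: Int, val age: Int)
    extends Ordered[Student] with Serializable {

  // Sort by score descending, then by age ascending (an illustrative choice)
  override def compare(that: Student): Int =
    if (this.score != that.score) that.score.compareTo(this.score)
    else this.age.compareTo(that.age)

  override def toString = s"Student($name, $score, $age)"
}

// With the ordering in place, a pair RDD keyed by Student can be sorted:
val pairs = sc.parallelize(Seq(
  (new Student("XiaoTao", 80, 29), "I'm Xiaotao"),
  (new Student("CCC", 100, 24), "I'm CCC"),
  (new Student("Jack", 90, 25), "I'm Jack")))
pairs.sortByKey().collect().foreach(println)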
@ Tokyo
On Mon, Sep 15, 2014 at 3:06 PM, Tao Xiao xiaotao.cs@gmail.com
wrote:
I followed an example presented in the tutorial Learning Spark
http://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html
to compute the per-key average as follows:
val Array(appName
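For reference, a small sketch of the per-key average pattern that chapter describes; the sample data and variable names are illustrative, not from the original post:

val pairs = sc.parallelize(Seq(("panda", 0), ("pink", 3), ("pirate", 3), ("panda", 1), ("pink", 4)))

// Pair each value with a count of 1, sum both per key, then divide
val avgByKey = pairs
  .mapValues(v => (v, 1))
  .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
  .mapValues { case (sum, count) => sum.toDouble / count }

avgByKey.collect().foreach(println)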
I found the answer. The checkpoint directory should be on a fault-tolerant
file system like HDFS, so we should set it to an HDFS path, not a local
file system path.
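Concretely, that means pointing the checkpoint at an HDFS directory, for example (the hostname and path below are placeholders):

// Checkpoint data must survive node failures, so use HDFS rather than a local directory
ssc.checkpoint("hdfs://namenode:8020/user/spark/checkpoints/KafkaWordCount")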
2014-09-03 10:28 GMT+08:00 Tao Xiao xiaotao.cs@gmail.com:
I tried to run KafkaWordCount in a Spark standalone cluster. In this
application, the checkpoint directory was set as follows :
val sparkConf = new SparkConf().setAppName("KafkaWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(2))
ssc.checkpoint("checkpoint")
After
I'm using CDH 5.1.0, which bundles Spark 1.0.0 with it.
Following How-to: Run a Simple Apache Spark App in CDH 5, I tried to
submit my job in local mode, Spark Standalone mode and YARN mode. I
successfully submitted my job in local mode and Standalone mode; however, I
noticed the following
(appMasterRpcPort: 0).
2014-08-31 23:10 GMT+08:00 Yi Tian tianyi.asiai...@gmail.com:
I think -1 means your application master has not been started yet.
On Aug 31, 2014, at 23:02, Tao Xiao xiaotao.cs@gmail.com wrote:
I'm using CDH 5.1.0, which bundles Spark 1.0.0 with it.
Following How-to: Run a Simple Apache Spark App in CDH 5
http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/
, I tried to submit my job in local mode, Spark Standalone mode and YARN
mode. I successfully
Reading about RDD Persistence
https://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence,
I learned that the storage level MEMORY_AND_DISK means "Store RDD as
deserialized Java objects in the JVM. If the RDD does not fit in memory,
store the partitions that don't fit on disk, and read them from there when
they're needed."
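In code, that storage level is requested like this (a trivial sketch; someRdd stands in for any RDD):

import org.apache.spark.storage.StorageLevel

// Keep partitions in memory as deserialized objects, spilling to disk only what does not fit
someRdd.persist(StorageLevel.MEMORY_AND_DISK)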
I am using Spark 0.9
I have an array of tuples, and I want to sort these tuples using the
*sortByKey* API as follows in the Spark shell:
val A: Array[(String, String)] = Array(("1", "One"), ("9", "Nine"),
  ("3", "three"), ("5", "five"), ("4", "four"))
val P = sc.parallelize(A)
// MyComparator is an example, maybe I have
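In the Scala API, sortByKey does not take an explicit comparator; it relies on an ordering for the key type, and on recent Spark releases an implicit Ordering in scope is enough. A sketch of sorting the string keys numerically this way (not the original MyComparator):

// A custom Ordering in local scope takes precedence over the default lexicographic
// String ordering, so keys are compared by their numeric value instead.
implicit val numericStringOrdering: Ordering[String] = Ordering.by((s: String) => s.toInt)

val sorted = P.sortByKey()   // P is the pair RDD built above
sorted.collect().foreach(println)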
the driver,
write it to disk in that case.
Scala
Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi
On Tue, Feb 25, 2014 at 1:19 AM, Tao Xiao xiaotao.cs@gmail.com wrote:
I am a newbie to Spark and I