The thread dump is here; it seems to hang on accessing the MySQL meta store.
I googled and found a bug related to com.mysql.jdbc.util.ReadAheadInputStream,
but it doesn't have a workaround,
and I am not sure it is the same issue. Please help me, thanks.
thread dump---
MyAppDefaultScheduler_Worker-2 prio=10
I think different teams get different answers for this question. My team uses
Scala, and is happy with it.
On Wed, Jul 15, 2015 at 1:31 PM, Tristan Blakers tris...@blackfrog.org
wrote:
We have had excellent results operating on RDDs using Java 8 with Lambdas.
It’s slightly more verbose than Scala,
/GaussianMixture.scala
At 2015-07-09 10:10:58, 诺铁 noty...@gmail.com wrote:
thanks, I understand now.
But I can't find mllib.clustering.GaussianMixture#vectorMean; what
version of Spark do you use?
On Thu, Jul 9, 2015 at 1:16 AM, Feynman Liang fli...@databricks.com
wrote:
A RDD[Double
hi,
there are some useful functions in DoubleRDDFunctions, which I can use if I
have an RDD[Double], e.g., mean and variance.
Vector doesn't have such methods; how can I convert a Vector to RDD[Double],
or, even better, call mean directly on a Vector?
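(A minimal sketch of one way to do this, assuming an MLlib dense Vector; the names are illustrative:)
import org.apache.spark.mllib.linalg.{Vector, Vectors}

val v: Vector = Vectors.dense(1.0, 2.0, 3.0)  // stand-in for your vector

// parallelizing the values yields an RDD[Double], which picks up
// DoubleRDDFunctions (mean, variance, stdev) implicitly
val values = sc.parallelize(v.toArray)
println(values.mean())      // 2.0
println(values.variance())  // 2/3

// for a single machine-sized vector, plain Scala avoids a Spark job entirely
val localMean = v.toArray.sum / v.size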
hi,
now I'm doing something like this on a data frame to make use of table
partitioning:
df.filter($"sex" === "male").write.parquet("path/to/table/sex=male")
df.filter($"sex" === "female").write.parquet("path/to/table/sex=female")
This filters the dataset multiple times; is there a better way to do this?
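(Assuming Spark 1.4+, DataFrameWriter can do this in a single pass:)
// one scan; Spark lays out the sex=male/ and sex=female/ directories itself
df.write.partitionBy("sex").parquet("path/to/table")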
hi,
I want to use Spark to analyze source code :)
Since code has dependencies between lines, it's not possible to just treat
it as lines. So I am considering providing my own data source for source
code, but there isn't much documentation about the data source API; where can I
learn how to do this?
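(For reference, a minimal sketch of the Spark 1.x external data source API; SourceCodeRelation and the one-row-per-file schema are assumptions, not an official example. wholeTextFiles keeps each file intact, so line dependencies survive:)
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider, TableScan}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new SourceCodeRelation(parameters("path"))(sqlContext)
}

// one row per source file, so dependent lines stay together
class SourceCodeRelation(path: String)(@transient val sqlContext: SQLContext)
    extends BaseRelation with TableScan {

  override def schema: StructType = StructType(Seq(
    StructField("file", StringType, nullable = false),
    StructField("content", StringType, nullable = false)))

  override def buildScan(): RDD[Row] =
    sqlContext.sparkContext
      .wholeTextFiles(path)
      .map { case (file, content) => Row(file, content) }
}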
there is a
*PartitionPruningRDD*:
:: DeveloperApi :: An RDD used to prune RDD partitions so we can
avoid launching tasks on all partitions. An example use case: if we know
the RDD is partitioned by range, and the execution DAG has a filter on the
key, we can avoid launching tasks on
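(A hedged usage sketch, not from the thread; rdd and the partition-0 predicate are illustrative:)
import org.apache.spark.rdd.PartitionPruningRDD

// keep only the partitions that could possibly match, here partition 0
val pruned = PartitionPruningRDD.create(rdd, partitionIndex => partitionIndex == 0)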
hi,
I don't know whether this question should be asked here; if not, please point
me to the right place, thanks.
We are currently using Hive on Spark; when reading a smallint field, it
reports this error:
Cannot get field 'i16Val' because union is currently set to i32Val
I googled and found only the source code of
to ask (see
https://hive.apache.org/mailing_lists.html).
Thanks,
Yin
On Wed, Nov 26, 2014 at 10:49 PM, 诺铁 noty...@gmail.com wrote:
thank you very much.
On Thu, Nov 27, 2014 at 11:30 AM, Michael Armbrust
mich...@databricks.com wrote:
This has been fixed in Spark 1.1.1 and Spark 1.2
hi,
I am trying to write some unit tests, following the Spark programming guide
http://spark.apache.org/docs/latest/programming-guide.html#unit-testing,
but I observed that the unit tests run very slowly (the code is just a SparkPi), so I
turned the log level to trace and looked through the log output, and found
I connected my sample project to a hosted CI service; it only takes 3 seconds
to run there... while the same tests take 2 minutes on my MacBook Pro. So
maybe this is a Mac OS specific problem?
On Tue, Sep 16, 2014 at 3:06 PM, 诺铁 noty...@gmail.com wrote:
hi,
I am trying to write some unit tests
sorry for the disturbance, please ignore this mail.
In the end, I found it was slow because of a lack of memory on my machine.
Sorry again.
On Tue, Sep 16, 2014 at 3:26 PM, 诺铁 noty...@gmail.com wrote:
with the same key in the same partition
}
}
2014-08-11 20:42 GMT+08:00 诺铁 noty...@gmail.com:
hi,
I have googled and found a similar question without a good answer:
http://stackoverflow.com/questions/24520225/writing-to-hadoop-distributed-file-system-multiple-times-with-spark
In short, I would like to separate the raw data, divide it by some key (for
example, create date), and put each group in its own directory
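(One common approach, sketched under the assumption of an RDD keyed by create date; KeyBasedOutput and the paths are made up for illustration:)
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

class KeyBasedOutput extends MultipleTextOutputFormat[Any, Any] {
  // route each record into a subdirectory named after its key
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    s"$key/$name"  // e.g. 2014-08-11/part-00000
  // don't repeat the key inside the file itself
  override def generateActualKey(key: Any, value: Any): Any =
    NullWritable.get()
}

// hypothetical (createDate, rawLine) pairs
val pairs = sc.parallelize(Seq(("2014-08-11", "raw line 1"), ("2014-08-12", "raw line 2")))
pairs.saveAsHadoopFile("hdfs:///raw/by-date", classOf[String], classOf[String],
  classOf[KeyBasedOutput])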
hi, all,
I am playing with Docker, trying to create a Spark cluster with Docker
containers.
Since the Spark master, workers, and driver all need to reach each other, I
configured a DNS server and set the hostname and domain name of each node.
But when the Spark master starts up, it seems to be using the hostname
I haven't seen people write directly to a SQL database,
mainly because it's difficult to deal with failure.
What if the network breaks halfway through the process? Should we drop all data in
the database and restart from the beginning? If the process is appending data to the
database, then things become even more complex.
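(One way to make retries safe, a sketch assuming the DataFrame JDBC writer from Spark 1.4+; the URL, table name, and credentials are placeholders:)
import java.util.Properties
import org.apache.spark.sql.SaveMode

val props = new Properties()
props.setProperty("user", "spark")  // placeholder credentials

df.write
  .mode(SaveMode.Overwrite)  // a rerun replaces the table, so a half-written load can simply be retried
  .jdbc("jdbc:mysql://dbhost/analytics", "results", props)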
hello,ZhangYi
I found Ooyala's open-sourced spark-jobserver,
https://github.com/ooyala/spark-jobserver
It seems they are also using Akka, Spray, and Spark; maybe it will be helpful for
you.
On Mon, May 5, 2014 at 11:37 AM, ZhangYi yizh...@thoughtworks.com wrote:
Hi all,
Currently, our project is
Hi,
I am new to Spark; when trying to write some simple tests in the Spark shell, I met
the following problem.
I created a very small text file, named 5.txt:
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
and experiment in spark shell:
scala> val d5 = sc.textFile("5.txt").cache()
d5: org.apache.spark.rdd.RDD[String] =
instead:
scala> d5.keyBy(_.split(" ")(0)).mapValues(_.split(" ")(1).toInt).reduceByKey((v1, v2) => v1 + v2).collect
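(For reference, with the three-line file above this should evaluate to something like
res1: Array[(String, Int)] = Array((1,6))
since every line keys on "1" and contributes its second field, 2.)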
On Thu, Apr 17, 2014 at 6:29 PM, 诺铁 noty...@gmail.com wrote:
/YARN/Mesos), output
of println goes to executor stdout.
On Fri, Apr 18, 2014 at 6:53 AM, 诺铁 noty...@gmail.com wrote:
yeah, I got it!
Using println to debug is great for me to explore Spark.
Thank you very much for your kind help.
On Fri, Apr 18, 2014 at 12:54 AM, Daniel Darabos
, 诺铁 noty...@gmail.com wrote:
hi, Cheng,
thank you for letting me know this. So what do you think is a better way to
debug?
On Fri, Apr 18, 2014 at 9:27 AM, Cheng Lian lian.cs@gmail.com wrote:
A tip: using println is only convenient when you are working in local
mode. When running Spark
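(A common cluster-friendly alternative, offered here as an editor's sketch rather than Cheng's actual advice: pull a small sample back to the driver and print there:)
rdd.take(10).foreach(println)  // println now runs on the driver, not on executors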