Here is an example:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf)
// access Hive tables
val hqlc = new HiveContext(sc)
import hqlc.implicits._
// access files on HDFS
val sqlc = new SQLContext(sc)
import sqlc.implicits._
sqlc.jsonFile(xxx).registerTempTable(xxx)
// access other databases via JDBC
sqlc.jdbc(url,
Hi,
Some suggestions:
1. You should tell us the versions of Spark and Hive you are using.
2. You should paste the full stack trace of the exception.
In this case, I guess you have a nested directory in the path which
`bak_startup_log_uid_20150227` points to,
and the config field
Hi, San
You need to provide more information to diagnose this problem, like :
1. What kind of SQL did you execute?
2. If there are any `group by` operations in this SQL, could you gather some
statistics about how many unique group keys there are in this case?
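For instance, a minimal sketch of such a statistic (assuming a SQLContext named `sqlc` is available; `src` and `key` are placeholder names for your actual table and grouping column):

```scala
// Count the distinct group keys used by the problematic query.
// `src` and `key` are placeholders for your real table and grouping column.
val uniqueKeys = sqlc
  .sql("SELECT COUNT(DISTINCT key) AS unique_keys FROM src")
  .collect()(0)
  .getLong(0)
println(s"unique group keys: $uniqueKeys")
```

A very large number of unique keys would point to the aggregation itself as the source of memory pressure.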
On 1/26/15 17:01, luohui20...@sina.com wrote:
Is there any way to support multiple users executing SQL on one thrift
server?
I think there are some problems for spark 1.2.0, for example:
1. Start thrift server with user A
2. Connect to thrift server via beeline with user B
3. Execute “insert into table dest select … from table src”
then
Could you show your Spark version?
And the value of `spark.default.parallelism` you are setting?
Best Regards,
Yi Tian
tianyi.asiai...@gmail.com
On Oct 20, 2014, at 12:38, Kevin Jung itsjb.j...@samsung.com wrote:
Hi,
I usually use files on HDFS to make a PairRDD and analyze it by using
I think you could use `repartition` to make sure there would be no empty
partitions.
You could also try `coalesce` to combine partitions, but it cannot guarantee
that no empty partitions remain.
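To illustrate the difference, here is a plain-Scala sketch that simulates partitions as vectors rather than using a real RDD: `repartition` performs a full shuffle that redistributes all records, while `coalesce` only merges existing partitions, so empty ones can survive.

```scala
// Simulate an RDD whose 4 partitions are badly skewed: all data in partition 0.
val parts = Vector(Vector(1, 2, 3, 4), Vector.empty[Int],
                   Vector.empty[Int], Vector.empty[Int])

// repartition(2): full shuffle, every record is reassigned to a new partition.
val repartitioned = parts.flatten.zipWithIndex
  .groupBy { case (_, i) => i % 2 }
  .values.map(_.map(_._1)).toVector
// Both new partitions receive data.

// coalesce(2): merge existing partitions without a shuffle.
val coalesced = parts.grouped(2).map(_.flatten.toVector).toVector
// The second merged partition is built only from empty partitions,
// so it is still empty.
assert(coalesced(1).isEmpty)
```

This is only a model of the two operations' shuffle behavior, not Spark's actual partitioner logic.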
Best Regards,
Yi Tian
tianyi.asiai...@gmail.com
On Oct 18, 2014, at 20:30, jan.zi
What do you mean by "executed directly"?
Best Regards,
Yi Tian
tianyi.asiai...@gmail.com
On Oct 16, 2014, at 22:50, neeraj neeraj_gar...@infosys.com wrote:
Hi,
Does Spark SQL have DDL and DML commands that can be executed directly? If yes,
please share the link.
If no, please help me
"/yarn/stable/src/main/scala" as the source path of module "yarn-parent_2.10".
7. Then you can run "Build -> Rebuild Project" in IDEA.
PS: you should run "Rebuild Project" again after you run any mvn or sbt command on the Spark project.
Best Regards,
Yi Tian
tianyi.asiai...@gmail.com
On Sep 28, 2014, at 11:01, maddenpj
`yarn.scheduler.increment-allocation-mb`, whose default is 1024.
It means that if you ask for 4097 MB of memory for a container, the resource
manager will create a container which uses 5120 MB of memory.
But I can't figure out where the 5 GB comes from.
Maybe there is some code which confuses 1024 with 1000?
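The observed 5120 MB is consistent with rounding each request up to the next multiple of the allocation increment. A minimal sketch of that rounding (my reading of the behavior, not YARN's actual code):

```scala
// Round a memory request up to the next multiple of the allocation increment.
def roundUp(requestMb: Int, incrementMb: Int = 1024): Int =
  ((requestMb + incrementMb - 1) / incrementMb) * incrementMb

assert(roundUp(4097) == 5120) // 4097 MB rounds up to 5 * 1024 MB
assert(roundUp(4096) == 4096) // exact multiples are unchanged
```

Since 5120 = 5 x 1024, the "5 GB" may simply be 5 GiB reported with the binary convention rather than a 1024-vs-1000 bug.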
Best Regards,
Yi Tian
tianyi.asiai
I think -1 means your application master has not been started yet.
On Aug 31, 2014, at 23:02, Tao Xiao xiaotao.cs@gmail.com wrote:
I'm using CDH 5.1.0, which bundles Spark 1.0.0 with it.
Following "How-to: Run a Simple Apache Spark App in CDH 5", I tried to submit
my job in local mode, Spark
Meanwhile, we also build innovative big data applications to help our
customers in real-time marketing, cross-product selling, and customer behavior
analysis, as well as other areas, by using Spark technology.
Yi Tian