Re: spark stream and spark sql with data warehouse

2015-06-11 Thread Yi Tian
Here is an example:

    val sc = new SparkContext(new SparkConf)

    // access hive tables
    val hqlc = new HiveContext(sc)
    import hqlc.implicits._

    // access files on hdfs
    val sqlc = new SQLContext(sc)
    import sqlc.implicits._
    sqlc.jsonFile(xxx).registerTempTable(xxx)

    // access other DB
    sqlc.jdbc(url,

Re: Exception while select into table.

2015-03-02 Thread Yi Tian
Hi, Some suggestions: 1. You should tell us the versions of Spark and Hive you are using. 2. You should paste the full stack trace of the exception. In this case, I guess you have a nested directory in the path which `bak_startup_log_uid_20150227` points to. and the config field
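The nested-directory guess above is easy to verify before digging into the stack trace. A minimal sketch (the layout below is made up for illustration; the real table location would come from the DDL in the original thread):

```python
import os
import tempfile

def has_nested_dirs(path):
    """Return True if `path` contains any subdirectory.

    A table location holding subdirectories instead of plain data
    files is one common cause of INSERT ... SELECT failures in
    older Hive / Spark SQL versions, as the reply above guesses.
    """
    return any(
        os.path.isdir(os.path.join(path, entry))
        for entry in os.listdir(path)
    )

# Demo with a throwaway layout (hypothetical, for illustration only).
root = tempfile.mkdtemp()
open(os.path.join(root, "part-00000"), "w").close()
print(has_nested_dirs(root))          # False: flat layout, only files
os.mkdir(os.path.join(root, "hour=01"))
print(has_nested_dirs(root))          # True: nested directory present
```

If the check reports a nested directory, flattening the location (or pointing the table at the leaf directory) is usually the first thing to try.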

Re: SparkSQL tasks spend too much time to finish.

2015-01-26 Thread Yi Tian
Hi, San You need to provide more information to diagnose this problem, like: 1. What kind of SQL did you execute? 2. If there is a `group` operation in this SQL, could you gather some statistics about how many unique group keys there are in this case? On 1/26/15 17:01, luohui20...@sina.com wrote:
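The "unique group keys" statistic asked for above can be gathered with a simple aggregation. A plain-Python sketch with hypothetical sample rows (on the actual table you would run something like a COUNT(DISTINCT key) query instead); heavily skewed keys are one common reason group-by tasks run slowly:

```python
from collections import Counter

# Hypothetical sample of the grouping column pulled from the table.
group_keys = ["user_1", "user_2", "user_1", "user_3", "user_2", "user_1"]

key_counts = Counter(group_keys)
print(len(key_counts))             # 3 -- number of unique group keys
print(key_counts.most_common(2))   # heaviest keys, useful for spotting skew
```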

Is there any way to support multiple users executing SQL on thrift server?

2015-01-19 Thread Yi Tian
Is there any way to support multiple users executing SQL on one thrift server? I think there are some problems in Spark 1.2.0, for example: 1. Start the thrift server as user A. 2. Connect to the thrift server via beeline as user B. 3. Execute “insert into table dest select … from table src”, then

Re: default parallelism bug?

2014-10-20 Thread Yi Tian
Could you show your Spark version? And the value of `spark.default.parallelism` you are setting? Best Regards, Yi Tian tianyi.asiai...@gmail.com On Oct 20, 2014, at 12:38, Kevin Jung itsjb.j...@samsung.com wrote: Hi, I usually use file on hdfs to make PairRDD and analyze it by using
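Both values matter because, when `spark.default.parallelism` is unset, Spark falls back to a mode-dependent default. A toy model of the fallback, assuming roughly the Spark 1.x standalone/coarse-grained behavior (max of total cores and 2); the function and dict-based conf are made up for illustration, local and Mesos fine-grained modes differ:

```python
def default_parallelism(conf, total_cores):
    """Toy model of how Spark 1.x picks defaultParallelism.

    If the user set spark.default.parallelism, that value wins;
    otherwise standalone/coarse-grained mode falls back to
    max(total_cores, 2). Sketch only, not Spark's actual code.
    """
    explicit = conf.get("spark.default.parallelism")
    if explicit is not None:
        return int(explicit)
    return max(total_cores, 2)

print(default_parallelism({}, 8))                                  # 8
print(default_parallelism({"spark.default.parallelism": "4"}, 8))  # 4
print(default_parallelism({}, 1))                                  # 2, never below the floor
```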

Re: How to not write empty RDD partitions in RDD.saveAsTextFile()

2014-10-20 Thread Yi Tian
I think you could use `repartition` to make sure there would be no empty partitions. You could also try `coalesce` to combine partitions, but it can't guarantee there are no empty partitions left. Best Regards, Yi Tian tianyi.asiai...@gmail.com On Oct 18, 2014, at 20:30, jan.zi
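The difference can be illustrated with a toy model treating partitions as plain lists. This is not Spark's implementation, just a sketch of why `coalesce` (which only merges existing partitions, without a shuffle) can still leave empty partitions while `repartition` (a full shuffle) redistributes every element:

```python
def coalesce(partitions, n):
    """Merge adjacent partitions into n groups without a shuffle.

    A group whose constituent partitions are all empty stays empty,
    so empty output partitions are still possible.
    """
    out = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        out[i * n // len(partitions)].extend(part)
    return out

def repartition(partitions, n):
    """Round-robin every element across n partitions (full shuffle)."""
    out = [[] for _ in range(n)]
    for i, x in enumerate(x for part in partitions for x in part):
        out[i % n].append(x)
    return out

parts = [[], [], [1], [2]]            # two empty partitions up front
print(coalesce(parts, 2))             # [[], [1, 2]] -- still an empty partition
print(repartition(parts, 2))          # [[1], [2]]   -- no empties left
```

The price of `repartition` is the shuffle; `coalesce` is cheaper but, as the reply above says, cannot guarantee every output partition is non-empty.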

Re: Spark SQL DDL, DML commands

2014-10-16 Thread Yi Tian
What do you mean by “executed directly”? Best Regards, Yi Tian tianyi.asiai...@gmail.com On Oct 16, 2014, at 22:50, neeraj neeraj_gar...@infosys.com wrote: Hi, Does Spark SQL have DDL, DML commands to be executed directly. If yes, please share the link. If No, please help me

Re: Build spark with Intellij IDEA 13

2014-09-28 Thread Yi Tian
/yarn/stable/src/main/scala” as the source path of module “yarn-parent_2.10”. 7. Then you can run Build -> Rebuild Project in IDEA. PS: you should run “Rebuild” again after you run any mvn or sbt command against the Spark project. Best Regards, Yi Tian tianyi.asiai...@gmail.com On Sep 28, 2014, at 11:01, maddenpj

Re: Memory used in Spark-0.9.0-incubating

2014-09-25 Thread Yi Tian
“yarn.scheduler.increment-allocation-mb” defaults to 1024. It means that if you ask for 4097 MB of memory for a container, the resource manager will create a container which uses 5120 MB of memory. But I can’t figure out where the 5 GB comes from. Maybe there is some code which mistakes 1024 for 1000? Best Regards, Yi Tian tianyi.asiai
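The rounding described above is just a ceiling to the next multiple of the increment. A quick sketch (the function name is made up; the behavior matches the reply's 4097 -> 5120 example):

```python
def round_to_increment(request_mb, increment_mb=1024):
    """Round a container memory request up to the next multiple of
    yarn.scheduler.increment-allocation-mb, as described above."""
    return -(-request_mb // increment_mb) * increment_mb  # ceiling division

print(round_to_increment(4097))  # 5120
print(round_to_increment(4096))  # 4096, already a multiple
```

Note that 5120 MB is exactly 5 x 1024 MB, so a tool reporting in GiB would show the rounded container as "5 GB".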

Re: What does appMasterRpcPort: -1 indicate ?

2014-08-31 Thread Yi Tian
I think -1 means your application master has not been started yet. On Aug 31, 2014, at 23:02, Tao Xiao xiaotao.cs@gmail.com wrote: I'm using CDH 5.1.0, which bundles Spark 1.0.0 with it. Following How-to: Run a Simple Apache Spark App in CDH 5, I tried to submit my job in local mode, Spark

Powered By Spark

2014-08-30 Thread Yi Tian
. Meanwhile, we also build innovative big data applications to help our customers with real-time marketing, cross-product selling, customer behavior analysis, as well as other areas, by using Spark technology. Yi Tian