Spark Streaming job breaks with shuffle file not found

2018-03-28 Thread Jone Zhang
The Spark Streaming job ran for a few days, then failed as below. What is the possible reason? *18/03/25 07:58:37 ERROR yarn.ApplicationMaster: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 16 in stage 80018.0 failed 4 times, most recent

Hoping you can give our product a wonderful name

2017-09-08 Thread Jone Zhang
We have built an ML platform based on open-source frameworks like Hadoop, Spark, and TensorFlow. Now we need to give our product a wonderful name, and we are eager for everyone's advice. Any answers will be greatly appreciated. Thanks.

How can I split a dataset into multiple datasets

2017-08-06 Thread Jone Zhang
val schema = StructType( Seq( StructField("app", StringType, nullable = true), StructField("server", StringType, nullable = true), StructField("file", StringType, nullable = true), StructField("...", StringType, nullable = true) ) ) val row =
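The snippet above is cut off, but the common way to split one dataset into several by a key column can be sketched in plain Java. This is a minimal sketch only: the column name "app" and the sample rows are illustrative assumptions, and in Spark the analogous approach is one ds.filter(col("app").equalTo(k)) per key value, or df.write().partitionBy("app") to split on disk.

```java
import java.util.*;
import java.util.stream.Collectors;

public class SplitByKey {
    // A "row" is modeled as String[]{app, server, file}. Grouping by the
    // first column yields one sub-dataset per distinct key, mirroring what
    // per-key filters produce on a Spark Dataset.
    static Map<String, List<String[]>> split(List<String[]> rows) {
        return rows.stream().collect(Collectors.groupingBy(r -> r[0]));
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[]{"app1", "s1", "f1"},
            new String[]{"app2", "s1", "f2"},
            new String[]{"app1", "s2", "f3"});
        Map<String, List<String[]>> parts = split(rows);
        for (Map.Entry<String, List<String[]>> e : parts.entrySet())
            System.out.println(e.getKey() + " -> " + e.getValue().size() + " rows");
    }
}
```

Note that per-key filters scan the source once per key, so for many distinct keys the partitionBy write is usually the cheaper route.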

Can I move TFS and TSFT out of the Spark package

2017-07-26 Thread Jone Zhang
I have built the spark-assembly-1.6.0-hadoop2.5.1.jar. cat spark-assembly-1.6.0-hadoop2.5.1.jar/META-INF/services/org.apache.hadoop.fs.FileSystem ... org.apache.hadoop.hdfs.DistributedFileSystem org.apache.hadoop.hdfs.web.HftpFileSystem org.apache.hadoop.hdfs.web.HsftpFileSystem

Re: Why spark.sql.autoBroadcastJoinThreshold not available

2017-05-15 Thread Jone Zhang
in the Spark 2.x. Can you try it on Spark 2.0? > > Yong > > -- > *From:* Jone Zhang <joyoungzh...@gmail.com> > *Sent:* Wednesday, May 10, 2017 7:10 AM > *To:* user @spark > *Subject:* Why spark.sql.autoBroadcastJ

How can I merge multiple rows into one row in Spark SQL or Hive SQL?

2017-05-15 Thread Jone Zhang
For example: Data1 (has 1 billion records): user_id1 feature1; user_id1 feature2. Data2 (has 1 billion records): user_id1 feature3. Data3 (has 1 billion records): user_id1 feature4; user_id1 feature5 ... user_id1 feature100. I want to get the result as follows: user_id1 feature1 feature2 feature3
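Collapsing many (user_id, feature) rows into one row per user is a group-and-aggregate. A minimal plain-Java sketch of the idea, with illustrative data values: in Spark SQL the analogue is groupBy("user_id").agg(concat_ws(" ", collect_list("feature"))), and Hive offers collect_list/collect_set similarly (union the three tables first, then group).

```java
import java.util.*;
import java.util.stream.Collectors;

public class MergeRows {
    // Group pairs {user_id, feature} by user_id and join each user's
    // features into a single space-separated string, i.e. one output row
    // per user.
    static Map<String, String> merge(List<String[]> pairs) {
        return pairs.stream().collect(Collectors.groupingBy(
            p -> p[0],
            LinkedHashMap::new,
            Collectors.mapping(p -> p[1], Collectors.joining(" "))));
    }

    public static void main(String[] args) {
        List<String[]> data = Arrays.asList(
            new String[]{"user_id1", "feature1"},
            new String[]{"user_id1", "feature2"},
            new String[]{"user_id1", "feature3"});
        System.out.println(merge(data)); // {user_id1=feature1 feature2 feature3}
    }
}
```

At the billion-row scale in the question, collect_list on a skewed user_id can blow up a single reducer, so checking key skew first is worthwhile.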

Why spark.sql.autoBroadcastJoinThreshold not available

2017-05-10 Thread Jone Zhang
Now I use Spark 1.6.0 from Java. I wish the following SQL to be executed as a broadcast join: *select * from sample join feature*. These are my steps: 1. set spark.sql.autoBroadcastJoinThreshold=100M; 2. HiveContext.sql("cache lazy table feature as select * from src where ..."), whose result size is only 100K
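What a broadcast (map-side) join buys can be sketched in plain Java: the small "feature" table becomes a hash map shipped to every task, so each "sample" row is joined by a lookup instead of a shuffle. The table contents below are illustrative. One hedged observation on the step above: in Spark 1.6 spark.sql.autoBroadcastJoinThreshold is a plain byte count, so a suffixed value like "100M" may not be parsed as intended; a numeric value such as 104857600 is the safer spelling.

```java
import java.util.*;

public class BroadcastJoinSketch {
    // Join the large "sample" side against a small in-memory "feature" map.
    // The hash lookup per row is the map-side join; no shuffle of the large
    // side is needed.
    static List<String> join(List<String[]> sample, Map<String, String> feature) {
        List<String> out = new ArrayList<>();
        for (String[] row : sample) {
            String f = feature.get(row[0]);      // hash lookup replaces a shuffle
            if (f != null) out.add(row[0] + "," + row[1] + "," + f);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> small = new HashMap<>();
        small.put("k1", "f1");
        small.put("k2", "f2");
        List<String[]> big = Arrays.asList(
            new String[]{"k1", "a"}, new String[]{"k3", "b"});
        System.out.println(join(big, small)); // only matching keys survive
    }
}
```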

org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated

2017-05-05 Thread Jone Zhang
*When I use Spark SQL, the error is as follows* 17/05/05 15:58:44 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 20.0 (TID 4080, 10.196.143.233): java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider tachyon.hadoop.TFS could not be instantiated at

Why do Chinese characters appear garbled when I use Spark textFile?

2017-04-05 Thread Jone Zhang
var textFile = sc.textFile("xxx"); textFile.first(); res1: String = 1.0 100733314 18_?:100733314 8919173c6d49abfab02853458247e5841:129:18_?:1.0 hadoop fs -cat xxx 1.0100733314 18_百度输入法:100733314 8919173c6d49abfab02853458247e584 1:129:18_百度输入法:1.0 Why
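A plausible explanation for the garbling above, sketched in plain Java: sc.textFile decodes bytes as UTF-8 (Hadoop's Text type assumes UTF-8), so if the file on HDFS is in a different encoding, multi-byte Chinese text like "百度输入法" degrades to replacement characters. Decoding the same bytes with the right charset recovers the text. GBK is an assumption about this particular file, not confirmed in the thread.

```java
import java.nio.charset.Charset;

public class CharsetDemo {
    public static void main(String[] args) {
        String original = "百度输入法";
        // Encode in GBK, as a legacy Chinese-locale tool might have written it.
        byte[] gbkBytes = original.getBytes(Charset.forName("GBK"));
        // Decoding GBK bytes as UTF-8 yields mojibake / replacement chars.
        String wrong = new String(gbkBytes, Charset.forName("UTF-8"));
        // Decoding with the matching charset recovers the original text.
        String right = new String(gbkBytes, Charset.forName("GBK"));
        System.out.println("wrong=" + wrong + " right=" + right);
    }
}
```

If this is the cause, a common workaround is to read the file with sc.hadoopFile using TextInputFormat and convert each record's raw bytes yourself (new String(bytes, 0, length, "GBK")) rather than relying on textFile's UTF-8 decoding, or to re-encode the file to UTF-8 before ingestion.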

Is there a length limit for Spark SQL/Hive SQL?

2016-10-26 Thread Jone Zhang
Is there a length limit for Spark SQL/Hive SQL? Can ANTLR work well if the SQL is too long? Thanks.

Can I display messages on the console when using Spark on YARN?

2016-10-20 Thread Jone Zhang
I submit Spark with "spark-submit --master yarn-cluster --deploy-mode cluster". How can I display messages on the YARN console? I expect it to look like this: . 16/10/20 17:12:53 main INFO org.apache.spark.deploy.yarn.Client>SPK> Application report for application_1453970859007_481440 (state:

Re: High virtual memory consumption on spark-submit client.

2016-05-13 Thread jone
011856   16534452Swap:  2031608    304    2031304Total:    26577916   24269064    2308852 Dr Mich Talebzadeh   LinkedIn  https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw   http://talebzadehmich.wordpress.com   On 13 May 2016 at 07:36, Jone

High virtual memory consumption on spark-submit client.

2016-05-12 Thread jone
The virtual memory is 9G when I run org.apache.spark.examples.SparkPi in yarn-cluster mode, using default configurations.   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND