Why do Chinese characters appear garbled when I use Spark textFile?

2017-04-05 Thread JoneZhang
var textFile = sc.textFile("xxx")
textFile.first()
res1: String = 1.0 862910025238798 100733314 18_?:100733314 8919173c6d49abfab02853458247e5841:129:18_?:1.0

hadoop fs -cat xxx
1.0 862910025238798 100733314 18_百度输入法:100733314
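The usual cause is an encoding mismatch: sc.textFile always decodes bytes as UTF-8, so a file written in GBK shows up as "?" garbage even though hadoop fs -cat renders it correctly. A minimal sketch of the common workaround, assuming the HDFS file is GBK-encoded:

```scala
// Sketch, assuming the file on HDFS is GBK-encoded: read raw bytes with
// hadoopFile and decode them explicitly instead of relying on textFile's
// fixed UTF-8 decoding.
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

val lines = sc.hadoopFile[LongWritable, Text, TextInputFormat]("xxx")
  .map { case (_, text) =>
    // Copy out of the reused Text buffer and decode as GBK
    new String(text.getBytes, 0, text.getLength, "GBK")
  }
lines.first()
```

Decoding inside the map also sidesteps Hadoop's Text-object reuse, since each record is materialized as an immutable String immediately.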

Do parallelize and collect preserve the original order of a list?

2016-03-15 Thread JoneZhang
Step 1:
List items = new ArrayList();
items.addAll(XXX);
javaSparkContext.parallelize(items).saveAsTextFile(output);

Step 2:
final List items2 = ctx.textFile(output).collect();

Do items and items2 have the same order? Best wishes. Thanks.
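For the direct round trip the order is preserved: parallelize keeps the list order within partition indexes, and collect concatenates partitions in index order. A minimal sketch of that invariant:

```scala
// Sketch: parallelize assigns list elements to partitions in order, and
// collect returns partitions in index order, so the sequence round-trips.
val items = List(1, 2, 3, 4, 5)
val rdd = sc.parallelize(items)
assert(rdd.collect().toList == items)

// Going through saveAsTextFile/textFile additionally depends on part-file
// naming and input-split ordering; that usually works out but is not a
// contractual guarantee, so sort explicitly if order matters.
val roundTrip = sc.textFile("output").collect()
```

When ordering is essential across a save/load cycle, carrying an explicit index (e.g. zipWithIndex) and sorting on read is the safe pattern.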

What happens when reduce memory is not enough during the Spark shuffle read stage?

2015-11-09 Thread JoneZhang
For example, if the shuffle read data size is 1 TB and the number of reducers is 200, each reducer needs to fetch 1 TB / 200 = 5 GB of data. If the total memory of one reducer is only 4 GB, what would happen?
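A reducer does not hold all 5 GB in memory at once: shuffle blocks are fetched in bounded in-flight batches, and memory-hungry aggregations can spill to disk. A sketch of the settings involved (values shown are illustrative, not recommendations):

```scala
// Sketch: the shuffle-read path streams remote blocks with a bounded
// in-flight buffer and spills aggregation state to disk when it grows too
// large, so a 5 GB shuffle read can succeed inside a 4 GB reducer
// (at the cost of disk I/O). Settings from the Spark 1.x era:
val conf = new org.apache.spark.SparkConf()
  .set("spark.reducer.maxSizeInFlight", "48m") // cap on simultaneous fetch buffers
  .set("spark.shuffle.spill", "true")          // allow spilling during aggregation (pre-1.6)
```

The failure mode to watch for is a single huge key or a sort/aggregation structure that cannot spill, which surfaces as an OOM rather than slow spilling.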

I don't understand what this sentence means: "7.1 GB of 7 GB physical memory used"

2015-10-23 Thread JoneZhang
Here are the Spark configuration and the error log:
spark.dynamicAllocation.enabled true
spark.shuffle.service.enabled true
spark.dynamicAllocation.minExecutors 10
spark.executor.cores 1
spark.executor.memory 6G
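That message comes from YARN, not Spark: the container limit is executor memory plus an off-heap overhead, and YARN kills the container when physical memory exceeds it. With 6 GB executors the default overhead (roughly max(384 MB, ~10% of executor memory) in this Spark era) yields a container of about 7 GB, matching the message. A sketch of the usual fix, with an assumed overhead value:

```scala
// Sketch: give the executor's off-heap usage (JVM metaspace, netty buffers,
// etc.) more headroom so the YARN container limit is not exceeded.
// The 1024 MB value is an assumption to illustrate the knob.
val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.memory", "6g")
  .set("spark.yarn.executor.memoryOverhead", "1024") // in MB
```

Alternatively, lowering spark.executor.memory achieves the same ratio if the cluster's container size is fixed.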

Will Spark use disk when the memory is not enough at the MEMORY_ONLY storage level?

2015-10-22 Thread JoneZhang
1. Will Spark use disk when the memory is not enough at the MEMORY_ONLY storage level?
2. If not, how can I set the storage level when I use Hive on Spark?
3. Does Spark have any intention of dynamically choosing between Hive on MapReduce and Hive on Spark, based on SQL features?
Thanks in advance. Best
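On question 1: with MEMORY_ONLY, partitions that do not fit are simply not cached; they are dropped and recomputed from lineage when needed, never written to disk. MEMORY_AND_DISK is the level that spills. A minimal sketch of the difference:

```scala
// Sketch: choosing what happens to partitions that do not fit in memory.
import org.apache.spark.storage.StorageLevel

val rdd = sc.textFile("xxx")

// MEMORY_ONLY: non-fitting partitions are dropped and recomputed on access.
rdd.persist(StorageLevel.MEMORY_ONLY)

// MEMORY_AND_DISK: non-fitting partitions are spilled to local disk instead.
// rdd.persist(StorageLevel.MEMORY_AND_DISK)
```

Note this is about explicit RDD caching; shuffle data is managed separately and can spill to disk regardless of the storage level.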

Do you have any other method to get the CPU elapsed time of a Spark application?

2015-08-11 Thread JoneZhang
Is there more information about the Spark event log? For example, why does the SparkListenerExecutorRemoved event not appear in the event log when I use dynamic executor allocation? I want to calculate the CPU elapsed time of an application based on the event log. By the way, do you have any other method to get CPU elapsed time?
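One alternative to parsing the event log after the fact is registering a SparkListener in the application itself and summing task metrics as tasks finish. A sketch under that assumption:

```scala
// Sketch: accumulate total executor run time from task-end events instead
// of reconstructing it from the event log.
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
import java.util.concurrent.atomic.AtomicLong

val totalTaskTimeMs = new AtomicLong(0L)

sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics can be null for failed tasks, so guard the access
    Option(taskEnd.taskMetrics).foreach { m =>
      totalTaskTimeMs.addAndGet(m.executorRunTime)
    }
  }
})
```

This measures wall-clock task time on executors rather than strict CPU time, but it is available live and does not depend on which events the event log happens to contain.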

How to deal with a Spark Streaming application while upgrading Spark

2015-07-23 Thread JoneZhang
My Spark Streaming on Kafka application is running on Spark 1.3. I want to upgrade Spark to 1.4 now. How should I deal with the Spark Streaming application? Save the Kafka topic partition offsets, then kill the application, then upgrade, then run Spark Streaming again? Is there a more elegant way?
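Saving the offsets is indeed the standard route, since checkpoints are not portable across Spark versions. With the direct Kafka stream, the processed offset ranges are exposed per batch; a sketch (here `stream` stands for an assumed existing direct stream):

```scala
// Sketch, assuming a direct Kafka DStream named `stream`: record the offset
// ranges of each batch so the upgraded job can resume from where the old
// one stopped.
import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach { o =>
    // Persist topic/partition/untilOffset to ZooKeeper or a database;
    // the println is a placeholder for that store.
    println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
  }
}
```

After the upgrade, the new job passes the stored offsets as the fromOffsets argument when creating the direct stream, instead of restoring from a checkpoint.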

Is there more information about the Spark shuffle service?

2015-07-21 Thread JoneZhang
There is a statement, "If the service is enabled, Spark executors will fetch shuffle files from the service instead of from each other," in the docs: https://spark.apache.org/docs/1.3.0/job-scheduling.html#graceful-decommission-of-executors
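The point of that design is that shuffle files outlive the executor that wrote them: a long-running external process on each node serves the files, so executors can be decommissioned (e.g. by dynamic allocation) without losing shuffle output. A sketch of the application-side settings:

```scala
// Sketch: application-side settings for the external shuffle service.
// The service itself must also run on each node (on YARN, as the
// spark_shuffle auxiliary service in the NodeManager) — that part is
// cluster configuration, not application code.
val conf = new org.apache.spark.SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.dynamicAllocation.enabled", "true")
```

Without the service, removing an executor would discard its shuffle files and force upstream stages to be recomputed.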

Re: it seems like the exactly-once feature does not work on Spark 1.4

2015-07-17 Thread JoneZhang
I see now. There are three steps in Spark Streaming + Kafka data processing:
1. Receiving the data
2. Transforming the data
3. Pushing out the data
Spark Streaming + Kafka only provide an exactly-once guarantee on steps 1 and 2. We need to ensure exactly-once on step 3 ourselves. More details see base
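A common way to make step 3 effectively exactly-once is to write idempotently: key each output record deterministically so a replayed batch overwrites the same rows instead of adding duplicates. A sketch with a hypothetical sink (`store.upsert` is not a real API, just a placeholder):

```scala
// Sketch, assuming a direct Kafka DStream named `stream` and some external
// store with an upsert operation: idempotent writes make replays harmless.
stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    records.foreach { record =>
      // Upsert by a deterministic key (e.g. topic + partition + offset),
      // so re-delivered records rewrite the same row rather than duplicate it.
      // store.upsert(keyOf(record), valueOf(record))  // hypothetical sink
    }
  }
}
```

The alternative is a transactional sink, committing the output and the consumed offsets atomically, so the batch either fully lands or can be safely retried.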