proxy on spark UI

2017-06-27 Thread Soheila S.
Hi all, I am using Hadoop 2.6.5 and Spark 2.1.0, and I run a job using spark-submit with the master set to "yarn". When Spark starts, I can load the Spark UI page on port 4040, but no job is shown on the page. After the following logs (registering the application master on YARN), the Spark UI is not accessible

Parameter in FlatMap function

2017-04-14 Thread Soheila S.
Hello all, can someone help me solve the following fundamental problem? I have a JavaRDD, and as its flatMap argument I pass a new instance of a class that implements FlatMapFunction. This class has a constructor and a call method. In the constructor, I set the values for "List"

Text

2017-01-27 Thread Soheila S.
Hi all, I read a text file using sparkContext.textFile(filename), assign it to an RDD, process the RDD (replace some words), and finally write it to a text file using rdd.saveAsTextFile(output). Is there any way to be sure the order of the sentences will not be changed? I need to have the

How to tune number of tasks

2017-01-26 Thread Soheila S.
Hi all, please tell me how I can tune the number of output partitions. I run my Spark job on my local machine with 8 cores, and the input data is 6.5 GB. It creates 193 tasks and puts the output into 193 partitions. How can I change the number of tasks and, consequently, the number of output files? Best,

failed to launch org.apache.spark.deploy.master.Master

2017-01-12 Thread Soheila S.
Hi, I have executed my Spark job using spark-submit on my local machine and on a cluster. Now I want to try using HDFS: put the data (a text file) on HDFS, read it from there, execute the jar file, and finally write the output to HDFS. I got this error after running the job: *failed to launch

filter RDD by variable

2016-12-07 Thread Soheila S.
Hi, I am new to Spark and have a question from my first steps in learning it. How can I filter an RDD using a String variable (for example words[i]) instead of a fixed one like "Error"? Thanks a lot in advance. Soheila
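
A minimal sketch of one way to approach the question above, not a definitive answer. To keep it runnable without a Spark installation, a plain List with java.util.stream stands in for the RDD; that substitution is an assumption of this example. The class name FilterByVariable, the method filterByKeyword, and the sample lines are hypothetical. In Spark's Java API, JavaRDD.filter also accepts a lambda, so the same pattern of capturing a String variable should carry over.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Stand-in sketch: a plain List and java.util.stream replace the RDD so the
// example runs without Spark. All names here are hypothetical.
class FilterByVariable {

    // Keeps only the lines that contain `keyword`. The lambda captures the
    // parameter, which is allowed because it is effectively final.
    static List<String> filterByKeyword(List<String> lines, String keyword) {
        return lines.stream()
                .filter(line -> line.contains(keyword))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Error: disk full", "Info: job started", "Warning: slow task");
        String[] words = {"Error", "Warning"};

        for (int i = 0; i < words.length; i++) {
            // Copy words[i] into a final local first: a lambda cannot capture
            // the loop counter `i`, because `i` is reassigned on each pass.
            final String keyword = words[i];
            System.out.println(keyword + " -> " + filterByKeyword(lines, keyword));
        }
    }
}
```

The point of the design is the final local copy: the filter predicate refers to a value that never changes after it is captured, which is what Java requires of variables used inside a lambda.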