proxy on spark UI

2017-06-27 Thread Soheila S.
Hi all,
I am using Hadoop 2.6.5 and Spark 2.1.0, and I run a job with spark-submit
and the master set to "yarn". When Spark starts I can load the Spark UI page
on port 4040, but no jobs are shown there. After the following log lines
(registering the application master on YARN), the Spark UI is no longer
accessible, not even via the tracking UI (ApplicationMaster) link in the
cluster UI.

The URL (http://z401:4040) is redirected to a new one
(http://z401:8088/proxy/application_1498135277395_0009), which cannot be
reached.

Any idea?

Thanks a lot in advance.

17/06/23 12:35:45 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint:
ApplicationMaster registered as NettyRpcEndpointRef(null)

17/06/23 12:35:45 INFO cluster.YarnClientSchedulerBackend: Add WebUI
Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter,
Map(PROXY_HOSTS -> z401, PROXY_URI_BASES ->
http://z401:8088/proxy/application_1498135277395_0009),
/proxy/application_1498135277395_0009

17/06/23 12:35:45 INFO ui.JettyUtils: Adding filter:
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter

17/06/23 12:35:45 INFO yarn.Client: Application report for
application_1498135277395_0009 (state: RUNNING)
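For context on that redirect: the AmIpFilter shown in the log is what forwards browsers from port 4040 to the YARN ResourceManager proxy on port 8088, so the proxy address has to be reachable from the machine running the browser. The relevant yarn-site.xml properties look roughly like this (a sketch only; the host and port values are placeholders echoing the log, not taken from an actual config):

```xml
<!-- yarn-site.xml -->
<!-- Address of the ResourceManager web UI; it also serves the
     /proxy/<application-id> endpoint that the Spark UI redirects to. -->
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>z401:8088</value>
</property>
<!-- Optional: run the web proxy as a standalone daemon instead of
     embedded in the ResourceManager. -->
<property>
  <name>yarn.web-proxy.address</name>
  <value>z401:8089</value>
</property>
```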


Parameter in FlatMap function

2017-04-14 Thread Soheila S.
Hello all,
Can someone help me solve the following basic problem?


I have a JavaRDD and, in its flatMap transformation, I pass a new instance
of a class that implements FlatMapFunction. This class has a constructor and
a call method. In the constructor I set the values of the "List" fields that
I need inside the call method.

In a local run there is no problem and it works correctly, but when I run it
on a cluster (using YARN) the fields are null on the workers (I get a
NullPointerException).

Any solution, idea or hint will be really appreciated,


All the best,
Soheila
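A note on the symptom above: Spark serializes the FlatMapFunction object on the driver and deserializes it on each executor, so any field that does not survive Java serialization (for example one marked transient, or one populated through driver-only static state) arrives as null on the cluster even though a local run works. The following plain-Java sketch (class and field names are hypothetical, not from the original code) reproduces the effect with nothing but the JDK:

```java
import java.io.*;
import java.util.*;

// Hypothetical stand-in for a FlatMapFunction implementation. Spark ships
// function objects to executors via Java serialization, so a field must
// survive that round trip to be usable inside call().
class WordFilter implements Serializable {
    private final List<String> keepList;      // serialized: survives the trip
    private transient List<String> tempList;  // NOT serialized: null after deserialization

    WordFilter(List<String> keep) {
        this.keepList = keep;
        this.tempList = new ArrayList<>(keep);
    }

    List<String> keep() { return keepList; }
    List<String> temp() { return tempList; }
}

public class SerializationDemo {
    // Round-trip an object through Java serialization, the way a function
    // object travels from the Spark driver to an executor.
    static <T extends Serializable> T roundTrip(T obj) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        WordFilter shipped = roundTrip(new WordFilter(Arrays.asList("spark", "yarn")));
        System.out.println("keepList: " + shipped.keep());
        System.out.println("tempList: " + shipped.temp());
    }
}
```

This prints `keepList: [spark, yarn]` and `tempList: null`; using the transient list inside call() would throw exactly the NullPointerException described.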


Text

2017-01-27 Thread Soheila S.
Hi All,
I read a text file using sparkContext.textFile(filename), assign it to an
RDD, process the RDD (replacing some words), and finally write it to a text
file using rdd.saveAsTextFile(output).
Is there any way to be sure the order of the sentences will not be changed?
I need the same text, with some words corrected.

thanks!

Soheila
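One pattern that makes the order explicit, whatever transformations run in between: pair every line with its original position, carry the pair through the processing, and sort by the saved position before writing. In Spark this would be zipWithIndex() followed by a sort on the index (a suggested approach, not something from the original post). A plain-Java sketch of the idea:

```java
import java.util.*;
import java.util.stream.*;

public class PreserveOrder {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("teh cat", "sat on", "teh mat");

        // 1. Pair each line with its original position (Spark: zipWithIndex()).
        // 2. Apply the word correction to the value only.
        // 3. Sort by the saved position before writing the result out.
        List<String> corrected = IntStream.range(0, lines.size())
                .mapToObj(i -> new AbstractMap.SimpleEntry<>(
                        i, lines.get(i).replace("teh", "the")))
                .sorted(Map.Entry.comparingByKey())
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());

        System.out.println(corrected);
    }
}
```

This prints `[the cat, sat on, the mat]`. As an aside, narrow transformations such as map do not reorder records within a partition, and the part-NNNNN files are numbered by partition, so for a map-only job concatenating the part files in numeric order also reproduces the input order.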


How to tune number of tasks

2017-01-26 Thread Soheila S.
Hi all,

Please tell me how I can tune the number of output partitions.
I run my Spark job on my local machine with 8 cores, and the input data is
6.5 GB. The job creates 193 tasks and writes the output into 193 partitions.
How can I change the number of tasks and, consequently, the number of
output files?

Best,
Soheila
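The number of output files equals the number of partitions of the RDD at the moment saveAsTextFile is called, so the usual knobs would be repartition(n) or coalesce(n) on the final RDD, or the minPartitions argument of textFile when reading (these calls are a suggested sketch, not taken from the original job). The relationship itself can be shown with a small stand-alone demo:

```java
import java.util.*;

public class PartitionCount {
    // Distribute items round-robin over numPartitions buckets, roughly the
    // way repartition(n) spreads records; each bucket would become one
    // part-NNNNN output file.
    static <T> List<List<T>> partition(List<T> items, int numPartitions) {
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < items.size(); i++) {
            buckets.get(i % numPartitions).add(items.get(i));
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 193; i++) {
            records.add(i);  // one stand-in record per original task
        }
        // Re-bucket 193 records into 8 partitions -> 8 output files.
        System.out.println(partition(records, 8).size());
    }
}
```

This prints `8`. Note that coalesce(n) is the cheaper choice when only reducing the partition count, since it avoids a full shuffle.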


failed to launch org.apache.spark.deploy.master.Master

2017-01-12 Thread Soheila S.
Hi,
I have executed my Spark job with spark-submit on my local machine and on a
cluster.
Now I want to try HDFS: put the data (a text file) on HDFS, read it from
there, run the jar, and finally write the output back to HDFS.
I got this error after running the job:

failed to launch org.apache.spark.deploy.master.Master:
log is following:
Spark Command: /scratch/p_corpus/tools/jdk1.8.0_112/bin/java -cp
$/home/user-folder/cluster-conf-1369394/spark/:/scratch/p_corpus/tools/spark-2.0.1-bin-hadoop2.6/jars/*:/home/user-folder/cluster-conf-1369394/hadoop/:/home/user-folder/cluster-conf-1369394/hadoop/
-Xmx1g org.apache.spark.deploy.master.Master --host
taurusi5551.taurus.hrsk.tu-dresden.de --port 7077 --webui-port
8080 /home/user-folder/cluster-conf-1369394/spark

17/01/12 14:49:32 INFO master.Master: Started daemon with process name:
8524@taurusi5551
17/01/12 14:49:32 INFO util.SignalUtils: Registered signal handler for TERM
17/01/12 14:49:32 INFO util.SignalUtils: Registered signal handler for HUP
17/01/12 14:49:32 INFO util.SignalUtils: Registered signal handler for INT
Usage: Master [options]

Options:
  -i HOST, --ip HOST     Hostname to listen on (deprecated, please use
                         --host or -h)
  -h HOST, --host HOST   Hostname to listen on
  -p PORT, --port PORT   Port to listen on (default: 7077)
  --webui-port PORT      Port for web UI (default: 8080)
  --properties-file FILE Path to a custom Spark properties file.
                         Default is conf/spark-defaults.conf.

Any help would be really appreciated.

Best,
Soheila


filter RDD by variable

2016-12-07 Thread Soheila S.
Hi
I am new to Spark and have a question from my first steps of learning it.

How can I filter an RDD using a String variable (for example words[i])
instead of a fixed literal like "Error"?

Thanks a lot in advance.
Soheila
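For the record, the filter predicate can simply capture the variable; in Spark the shape would be rdd.filter(line -> line.contains(words[i])), with the caveat that a variable captured by a lambda must be effectively final. The same capture works with plain Java streams, which keeps the sketch self-contained (the sample data here is invented for illustration):

```java
import java.util.*;
import java.util.stream.*;

public class FilterByVariable {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Error: disk", "OK", "Error: net", "Warn");
        String[] words = {"Error", "Warn"};

        // The lambda captures this (effectively final) local variable,
        // exactly as a Spark filter predicate would.
        String needle = words[0];

        List<String> hits = lines.stream()
                .filter(line -> line.contains(needle))
                .collect(Collectors.toList());

        System.out.println(hits);
    }
}
```

This prints `[Error: disk, Error: net]`; swapping the needle for any other String variable changes the result without touching the predicate.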