Apache Spark and Airflow connection
I have a Docker-based cluster. In my cluster, I try to schedule Spark jobs using Airflow. Airflow and Spark run separately in *different containers*. However, I cannot run a Spark job through Airflow. Below is my Airflow script:

    from airflow import DAG
    from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
    from datetime import datetime, timedelta

    args = {
        'owner': 'airflow',
        'start_date': datetime(2018, 7, 31)
    }

    dag = DAG('spark_example_new', default_args=args, schedule_interval="@once")

    operator = SparkSubmitOperator(
        task_id='spark_submit_job',
        conn_id='spark_default',
        java_class='Main',
        application='/SimpleSpark.jar',
        name='airflow-spark-example',
        dag=dag)

I also configured spark_default in the Airflow UI:

[image: Screenshot from 2018-09-24 12-00-46.png]

However, it produces the following error:

    [Errno 2] No such file or directory: 'spark-submit': 'spark-submit'

I think Airflow tries to run the Spark job on its own. How can I configure it so that it runs the Spark code on the Spark master?

-- Uğur Sopaoğlu
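SparkSubmitOperator shells out to a local spark-submit binary and only passes the master URL taken from the connection, so the Spark client tools also have to exist inside the Airflow container; the "[Errno 2]" error means that binary is missing from its PATH. Below is a minimal sketch of creating the connection programmatically with the Airflow 1.x API, assuming the Spark master's Docker service is named spark-master and that Spark is installed in the Airflow image (both names are assumptions for illustration):

    # Hypothetical helper script run once inside the Airflow container.
    # It assumes the Spark client is installed there so spark-submit is on PATH.
    from airflow import settings
    from airflow.models import Connection

    spark_conn = Connection(
        conn_id='spark_default',
        conn_type='spark',
        host='spark://spark-master',            # assumption: Docker service name of the master
        port=7077,
        extra='{"deploy-mode": "client"}')

    session = settings.Session()
    session.add(spark_conn)
    session.commit()

The connection only tells spark-submit which master to target; installing the Spark binaries in the Airflow container (or baking them into the image) is what removes the "No such file or directory" error.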
Apache Spark Cluster
We are trying to create a cluster consisting of 4 machines. The cluster will be used by multiple users. How can we configure it so that users can submit jobs from their personal computers, and is there any free tool you can suggest to simplify the procedure? -- Uğur Sopaoğlu
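One low-overhead option for submitting from a personal computer is to install the same Spark version on each user's machine and point it at the standalone master over the network. A minimal PySpark sketch, assuming the master is reachable as master-host:7077 (hostname and port are assumptions):

    # Sketch: the driver runs on the user's laptop, executors run on the 4-node cluster.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("spark://master-host:7077")   # assumption: standalone master URL
             .appName("remote-submit-example")
             .getOrCreate())

    print(spark.range(1000).count())               # trivial job to confirm the cluster is used
    spark.stop()

This requires the workers to be able to connect back to the driver, so it works best when the laptops and the cluster share a network; otherwise a gateway such as Livy (see the next thread) avoids that problem.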
Re: Launch a pyspark Job From UI
Dear Hemant,

I have built a Spark cluster using Docker containers. Can I use Apache Livy to submit a job to the master node?

hemant singh wrote (11 Jun 2018 13:55):

> You can explore Livy: https://dzone.com/articles/quick-start-with-apache-livy
>
>> On Mon, Jun 11, 2018 at 3:35 PM, srungarapu vamsi wrote:
>> Hi,
>>
>> I am looking for applications where we can trigger Spark jobs from a UI.
>> Are there any such applications available?
>>
>> I have checked Spark-jobserver, with which we can expose an API to submit a
>> Spark application.
>>
>> Are there any other alternatives with which I can submit pyspark jobs from
>> a UI?
>>
>> Thanks,
>> Vamsi
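For reference, Livy accepts batch jobs over plain HTTP, so the UI (or any client) never needs spark-submit locally; Livy itself launches the job against the master it is configured for. A minimal sketch using Python's requests, where the Livy hostname and the jar's HDFS path are assumptions for illustration:

    # Sketch: submit a batch job through Livy's REST API and poll its state.
    import json
    import requests

    LIVY_URL = "http://livy-server:8998"         # assumption: Livy container hostname/port
    headers = {"Content-Type": "application/json"}

    payload = {
        "file": "hdfs:///jobs/SimpleSpark.jar",  # jar must be readable by the cluster
        "className": "Main",
        "name": "livy-batch-example",
    }

    resp = requests.post(LIVY_URL + "/batches", data=json.dumps(payload), headers=headers)
    batch = resp.json()
    print(batch["id"], batch["state"])

    # Poll the batch until it finishes.
    state = requests.get(LIVY_URL + "/batches/{}/state".format(batch["id"])).json()
    print(state)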
Re: Write to HDFS
Actually, when I run the following code,

    val textFile = sc.textFile("Sample.txt")
    val counts = textFile.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _)

it saves the results into more than one partition, like part-0 and part-1. I want to collect all of them into one file.

2017-10-20 16:43 GMT+03:00 Marco Mistroni <mmistr...@gmail.com>:

> Hi
> Could you just create an rdd/df out of what you want to save and store it
> in hdfs?
> Hth
>
> On Oct 20, 2017 9:44 AM, "Uğur Sopaoğlu" <usopao...@gmail.com> wrote:
>
>> Hi all,
>>
>> In the word count example,
>>
>>     val textFile = sc.textFile("Sample.txt")
>>     val counts = textFile.flatMap(line => line.split(" "))
>>                          .map(word => (word, 1))
>>                          .reduceByKey(_ + _)
>>     counts.saveAsTextFile("hdfs://master:8020/user/abc")
>>
>> I want to write the collection "counts", which is used in the code above, to
>> HDFS, so
>>
>>     val x = counts.collect()
>>
>> Actually I want to write x to HDFS. But Spark wants an RDD to write
>> something to HDFS.
>>
>> How can I write an Array[(String, Int)] to HDFS?
>>
>> --
>> Uğur
>>

-- Uğur Sopaoğlu
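If a single output file is acceptable (i.e. the result is small), the usual trick is to reduce the RDD to one partition before saving. A PySpark sketch of the idea follows; in Scala the calls have the same names, and the output path here is an assumption:

    # Sketch: force one partition so saveAsTextFile produces a single part file.
    from pyspark import SparkContext

    sc = SparkContext(appName="wordcount-single-file")

    counts = (sc.textFile("Sample.txt")
                .flatMap(lambda line: line.split(" "))
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))

    # coalesce(1) funnels all data through one task, so only use it for small results.
    counts.coalesce(1).saveAsTextFile("hdfs://master:8020/user/abc_single")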
Write to HDFS
Hi all,

In the word count example,

    val textFile = sc.textFile("Sample.txt")
    val counts = textFile.flatMap(line => line.split(" "))
                         .map(word => (word, 1))
                         .reduceByKey(_ + _)
    counts.saveAsTextFile("hdfs://master:8020/user/abc")

I want to write the collection "counts", which is used in the code above, to HDFS, so

    val x = counts.collect()

Actually I want to write x to HDFS. But Spark wants an RDD to write something to HDFS.

How can I write an Array[(String, Int)] to HDFS?

-- Uğur
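For an already-collected array, one option is to turn the local collection back into a one-partition RDD so Spark's HDFS writer can handle it. A short PySpark sketch of that idea (the output path is an assumption); in Scala, sc.parallelize works the same way:

    # Sketch: write a locally collected list of (word, count) pairs to HDFS
    # by turning it back into a one-partition RDD.
    from pyspark import SparkContext

    sc = SparkContext(appName="write-collected-array")
    counts = (sc.textFile("Sample.txt")
                .flatMap(lambda line: line.split(" "))
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))

    x = counts.collect()                      # small, local list on the driver
    sc.parallelize(x, 1).saveAsTextFile("hdfs://master:8020/user/abc_from_driver")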
jar file problem
Hello, I have a very simple problem. Whenever I run a Spark job, I must copy the jar file to all worker nodes. Is there an easier way to do this? -- Uğur Sopaoğlu
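One way to avoid hand-copying is to let Spark distribute the jar itself: spark-submit ships the application jar to the executors, and extra dependency jars can be listed in spark.jars (an HDFS path works well because every node can fetch it). A minimal PySpark sketch; the jar name and HDFS path are assumptions for illustration:

    # Sketch: executors fetch the listed jar themselves, no manual copying needed.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("jar-distribution-example")
            .setMaster("spark://master:7077")                           # assumption: standalone master
            .set("spark.jars", "hdfs://master:8020/jars/my-udfs.jar"))  # hypothetical dependency jar

    sc = SparkContext(conf=conf)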
Part-time job
Hi all, I have been working with Spark for about 8 months, but it cannot be fully learned by self-study alone. So I want to take a part-time role on a project. I believe it will both contribute to my own development and benefit others. I have *no salary* expectation. Can you help me? Thanks -- Uğur Sopaoğlu