Apache Spark and Airflow connection

2018-09-24 Thread Uğur Sopaoğlu
I have a Docker-based cluster. In my cluster, I am trying to schedule Spark
jobs by using Airflow. Airflow and Spark are running separately in *different
containers*. However, I cannot run a Spark job by using Airflow.

The code below is my Airflow script:

from airflow import DAG
from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator
from datetime import datetime, timedelta

args = {'owner': 'airflow', 'start_date': datetime(2018, 7, 31)}

dag = DAG('spark_example_new', default_args=args, schedule_interval="@once")

operator = SparkSubmitOperator(task_id='spark_submit_job',
                               conn_id='spark_default',
                               java_class='Main',
                               application='/SimpleSpark.jar',
                               name='airflow-spark-example',
                               dag=dag)

I also configured spark_default in the Airflow UI:

[image: Screenshot from 2018-09-24 12-00-46.png]


However, it produces the following error:

[Errno 2] No such file or directory: 'spark-submit': 'spark-submit'

I think Airflow tries to run the Spark job on its own. How can I configure
it so that it runs the Spark code on the Spark master?
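
For context, SparkSubmitOperator shells out to a local spark-submit binary and
only passes the master URL taken from the connection, so the Spark client has
to be installed inside the Airflow container as well. A minimal sketch,
assuming Spark is installed under /opt/spark in the Airflow container, the
spark_default connection host is set to spark://spark-master:7077, and your
Airflow version exposes the spark_binary argument (the path and host name are
assumptions):

  from datetime import datetime

  from airflow import DAG
  from airflow.contrib.operators.spark_submit_operator import SparkSubmitOperator

  args = {'owner': 'airflow', 'start_date': datetime(2018, 7, 31)}
  dag = DAG('spark_example_new', default_args=args, schedule_interval='@once')

  # The operator runs spark-submit locally and appends --master from the
  # spark_default connection (assumed here to be spark://spark-master:7077).
  operator = SparkSubmitOperator(
      task_id='spark_submit_job',
      conn_id='spark_default',
      java_class='Main',
      application='/SimpleSpark.jar',
      name='airflow-spark-example',
      spark_binary='/opt/spark/bin/spark-submit',  # assumed install path
      dag=dag,
  )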

-- 
Uğur Sopaoğlu


Apache Spark Cluster

2018-07-23 Thread Uğur Sopaoğlu
We are trying to create a cluster consisting of 4 machines. The cluster will
be used by multiple users. How can we configure it so that users can submit
jobs from their personal computers, and is there any free tool you can
suggest to support this procedure?
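
For what it's worth, with a standalone cluster each user can point a driver
at the master from their own machine once the master port is reachable. A
minimal PySpark sketch, assuming a standalone master at
spark://cluster-master:7077 (the host name is an assumption):

  from pyspark.sql import SparkSession

  # Connect from a personal computer to the cluster's standalone master
  # (7077 is the default master port; the host name is assumed).
  spark = (SparkSession.builder
           .appName('remote-submit-example')
           .master('spark://cluster-master:7077')
           .getOrCreate())

  print(spark.sparkContext.parallelize(range(100)).sum())
  spark.stop()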
-- 
Uğur Sopaoğlu


Re: Launch a pyspark Job From UI

2018-06-11 Thread uğur sopaoğlu
Dear Hemant,

I have built a Spark cluster using Docker containers. Can I use Apache Livy to
submit a job to the master node?
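
In case it helps, Livy exposes a plain REST endpoint (port 8998 by default),
so a batch job can be posted to it from outside the containers. A hedged
sketch using Python's requests library; the host name and jar path are
assumptions:

  import json
  import requests

  # Livy listens on 8998 by default; 'livy-server' is an assumed host name.
  url = 'http://livy-server:8998/batches'
  payload = {
      'file': 'hdfs:///jobs/SimpleSpark.jar',  # jar must be visible to the cluster
      'className': 'Main',
  }
  resp = requests.post(url, data=json.dumps(payload),
                       headers={'Content-Type': 'application/json'})
  print(resp.json())  # returns the batch id and state, for polling /batches/<id>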

hemant singh wrote (11 Jun 2018 13:55):

> You can explore Livy https://dzone.com/articles/quick-start-with-apache-livy
> 
>> On Mon, Jun 11, 2018 at 3:35 PM, srungarapu vamsi  
>> wrote:
>> Hi,
>> 
>> I am looking for applications where we can trigger Spark jobs from a UI.
>> Are there any such applications available?
>> 
>> I have checked Spark-jobserver, with which we can expose an API to submit a
>> Spark application.
>> 
>> Are there any other alternatives with which I can submit pyspark jobs from
>> a UI?
>> 
>> Thanks,
>> Vamsi
> 


Re: Write to HDFS

2017-10-20 Thread Uğur Sopaoğlu
Actually, when I run the following code,

  val textFile = sc.textFile("Sample.txt")
  val counts = textFile.flatMap(line => line.split(" "))
                       .map(word => (word, 1))
                       .reduceByKey(_ + _)
  counts.saveAsTextFile("hdfs://master:8020/user/abc")


It saves the results into more than one part file, such as part-00000 and
part-00001. I want to collect all of them into one file.
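
A single output file can be produced by shrinking the RDD to one partition
before saving, e.g. counts.coalesce(1).saveAsTextFile(...) in Scala. A
PySpark sketch of the same idea, assuming the result is small enough to fit
in one partition (paths are illustrative):

  from pyspark import SparkContext

  sc = SparkContext(appName='wordcount-single-file')
  counts = (sc.textFile('Sample.txt')
              .flatMap(lambda line: line.split(' '))
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b))

  # coalesce(1) forces a single partition, so only one part-00000 is written.
  counts.coalesce(1).saveAsTextFile('hdfs://master:8020/user/abc_single')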


2017-10-20 16:43 GMT+03:00 Marco Mistroni <mmistr...@gmail.com>:

> Hi
>  Could you just create an rdd/df out of what you want to save and store it
> in hdfs?
> Hth
>
> On Oct 20, 2017 9:44 AM, "Uğur Sopaoğlu" <usopao...@gmail.com> wrote:
>
>> Hi all,
>>
>> In word count example,
>>
>>   val textFile = sc.textFile("Sample.txt")
>>   val counts = textFile.flatMap(line => line.split(" "))
>>                        .map(word => (word, 1))
>>                        .reduceByKey(_ + _)
>>   counts.saveAsTextFile("hdfs://master:8020/user/abc")
>>
>> I want to write the collection *counts*, which is used in the code above,
>> to HDFS, so
>>
>> val x = counts.collect()
>>
>> Actually I want to write *x* to HDFS. But Spark wants an RDD to write
>> something to HDFS.
>>
>> How can I write an Array[(String, Int)] to HDFS?
>>
>>
>> --
>> Uğur
>>
>


-- 
Uğur Sopaoğlu


Write to HDFS

2017-10-20 Thread Uğur Sopaoğlu
Hi all,

In word count example,

  val textFile = sc.textFile("Sample.txt")
  val counts = textFile.flatMap(line => line.split(" "))
                       .map(word => (word, 1))
                       .reduceByKey(_ + _)
  counts.saveAsTextFile("hdfs://master:8020/user/abc")

I want to write the collection *counts*, which is used in the code above, to
HDFS, so

val x = counts.collect()

Actually I want to write *x* to HDFS. But Spark wants an RDD to write
something to HDFS.

How can I write an Array[(String, Int)] to HDFS?
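
One approach is to turn the collected array back into an RDD so that Spark
can save it, e.g. sc.parallelize(x, 1).saveAsTextFile(...) in Scala. A
PySpark sketch of the same idea; the paths are illustrative and the result
must fit in driver memory:

  from pyspark import SparkContext

  sc = SparkContext(appName='save-collected-result')
  counts = (sc.textFile('Sample.txt')
              .flatMap(lambda line: line.split(' '))
              .map(lambda word: (word, 1))
              .reduceByKey(lambda a, b: a + b))

  x = counts.collect()  # the Array[(String, Int)] equivalent, now local to the driver

  # Re-parallelize into a single partition so HDFS receives one part file.
  sc.parallelize(x, 1).saveAsTextFile('hdfs://master:8020/user/abc_collected')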


-- 
Uğur


jar file problem

2017-10-19 Thread Uğur Sopaoğlu
Hello,

I have a very simple problem. When I run a Spark job, I must copy the jar
file to all worker nodes. Is there an easier way to do this?
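
For reference, spark-submit already ships the application jar to the workers
when a master URL is given, and extra jars can be distributed through the
spark.jars setting instead of manual copies. A PySpark sketch of the config
approach; host names and paths are assumptions:

  from pyspark.sql import SparkSession

  # 'spark.jars' lists jars that Spark distributes to the executors itself;
  # an HDFS path avoids copying anything to worker nodes by hand.
  spark = (SparkSession.builder
           .appName('jar-distribution-example')
           .master('spark://master:7077')
           .config('spark.jars', 'hdfs://master:8020/jars/SimpleSpark.jar')
           .getOrCreate())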

-- 
Uğur Sopaoğlu


Part-time job

2017-09-08 Thread Uğur Sopaoğlu
Hi  all,

I have been working with Spark for about 8 months, but I have not fully
learned it through self-study. So I want to take a part-time job on a
project. I believe that this will both contribute to my own development and
benefit others. I *do not have any salary* expectation.

Can you help me?

Thanks

-- 
Uğur Sopaoğlu