Reg - Why does Apache Hadoop need to be installed separately for running Apache Spark…?

2020-06-22 Thread Praveen Kumar Ramachandran
I'm learning Apache Spark and trying to run a basic Spark program
written in Java. I've installed Apache Spark
*(spark-2.4.3-bin-without-hadoop)*, downloaded from
https://spark.apache.org/.

I've created a Maven project in Eclipse and added the following dependency:


<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.4.3</version>
</dependency>

After building the project, I tried to run the program by setting the
Spark master to local through the Spark config, and ran into the
following error:

java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.
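
For context, here is a minimal sketch of the kind of program and setup
described above (the class name, app name, and job contents are my own
illustration, not from the original mail):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;

public class BasicSparkApp {
    public static void main(String[] args) {
        // Run Spark locally, without a cluster.
        SparkConf conf = new SparkConf()
                .setAppName("BasicSparkApp")
                .setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Even this simple in-memory job loads Hadoop client classes:
        // Spark core depends on them regardless of the master URL, which
        // is where the HADOOP_HOME check above comes from. File reads such
        // as sc.textFile(...) also go through Hadoop's FileSystem machinery.
        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4));
        System.out.println("sum = " + numbers.reduce(Integer::sum));

        sc.stop();
    }
}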

After referring to a few sites, I installed hadoop-2.7.7 and added
"HADOOP_HOME" to my .bash_profile.

And now I'm able to execute my Spark program!
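
For reference, the .bash_profile lines mentioned above would look
something like this (the install path is my assumption; adjust it to
wherever hadoop-2.7.7 actually lives):

# Hypothetical install location for hadoop-2.7.7.
export HADOOP_HOME=$HOME/hadoop-2.7.7
# Optionally expose the Hadoop commands as well.
export PATH=$PATH:$HADOOP_HOME/bin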


*Now I need to know: where and how is Hadoop necessary for Spark?*

I posted the same question on Stack Overflow a while back, but haven't
received a response yet:
https://stackoverflow.com/questions/57435163/why-apache-hadoop-need-to-be-installed-for-running-apache-spark

Regards,
Praveen Kumar Ramachandran


Does Hadoop need to be installed?

2016-07-31 Thread ayan guha
Hi

I am trying to run Spark 2.0, prebuilt with Hadoop 2.7, on Windows. I do
not have Hadoop installed, as I wanted to test Spark on its own.

pyspark does start up, but reading any file through the DataFrame APIs
fails. I recall this was doable in earlier versions of Spark; is it no
longer possible?

[image: Inline image 1]


-- 
Best Regards,
Ayan Guha