RE: Spark on Kubernetes : unable to write files to HDFS

2020-12-16 Thread Loic DESCOTTE
Everything is working fine now.
Thanks again

Loïc

From: German Schiavon
Sent: Wednesday, 16 December 2020 19:23
To: Loic DESCOTTE
Cc: user@spark.apache.org
Subject: Re: Spark on Kubernetes : unable to write files to HDFS

We've all been there! No reason to be ashamed :)

On Wed, 16 Dec 2020 at 18:14, Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:
Oh, thank you, you're right!! I feel so ashamed


From: German Schiavon <gschiavonsp...@gmail.com>
Sent: Wednesday, 16 December 2020 18:01
To: Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
Cc: user@spark.apache.org
Subject: Re: Spark on Kubernetes : unable to write files to HDFS

Hi,

Seems that you have a typo, no?

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hfds

  data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")
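
For reference, the fix is just the scheme in the URL; the corrected call (which the end of the thread confirms made everything work) is:

  data.write.mode("overwrite").format("text").save("hdfs://hdfs-namenode/user/loic/result.txt")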


On Wed, 16 Dec 2020 at 17:02, Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:
So I've tried several other things, including building a fat jar with the HDFS dependency inside my app jar, and adding this to the Spark configuration in the code:

val spark = SparkSession
  .builder()
  .appName("Hello Spark 7")
  .config("fs.hdfs.impl", 
classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
  .getOrCreate()


But still the same error...
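
A side note on that attempt (an assumption, and not what fixed things in the end): a bare key like fs.hdfs.impl set on the builder stays a Spark property; for Spark to forward it to the Hadoop Configuration it normally needs the spark.hadoop. prefix, or it can be set on hadoopConfiguration directly. A sketch of both:

  // Sketch: forward a Hadoop option through the Spark conf
  val spark = SparkSession
    .builder()
    .appName("Hello Spark 7")
    .config("spark.hadoop.fs.hdfs.impl",
      classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
    .getOrCreate()

  // Sketch: or set it on the Hadoop Configuration once the session exists
  spark.sparkContext.hadoopConfiguration.set("fs.hdfs.impl",
    classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)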


From: Sean Owen <sro...@gmail.com>
Sent: Wednesday, 16 December 2020 14:27
To: Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
Subject: Re: Spark on Kubernetes : unable to write files to HDFS

I think it'll have to be part of the Spark distro, but I'm not 100% sure. I also think these get registered via manifest files in the JARs; if some process is stripping those when creating a bundled-up JAR, that could be it. It could also be failing to initialize for some reason.
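
To make that concrete: FileSystem implementations are registered through META-INF/services files (service-loader entries, which is likely what is meant by manifest files here), so a fat-jar build has to keep those entries. A minimal sketch, assuming sbt-assembly is the build tool (the thread never says which one is used):

  // build.sbt fragment (sketch): merge service registration files instead of
  // letting them be dropped or overwritten, so the entries for
  // org.apache.hadoop.fs.FileSystem survive in the fat jar.
  assemblyMergeStrategy in assembly := {
    case PathList("META-INF", "services", _*) => MergeStrategy.concat
    case PathList("META-INF", _*)             => MergeStrategy.discard
    case _                                    => MergeStrategy.first
  }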

On Wed, Dec 16, 2020 at 7:24 AM Loic DESCOTTE <loic.desco...@kaizen-solutions.net> wrote:
I've tried with this spark-submit option:

--packages org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \

But it didn't solve the issue.
Should I add more jars?

Thanks
Loïc

From: Sean Owen <sro...@gmail.com>
Sent: Wednesday, 16 December 2020 14:20
To: Loic DESCOTTE <loic.desco...@kaizen-solutions.net>
Subject: Re: Spark on Kubernetes : unable to write files to HDFS

Seems like your Spark cluster somehow doesn't have the Hadoop JARs?
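
As a quick check (a sketch, assuming a spark-shell can be started on the same image), loading the implementation class directly shows whether the HDFS jars are on the classpath at all:

  // Throws ClassNotFoundException if the HDFS client jars are missing
  Class.forName("org.apache.hadoop.hdfs.DistributedFileSystem")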



Find difference between two dataframes in spark structured streaming

2020-12-16 Thread act_coder
I am creating a Spark Structured Streaming job, where I need to find the
difference between two dataframes.

Dataframe 1 :

[1, item1, value1]
[2, item2, value2]
[3, item3, value3]
[4, item4, value4]
[5, item5, value5]

Dataframe 2:

[4, item4, value4]
[5, item5, value5]

New dataframe with the difference between the two, D1 - D2:

[1, item1, value1]
[2, item2, value2]
[3, item3, value3]

I tried using except() and a left anti join, but neither is supported in
Spark Structured Streaming.

Is there a way we can achieve this in Structured Streaming?
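
One workaround worth sketching (an assumption, not something confirmed in this thread: it treats Dataframe 2 as static reference data, and the paths, the DDL schema and the "id" join key are all made up for illustration): inside foreachBatch, available since Spark 2.4, each micro-batch is an ordinary DataFrame again, so except() and left anti joins are allowed there.

import org.apache.spark.sql.{DataFrame, SparkSession}

object StreamDiff {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("stream-diff").getOrCreate()

    // Streaming source standing in for Dataframe 1 (assumed schema: id, item, value)
    val df1 = spark.readStream
      .schema("id INT, item STRING, value STRING")
      .json("/tmp/stream-input")

    // Static data standing in for Dataframe 2 (same assumed schema)
    val df2 = spark.read
      .schema("id INT, item STRING, value STRING")
      .json("/tmp/reference")

    // Inside a micro-batch the data is a plain DataFrame, so the anti join
    // (rows of D1 whose id has no match in D2) is supported here.
    def processBatch(batch: DataFrame, batchId: Long): Unit =
      batch.join(df2, Seq("id"), "left_anti")
        .write.mode("append").json("/tmp/diff-output")

    df1.writeStream
      .foreachBatch(processBatch _)
      .option("checkpointLocation", "/tmp/checkpoint")
      .start()
      .awaitTermination()
  }
}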






Re: Spark on Kubernetes : unable to write files to HDFS

2020-12-16 Thread German Schiavon
We've all been there! No reason to be ashamed :)


RE: Spark on Kubernetes : unable to write files to HDFS

2020-12-16 Thread Loic DESCOTTE
Oh, thank you, you're right!! I feel so ashamed




Re: Spark on Kubernetes : unable to write files to HDFS

2020-12-16 Thread German Schiavon
Hi,

Seems that you have a typo, no?

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hfds

  data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")


On Wed, 16 Dec 2020 at 17:02, Loic DESCOTTE <
loic.desco...@kaizen-solutions.net> wrote:

> So I've tried several other things, including building a fat jar with hdfs
> dependency inside my app jar, and added this to the Spark configuration in
> the code :
>
> val spark = SparkSession
>   .builder()
>   .appName("Hello Spark 7")
>   .config("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.
> DistributedFileSystem].getName)
>   .getOrCreate()
>
>
> But still the same error...
>
> --
> *De :* Sean Owen 
> *Envoyé :* mercredi 16 décembre 2020 14:27
> *À :* Loic DESCOTTE 
> *Objet :* Re: Spark on Kubernetes : unable to write files to HDFS
>
> I think it'll have to be part of the Spark distro, but I'm not 100% sure.
> I also think these get registered via manifest files in the JARs; if some
> process is stripping those when creating a bundled up JAR, could be it.
> Could be that it's failing to initialize too for some reason.
>
> On Wed, Dec 16, 2020 at 7:24 AM Loic DESCOTTE <
> loic.desco...@kaizen-solutions.net> wrote:
>
> I've tried with this spark-submit option :
>
> --packages
> org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \
>
> But it did't solve the issue.
> Should I add more jars?
>
> Thanks
> Loïc
> --
> *De :* Sean Owen 
> *Envoyé :* mercredi 16 décembre 2020 14:20
> *À :* Loic DESCOTTE 
> *Objet :* Re: Spark on Kubernetes : unable to write files to HDFS
>
> Seems like your Spark cluster doesn't somehow have the Hadoop JARs?
>
> On Wed, Dec 16, 2020 at 6:45 AM Loic DESCOTTE <
> loic.desco...@kaizen-solutions.net> wrote:
>
> Hello,
>
> I am using Spark On Kubernetes and I have the following error when I try
> to write data on HDFS : "no filesystem for scheme hdfs"
>
> More details :
>
> I am submitting my application with Spark submit like this :
>
> spark-submit --master k8s://https://myK8SMaster:6443 \
> --deploy-mode cluster \
> --name hello-spark \
> --class Hello \
> --conf spark.executor.instances=2 \
> --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
> --conf spark.kubernetes.container.image=gradiant/spark:2.4.4
> hdfs://hdfs-namenode/user/loic/jars/helloSpark.jar
>
> Then the driver and the 2 executors are created in K8S.
>
> But it fails when I look at the logs of the driver, I see this :
>
> Exception in thread "main" java.io.IOException: No FileSystem for scheme:
> hfds
> at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
> at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
> at
> org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:424)
> at
> org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:524)
> at
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
> at Hello$.main(hello.scala:24)
> at Hello.main(hello.scala)
>
>
> As you can see , my application jar helloSpark.jar file is correctly
> loaded on HDFS by the Spark submit, but writing to HDFS fails.
>
> I have also tried to add the hadoop client dand hdfs dependencies in the
> spark submit command:
>
> --packages
> org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \
>
> But the error is still here.
>
>
> Here is the Scala code of my application :
>
>
> import java.util.Calendar
>
> import org.apache.spark.sql.SparkSession
>
> case class Data(singleField: String)
>
> object Hello
> {
> def main(args: Array[String])
> {
>
> val spark = SparkSession
>   .builder()
>   .appName("Hello Spark")
>   .getOrCreate()
>
> import spark.implicits._
>
> val now = Calendar.getInstance().getTime().toString
> val data = List(Data(now)).toDF()
>
> data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")
> }
> }
>
> Thanks for your help,
> Loïc
>
>


RE: Spark on Kubernetes : unable to write files to HDFS

2020-12-16 Thread Loic DESCOTTE
So I've tried several other things, including building a fat jar with the HDFS dependency inside my app jar, and adding this to the Spark configuration in the code:

val spark = SparkSession
  .builder()
  .appName("Hello Spark 7")
  .config("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
  .getOrCreate()


But still the same error...




[no subject]

2020-12-16 Thread 张洪斌


Unsubscribe
Sent from NetEase Mail Master

11

2020-12-16 Thread 张洪斌




Sent from NetEase Mail Master

Re: unsubscribe

2020-12-16 Thread Jeff Evans
https://gist.github.com/jeff303/ba1906bb7bcb2f2501528a8bb1521b8e

On Wed, Dec 16, 2020, 6:45 AM 张洪斌  wrote:

> How do I unsubscribe from this?
>
> Sent from NetEase Mail Master
> On 16 December 2020 at 20:43, 张洪斌 wrote:
>
> unsubscribe
> 张洪斌 (student)
> Email: hongbinzh...@163.com
>
> Signature customized by NetEase Mail Master


Re: unsubscribe

2020-12-16 Thread 张洪斌
How do I unsubscribe from this?


Sent from NetEase Mail Master
On 16 December 2020 at 20:43, 张洪斌 wrote:

unsubscribe
张洪斌 (student)
Email: hongbinzh...@163.com

Signature customized by NetEase Mail Master

unsubscribe

2020-12-16 Thread 张洪斌

unsubscribe
张洪斌 (student)
Email: hongbinzh...@163.com

Signature customized by NetEase Mail Master

Spark on Kubernetes : unable to write files to HDFS

2020-12-16 Thread Loic DESCOTTE
Hello,

I am using Spark on Kubernetes, and I get the following error when I try to write data to HDFS: "no filesystem for scheme hdfs"
More details :

I am submitting my application with Spark submit like this :

spark-submit --master k8s://https://myK8SMaster:6443 \
--deploy-mode cluster \
--name hello-spark \
--class Hello \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=gradiant/spark:2.4.4 
hdfs://hdfs-namenode/user/loic/jars/helloSpark.jar

Then the driver and the 2 executors are created in K8S.

But it fails. When I look at the driver logs, I see this:

Exception in thread "main" java.io.IOException: No FileSystem for scheme: hfds
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:424)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:524)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at Hello$.main(hello.scala:24)
at Hello.main(hello.scala)


As you can see, my application jar helloSpark.jar is correctly loaded from HDFS by spark-submit, but writing to HDFS fails.

I have also tried to add the hadoop-client and hadoop-hdfs dependencies in the spark-submit command:

--packages org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \

But the error is still there.


Here is the Scala code of my application :


import java.util.Calendar

import org.apache.spark.sql.SparkSession

case class Data(singleField: String)

object Hello {
  def main(args: Array[String]) {

    val spark = SparkSession
      .builder()
      .appName("Hello Spark")
      .getOrCreate()

    import spark.implicits._

    val now = Calendar.getInstance().getTime().toString
    val data = List(Data(now)).toDF()

    data.write.mode("overwrite").format("text").save("hfds://hdfs-namenode/user/loic/result.txt")
  }
}

Thanks for your help,
Loïc




copy_to() of sparklyr lib returning previous date

2020-12-16 Thread Mayura
Hi
I am trying to convert a date from string to date format using R with
as.Date() as shown below:
dfTest <- data.frame(StringDate=c("2020-12-01","2020-12-02"),
DateDate=as.Date(c("2020-12-01","2020-12-02")))
dfTest
StringDate DateDate
1 2020-12-01 2020-12-01
2 2020-12-02 2020-12-02

The above command gives the desired output, but when I use copy_to() as
shown below, the previous date is returned, which is very strange:

sdfTest <- copy_to(sc, dfTest)
sdfTest
#Source: spark [?? x 2]
StringDate DateDate

1 2020-12-01 2020-11-30
2 2020-12-02 2020-12-01
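
A hedged guess, not confirmed in this thread: an off-by-one-day shift like this usually points to a timezone mismatch between the client session and the Spark JVM, since Spark renders dates and timestamps in the session timezone. On the Spark side, the relevant knob is, as a sketch:

  // Scala sketch; the choice of "UTC" is an assumption
  spark.conf.set("spark.sql.session.timeZone", "UTC")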






Running spark code using wheel file

2020-12-16 Thread Sachit Murarka
Hi All,

I have created a wheel file and I am using the following command to run the
Spark job:

spark-submit --py-files application.whl main_flow.py

My application is unable to reference the modules. Do I need to pip install
the wheel first?

Kind Regards,
Sachit Murarka