Dear group members, I'm trying to get a fresh start with Spark, but came a cross following issue;
I tried to read few CSV files from a folder, but the task got stuck and didn't complete. ( copied content from the terminal.) Can someone help to understand what is going wrong? Versions; java version "11.0.16" 2022-07-19 LTS Java(TM) SE Runtime Environment 18.9 (build 11.0.16+11-LTS-199) Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.16+11-LTS-199, mixed mode) Python 3.9.13 Windows 10 Copied from the terminal; ____ __ / __/__ ___ _____/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 3.5.0 /_/ Using Python version 3.9.13 (main, Aug 25 2022 23:51:50) Spark context Web UI available at http://LK510FIDSLW4.ey.net:4041 Spark context available as 'sc' (master = local[*], app id = local-1697089858181). SparkSession available as 'spark'. >>> merged_spark_data = spark.read.csv(r"C:\Users\Kelum.Perera\Downloads\data-master\nyse_all\nyse_data\*", header=False ) Exception in thread "globPath-ForkJoinPool-1-worker-115" java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method) at org.apache.hadoop.io.nativeio.NativeIO$Windows.access(NativeIO.java:793) at org.apache.hadoop.fs.FileUtil.canRead(FileUtil.java:1249) at org.apache.hadoop.fs.FileUtil.list(FileUtil.java:1454) at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:601) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1972) at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:2014) at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:761) at org.apache.hadoop.fs.Globber.listStatus(Globber.java:128) at org.apache.hadoop.fs.Globber.doGlob(Globber.java:291) at org.apache.hadoop.fs.Globber.glob(Globber.java:202) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2124) at org.apache.spark.deploy.SparkHadoopUtil.globPath(SparkHadoopUtil.scala:238) at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$3(DataSource.scala:737) at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:380) at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659) at scala.util.Success.$anonfun$map$1(Try.scala:255) at scala.util.Success.map(Try.scala:213) at scala.concurrent.Future.$anonfun$map$1(Future.scala:292) at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33) at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33) at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64) at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426) at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290) at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020) at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656) at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594) at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183) Noting happens afterwards. Appreciate your kind input to solve this. Best Regards, Kelum Perera