[jira] [Commented] (SPARK-43188) Cannot write to Azure Datalake Gen2 (abfs/abfss) after Spark 3.1.2

2023-05-26 Thread Nicolas PHUNG (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17726667#comment-17726667
 ] 

Nicolas PHUNG commented on SPARK-43188:
---

Hello [~srowen] I don't think so, but I managed to get it working thanks to 
HADOOP-18707. It was a new default configuration in hadoop-azure that wasn't 
working for me anymore on a local Windows setup.
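
For anyone else hitting this on a local Windows setup: the DiskChecker error above comes from the abfs connector buffering upload blocks on local disk, so the write fails when it cannot allocate a writable local buffer directory. A minimal PySpark sketch of the workaround, assuming the hadoop-azure keys fs.azure.data.blocks.buffer and fs.azure.buffer.dir (the abfss path, buffer directory, and authentication settings below are placeholders, not taken from this ticket):

{code:python}
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("abfs-write-workaround")
    # Buffer upload blocks in memory instead of on local disk
    # (fs.azure.data.blocks.buffer accepts disk / array / bytebuffer).
    .config("spark.hadoop.fs.azure.data.blocks.buffer", "bytebuffer")
    # Alternatively, keep disk buffering but point it at a directory that
    # exists and is writable on this machine (example path only).
    # .config("spark.hadoop.fs.azure.buffer.dir", "C:/tmp/abfs")
    # Storage account authentication settings omitted here.
    .getOrCreate()
)

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet(
    "abfss://<container>@<account>.dfs.core.windows.net/tmp/test"
)
{code}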

> Cannot write to Azure Datalake Gen2 (abfs/abfss) after Spark 3.1.2
> --
>
> Key: SPARK-43188
> URL: https://issues.apache.org/jira/browse/SPARK-43188
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Nicolas PHUNG
>Priority: Major
>
> Hello,
> I have an issue with Spark 3.3.2 & Spark 3.4.0 to write into Azure Data Lake 
> Storage Gen2 (abfs/abfss scheme). I've got the following errors:
> {code:java}
> warn 13:12:47.554: StdErr from Kernel Process 23/04/19 13:12:47 ERROR 
> FileFormatWriter: Aborting job 
> 6a75949c-1473-4445-b8ab-d125be3f0f21.org.apache.spark.SparkException: Job 
> aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent 
> failure: Lost task 1.0 in stage 0.0 (TID 1) (myhost executor driver): 
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any 
> valid local directory for datablock-0001-    at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:462)
>     at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:165)
>     at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
>     at 
> org.apache.hadoop.fs.store.DataBlocks$DiskBlockFactory.createTmpFileForWrite(DataBlocks.java:980)
>     at 
> org.apache.hadoop.fs.store.DataBlocks$DiskBlockFactory.create(DataBlocks.java:960)
>     at 
> org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.createBlockIfNeeded(AbfsOutputStream.java:262)
>     at 
> org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.<init>(AbfsOutputStream.java:173)
>     at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.createFile(AzureBlobFileSystemStore.java:580)
>     at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.create(AzureBlobFileSystem.java:301)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)    at 
> org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)    at 
> org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74)
>     at 
> org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:347)
>     at 
> org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:314)
>     at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:480)
>     at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:420)
>     at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:409)
>     at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:36)
>     at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$$anon$1.newInstance(ParquetUtils.scala:490)
>     at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:161)
>     at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:146)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:389)
>     at 
> org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)    
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)    
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)    at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:328)    at 
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)    at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)    
> at org.apache.spark.scheduler.Task.run(Task.scala:139)    at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)    
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)    
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:    

[jira] [Commented] (SPARK-43188) Cannot write to Azure Datalake Gen2 (abfs/abfss) after Spark 3.1.2

2023-05-19 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724306#comment-17724306
 ] 

Sean R. Owen commented on SPARK-43188:
--

Looks like a local disk problem of some kind, not really a Spark issue.

> Cannot write to Azure Datalake Gen2 (abfs/abfss) after Spark 3.1.2
> --
>
> Key: SPARK-43188
> URL: https://issues.apache.org/jira/browse/SPARK-43188
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Nicolas PHUNG
>Priority: Major
>
> Hello,
> I have an issue with Spark 3.3.2 & Spark 3.4.0 to write into Azure Data Lake 
> Storage Gen2 (abfs/abfss scheme). I've got the following errors:
> {code:java}
> warn 13:12:47.554: StdErr from Kernel Process 23/04/19 13:12:47 ERROR 
> FileFormatWriter: Aborting job 
> 6a75949c-1473-4445-b8ab-d125be3f0f21.org.apache.spark.SparkException: Job 
> aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent 
> failure: Lost task 1.0 in stage 0.0 (TID 1) (myhost executor driver): 
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any 
> valid local directory for datablock-0001-    at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:462)
>     at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:165)
>     at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
>     at 
> org.apache.hadoop.fs.store.DataBlocks$DiskBlockFactory.createTmpFileForWrite(DataBlocks.java:980)
>     at 
> org.apache.hadoop.fs.store.DataBlocks$DiskBlockFactory.create(DataBlocks.java:960)
>     at 
> org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.createBlockIfNeeded(AbfsOutputStream.java:262)
>     at 
> org.apache.hadoop.fs.azurebfs.services.AbfsOutputStream.<init>(AbfsOutputStream.java:173)
>     at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystemStore.createFile(AzureBlobFileSystemStore.java:580)
>     at 
> org.apache.hadoop.fs.azurebfs.AzureBlobFileSystem.create(AzureBlobFileSystem.java:301)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195)    at 
> org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1175)    at 
> org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74)
>     at 
> org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:347)
>     at 
> org.apache.parquet.hadoop.ParquetFileWriter.<init>(ParquetFileWriter.java:314)
>     at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:480)
>     at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:420)
>     at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:409)
>     at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:36)
>     at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$$anon$1.newInstance(ParquetUtils.scala:490)
>     at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:161)
>     at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:146)
>     at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:389)
>     at 
> org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:888)    
> at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:888)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)    
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)    at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:328)    at 
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)    at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)    
> at org.apache.spark.scheduler.Task.run(Task.scala:139)    at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
>     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)    
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)    
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Driver stacktrace:    at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785)
>     at 
>