[ https://issues.apache.org/jira/browse/HADOOP-17984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-17984.
-------------------------------------
    Resolution: Invalid

> Hadoop-aws jar is unable to read file from S3 if used with third party like MINIO
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-17984
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17984
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Naresh
>            Priority: Minor
>
> Unable to read a file from S3 from spark if end point url is pointing to
> MINIO within EKS kubernetes cluster. We are able to do read/write from other
> clients and minio console. But when we read using spark I see empty data
> frame coming. If I use dataframe.show() it displays like below.
>
> ++
>
> ++
> ++
>
> *Spark Config:*
> .config("spark.hadoop.fs.s3a.endpoint", "http://127.0.0.1:9000") // minio url or port-forward to local
> .config("spark.hadoop.fs.s3a.access.key",<myaccesskey>)
> .config("spark.hadoop.fs.s3a.secret.key",<mysecretkey>)
>
> "spark.hadoop.fs.s3a.secret.key"
> "spark.hadoop.fs.s3a.secret.key"
> .config("spark.hadoop.fs.s3a.path.style.access", *true*)
> .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
>
> .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
> .config("fs.s3a.committer.staging.conflict-mode", "replace")
> .config("fs.s3a.committer.name", "file")
> .config("fs.s3a.committer.threads", "20")
> .config("fs.s3a.threads.max", "20")
> .config("fs.s3a.fast.upload.buffer", "bytebuffer")
> .config("fs.s3a.fast.upload.active.blocks", "8")
> .config("fs.s3a.block.size", "128M")
> .config("mapred.input.dir.recursive","true")
> .config("spark.sql.parquet.binaryAsString", "true")
>
>
> *JAR files:*
> hadoop-aws:3.2.0
> aws-java-sdk:1.12.30
> spark-core_2.12:3.1.2
> spark-sql_2.12:3.1.2
>
> *Logs:*
> DEBUG S3AFileSystem:2121: Getting path status for s3a://<mybucket>/<myfolder>/2021/test1_2021-03-23_15_21_31.592.csv (2021/test1_2021-03-23_15_21_31.592.csv)
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: object_metadata_requests += 1 -> 1
> 21/10/28 16:52:34 DEBUG S3AFileSystem:2189: Found exact file: normal file
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_exists += 1 -> 1
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_get_file_status += 1 -> 2
> 21/10/28 16:52:34 DEBUG S3AFileSystem:2121: Getting path status for s3a://mybbucket/myfolder/test1_2021-03-23_15_21_31.592.csv (2021/test1_2021-03-23_15_21_31.592.csv)
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: object_metadata_requests += 1 -> 2
> 21/10/28 16:52:34 DEBUG S3AFileSystem:2189: Found exact file: normal file
> 21/10/28 16:52:34 DEBUG S3AFileSystem:1899: List status for path: s3a://mybbucket/myfolder/test1_2021-03-23_15_21_31.592.csv
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_list_status += 1 -> 1
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_get_file_status += 1 -> 3
> 21/10/28 16:52:34 DEBUG S3AFileSystem:2121: Getting path status for s3a://mybbucket/myfolder//test1_2021-03-23_15_21_31.592.csv (2021/test1_2021-03-23_15_21_31.592.csv)
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: object_metadata_requests += 1 -> 3
> 21/10/28 16:52:34 DEBUG S3AFileSystem:2189: Found exact file: normal file
> 21/10/28 16:52:34 DEBUG S3AFileSystem:1930: Adding: rd (not a dir): s3a://mybbucket/myfolder//test1_2021-03-23_15_21_31.592.csv
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_is_directory += 1 -> 2
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_get_file_status += 1 -> 4
> 21/10/28 16:52:34 DEBUG S3AFileSystem:2121: Getting path status for s3a://mybbucket/myfolder//test1_2021-03-23_15_21_31.592.csv (2021/test1_2021-03-23_15_21_31.592.csv)
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: object_metadata_requests += 1 -> 4
> 21/10/28 16:52:34 DEBUG S3AFileSystem:2189: Found exact file: normal file
> 21/10/28 16:52:34 DEBUG S3AFileSystem:1899: List status for path: s3a://mybbucket/myfolder//test1_2021-03-23_15_21_31.592.csv
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_list_status += 1 -> 2
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_get_file_status += 1 -> 5
> 21/10/28 16:52:34 DEBUG S3AFileSystem:2121: Getting path status for s3a://mybbucket/myfolder/test1_2021-03-23_15_21_31.592.csv (2021/test1_2021-03-23_15_21_31.592.csv)
> 21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: object_metadata_requests += 1 -> 5
> 21/10/28 16:52:34 DEBUG S3AFileSystem:2189: Found exact file: normal file
>
> ++
> ||
> ++
> ++
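
The DEBUG excerpt quoted in this message reports only metadata, existence, directory and listing counters (object_metadata_requests, op_exists, op_get_file_status, op_is_directory, op_list_status). To see that at a glance, the S3AStorageStatistics lines can be tallied with a short script. What follows is a minimal sketch, assuming each log entry sits on a single line in the format shown in the report; the function name tally_s3a_counters and the regular expression are illustrative and are not part of Hadoop or Spark.

```python
import re

# Matches counter lines in the format quoted in the report, e.g.
#   21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_list_status += 1 -> 2
# Assumption: entries are not wrapped and carry no "> " quote markers mid-entry.
COUNTER_LINE = re.compile(
    r"S3AStorageStatistics:\d+:\s+(\w+)\s+\+=\s+(\d+)\s+->\s+(\d+)"
)


def tally_s3a_counters(log_text):
    """Return the last reported running total for each S3A counter."""
    latest = {}
    for match in COUNTER_LINE.finditer(log_text):
        name, _increment, total = match.groups()
        latest[name] = int(total)  # later entries overwrite earlier totals
    return latest


# A few lines copied from the log in the report.
sample = """\
21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: object_metadata_requests += 1 -> 1
21/10/28 16:52:34 DEBUG S3AFileSystem:2189: Found exact file: normal file
21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_exists += 1 -> 1
21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_get_file_status += 1 -> 2
21/10/28 16:52:34 DEBUG S3AStorageStatistics:63: op_list_status += 1 -> 1
"""

for counter, total in sorted(tally_s3a_counters(sample).items()):
    print(f"{counter}: {total}")
```

Running it over a complete log, rather than this short sample, gives one line per counter with its final value, which makes it easy to check which kinds of S3A operations a job actually recorded.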