Ted,

Thanks for the suggestions. I actually tried both s3n and s3, and the result is the same.
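For reference, a minimal sketch of what I am running in spark-shell, with credentials set for both schemes (the key values and bucket name are placeholders):

```scala
// Set credentials for both the s3 and s3n schemes on the Hadoop
// configuration (placeholder values; sc is the spark-shell SparkContext).
sc.hadoopConfiguration.set("fs.s3.awsAccessKeyId", "***")
sc.hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "***")
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "***")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "***")

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._

// Write a trivial DataFrame out as ORC; only _SUCCESS appears in the bucket.
val df = sc.parallelize(1 to 1000).toDF()
df.write.format("orc").save("s3://logs/dummy")
```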
On Sun, Aug 23, 2015 at 12:27 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> In your case, I would specify "fs.s3.awsAccessKeyId" /
> "fs.s3.awsSecretAccessKey" since you use the s3 protocol.
>
> On Sun, Aug 23, 2015 at 11:03 AM, lostrain A <donotlikeworkingh...@gmail.com> wrote:
>
>> Hi Ted,
>> Thanks for the reply. I tried setting both the key ID and the access key via
>>
>>> sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "***")
>>> sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "***")
>>
>> However, the error still occurs for ORC format.
>>
>> If I change the format to JSON, the JSON files can be saved successfully,
>> although the error does not go away.
>>
>> On Sun, Aug 23, 2015 at 5:51 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> You may have seen this:
>>> http://search-hadoop.com/m/q3RTtdSyM52urAyI
>>>
>>> On Aug 23, 2015, at 1:01 AM, lostrain A <donotlikeworkingh...@gmail.com> wrote:
>>>
>>> Hi,
>>> I'm trying to save a simple dataframe to S3 in ORC format. The code is
>>> as follows:
>>>
>>>> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>> import sqlContext.implicits._
>>>> val df = sc.parallelize(1 to 1000).toDF()
>>>> df.write.format("orc").save("s3://logs/dummy")
>>>
>>> I ran the above code in spark-shell, and only the _SUCCESS file was saved
>>> under the directory.
>>> The last part of the spark-shell log said:
>>>
>>>> 15/08/23 07:38:23 task-result-getter-1 INFO TaskSetManager: Finished
>>>> task 95.0 in stage 2.0 (TID 295) in 801 ms on ip-*-*-*-*.ec2.internal (100/100)
>>>> 15/08/23 07:38:23 dag-scheduler-event-loop INFO DAGScheduler:
>>>> ResultStage 2 (save at <console>:29) finished in 0.834 s
>>>> 15/08/23 07:38:23 task-result-getter-1 INFO YarnScheduler: Removed
>>>> TaskSet 2.0, whose tasks have all completed, from pool
>>>> 15/08/23 07:38:23 main INFO DAGScheduler: Job 2 finished: save at
>>>> <console>:29, took 0.895912 s
>>>> 15/08/23 07:38:24 main INFO LocalDirAllocator$AllocatorPerContext$DirSelector:
>>>> Returning directory: /media/ephemeral0/s3/output-
>>>> 15/08/23 07:38:24 main ERROR NativeS3FileSystem: md5Hash for
>>>> dummy/_SUCCESS is [-44, 29, -128, -39, -113, 0, -78, 4, -23, -103, 9, -104, -20, -8, 66, 126]
>>>> 15/08/23 07:38:24 main INFO DefaultWriterContainer: Job job_****_**** committed.
>>>
>>> Has anyone experienced this before?
>>> Thanks!