Hi, we have a Spark application that parses log files and saves them to S3 in ORC format. During the foreachRDD operation we need to extract a date field to determine the bucket location, since we partition by date. Currently we hardcode the current date, but we now have a requirement to determine the date for each record.
Here's the current code:

```scala
jsonRows.foreachRDD(r => {
  val parsedFormat = new SimpleDateFormat("yyyy-MM-dd/")
  val parsedDate = parsedFormat.format(new java.util.Date())
  val outputPath = destinationBucket + "/parsed_logs/orc/dt=" + parsedDate
  val jsonDf = sqlSession.read.schema(Schema.schema).json(r)
  val writer = jsonDf.write.mode("append").format("orc").option("compression", "zlib")
  if (environment.equals("local")) {
    writer.save("/tmp/sparrow")
  } else {
    writer.save(outputPath)
  }
})
```

The column in each JSON row that we want is `_ts`. Thanks.
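For reference, here is a self-contained sketch of the kind of per-record partitioning we are after. It is only an illustration under two assumptions we have not confirmed: that `_ts` holds epoch seconds (not millis or a string), and that the `dt=.../` layout produced by Spark's built-in `partitionBy` matches our bucket convention. The local `SparkSession` and sample data are stand-ins for our real `sqlSession` and `jsonDf`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, date_format, from_unixtime}

object PartitionByTsSketch extends App {
  // Local session just for this sketch; in the real job this would be sqlSession.
  val spark = SparkSession.builder.master("local[*]").appName("dt-sketch").getOrCreate()
  import spark.implicits._

  // Stand-in for jsonDf; assumes `_ts` is epoch seconds (adjust if millis/string).
  val jsonDf = Seq((1500000000L, "a"), (1500086400L, "b")).toDF("_ts", "msg")

  // Derive a dt column per record from `_ts`.
  val withDt = jsonDf.withColumn(
    "dt", date_format(from_unixtime(col("_ts")), "yyyy-MM-dd"))

  // partitionBy("dt") makes Spark write dt=yyyy-MM-dd/ subdirectories
  // under the base path, so each record lands in its own date bucket.
  withDt.write
    .mode("append")
    .format("orc")
    .option("compression", "zlib")
    .partitionBy("dt")
    .save("/tmp/sparrow/parsed_logs/orc")

  spark.stop()
}
```

In the actual job this would mean applying the same `withColumn`/`partitionBy` to `jsonDf` inside `foreachRDD`, and dropping the hardcoded `dt=` suffix from the output path, but we are not sure this is the right approach, hence the question.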