You can do something like this: page through the bucket listing with the marker, filter keys by last-modified date and name, then join the matches into one comma-separated path string:
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

// s3Client, listObjectsRequest, dayBefore, dayAfter and s3_bucket are
// assumed to be set up earlier.
List<String> fileNames = new ArrayList<String>();
ObjectListing objectListing;
do {
    objectListing = s3Client.listObjects(listObjectsRequest);
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        Date lastModified = objectSummary.getLastModified();
        // Keep ".log" keys modified after dayBefore and no later than dayAfter
        if (lastModified.compareTo(dayBefore) > 0
                && lastModified.compareTo(dayAfter) <= 0
                && objectSummary.getKey().contains(".log")) {
            fileNames.add(objectSummary.getKey());
        }
    }
    // Advance the marker so the next request returns the next page of keys
    listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());

// Join the matched keys into one comma-separated string of s3n:// paths.
// (StringBuilder avoids the O(n^2) indexOf lookup used before, which also
// mis-handles duplicate keys.)
StringBuilder concatName = new StringBuilder();
for (String fName : fileNames) {
    if (concatName.length() > 0) {
        concatName.append(",");
    }
    concatName.append("s3n://").append(s3_bucket).append("/").append(fName);
}
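
Once concatName is built you can hand it to Spark. A minimal sketch, assuming a Spark 1.x SQLContext named sqlContext (DataFrameReader.parquet() accepts multiple paths, so the comma-joined string can simply be split back apart):

import org.apache.spark.sql.DataFrame;

// Hypothetical usage: load all of the matched files into one DataFrame.
// Guard against an empty match list before splitting.
if (concatName.length() > 0) {
    DataFrame df = sqlContext.read().parquet(concatName.toString().split(","));
    df.show();
}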