Thanks!
Let me update the status.
I have copied the DirectOutputCommitter into my local code and set:

conf.set("spark.hadoop.mapred.output.committer.class",
  "org..DirectOutputCommitter")
It works perfectly.
Thanks everyone
Regards,
Shuai
From: Aaron Davidson
I'm not super familiar with S3, but I think the issue is that you want to use
a different output committer with object stores, which don't have a
simple move operation. There have been a few other threads on S3
output committers. I think the most relevant one for you is probably this
open JIRA:
Actually, this is the more relevant JIRA (which is resolved):
https://issues.apache.org/jira/browse/SPARK-3595
SPARK-6352 is about saveAsParquetFile, which is not in use here.
Here is a DirectOutputCommitter implementation:
https://gist.github.com/aarondav/c513916e72101bbe14ec
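For context, the core of the pattern in that gist is an OutputCommitter whose commit steps are no-ops, so tasks write straight to the final output location and no _temporary directory (or rename on commit) is involved. A minimal sketch against the old org.apache.hadoop.mapred API (this is the general shape of the pattern, not the gist verbatim, and is only safe for jobs whose task output is idempotent):

```scala
import org.apache.hadoop.mapred.{JobContext, OutputCommitter, TaskAttemptContext}

// Sketch only: every commit-related step is a no-op, so output files
// land directly in the final path. There is no rollback if a task
// fails partway, which is the trade-off of this committer on S3.
class DirectOutputCommitter extends OutputCommitter {
  override def setupJob(jobContext: JobContext): Unit = {}
  override def setupTask(taskContext: TaskAttemptContext): Unit = {}
  override def needsTaskCommit(taskContext: TaskAttemptContext): Boolean = false
  override def commitTask(taskContext: TaskAttemptContext): Unit = {}
  override def abortTask(taskContext: TaskAttemptContext): Unit = {}
}
```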
If you use fileStream, there's an option to filter out files. In your case
you can easily create a filter to skip _temporary files. Note that you will
then have to move your code inside foreachRDD of the DStream, since the
application will become a streaming app.
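The filter mentioned above is just a Path => Boolean function passed to fileStream. A sketch assuming the standard Spark Streaming API (the directory, batch interval, and processing body are placeholders):

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Accept only finished output files: skip _temporary and other
// underscore-prefixed markers such as _SUCCESS.
def isFinalFile(path: Path): Boolean = !path.getName.startsWith("_")

val ssc = new StreamingContext(sc, Seconds(30)) // sc: existing SparkContext

val lines = ssc
  .fileStream[LongWritable, Text, TextInputFormat](
    "s3n://my-bucket/output",        // placeholder input directory
    (p: Path) => isFinalFile(p),
    newFilesOnly = true)
  .map(_._2.toString)

// The batch logic moves inside foreachRDD once this is a streaming app.
lines.foreachRDD { rdd =>
  // existing processing of the output files goes here
}

ssc.start()
ssc.awaitTermination()
```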
Thanks
Best Regards
On Sat, Mar
One more thing I forgot to mention: even though I get this exception, the
results are not well formed in my target folder (some of the files are there;
the rest are under a different folder structure inside the _temporary folder).
In the spark-shell web UI, the step is still marked as successful. I think
this is a bug?