Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
I've discovered that one of the anomalies I encountered was due to an (embarrassing? humorous?) user error. See the user list thread "Failed RC-10 yarn-cluster job for FS closed error when cleaning up staging directory" for my discussion. With the user error corrected, the FS closed exception only prevents deletion of the staging directory, but does not affect completion with SUCCESS.

The FS closed exception still needs some investigation, at least by me. I tried the patch reported in SPARK-1898, but it didn't fix the problem without the user error also being fixed. I did not attempt to test my fix without the patch, so I can't pass judgment on the patch.

Although this is merely a pseudocluster-based test -- I can't reconfigure our cluster with RC-10 -- I'll now change my vote to... +1.

Thanks all who helped.

Kevin
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
I retested several different cases...

1. The FS closed exception shows up ONLY in RC-10, not in Spark 0.9.1, with both Hadoop 2.2 and 2.3.
2. SPARK-1898 has no effect for my use cases.
3. The failure to report that the underlying application is RUNNING and that it has succeeded is due ONLY to my user error.

The FS closed exception only affects the cleanup of the staging directory, not the final success or failure. I've not yet tested the effect of changing my application's initialization, use, or closing of FileSystem.

Thanks again.

Kevin
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
Hi Kevin,

On Thu, May 22, 2014 at 9:49 AM, Kevin Markey kevin.mar...@oracle.com wrote:

> The FS closed exception only affects the cleanup of the staging
> directory, not the final success or failure. I've not yet tested the
> effect of changing my application's initialization, use, or closing of
> FileSystem.

Without going and reading more of the Spark code, if your app is explicitly close()'ing the FileSystem instance, it may be causing the exception. If Spark is caching the FileSystem instance, your app is probably closing that same instance (which it got from the HDFS library's internal cache).

It would be nice if you could test that theory; it might be worth knowing that's the case so that we can tell people not to do that.

-- 
Marcelo
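[To make the theory above concrete, here is a minimal sketch -- mine, not code from this thread -- of how Hadoop's shared FileSystem cache produces this failure. It assumes fs.defaultFS points at HDFS, since DistributedFileSystem is the implementation that throws "Filesystem closed" after close(); the path is illustrative only.]

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object ClosedFsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration() // assumes fs.defaultFS points at HDFS

        // Both calls go through Hadoop's internal FS cache, so they return
        // the SAME object: one handle for "user code", one standing in for
        // the handle Spark itself would be holding.
        val userFs  = FileSystem.get(conf)
        val sparkFs = FileSystem.get(conf)

        userFs.close() // user code tidies up "its" FileSystem...

        // ...which also closed the shared instance. With HDFS, this now
        // throws java.io.IOException: Filesystem closed -- the same failure
        // Kevin saw when Spark cleaned up its staging directory.
        sparkFs.delete(new Path("/tmp/staging-example"), true)
      }
    }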
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
The FileSystem cache is something that has caused a lot of pain over the years. Unfortunately, we (in Hadoop core) can't change the way it works now, because there are too many users depending on the current behavior.

Basically, the idea is that when you request a FileSystem with certain options via FileSystem#get, you might get a reference to an FS object that already exists, from our FS cache singleton. Unfortunately, this also means that someone else can change the working directory on you or close the FS underneath you. The FS is basically shared mutable state, and you don't know whom you're sharing it with.

It might be better for Spark to call FileSystem#newInstance, which bypasses the FileSystem cache and always creates a new object. If Spark can hang on to the FS for a while, it can get the benefits of caching without the downsides. In HDFS, multiple FS instances can also share things like the socket cache between them.

best,
Colin
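[To illustrate the newInstance suggestion above, a short sketch under the same assumptions as before -- nothing Spark-specific, just the Hadoop FileSystem API:]

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    object PrivateFsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()

        // newInstance bypasses the global cache: this handle is private to
        // us, so closing it cannot break callers of FileSystem.get(), and
        // nobody else can close it (or change its working directory).
        val fs = FileSystem.newInstance(conf)
        try {
          // ... hang on to fs and reuse it to amortize its creation cost ...
        } finally {
          fs.close() // safe: we are the only holder of this instance
        }
      }
    }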
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache correctly, and we just recently resumed using it in 1.0 (and in 0.9.2) when this issue was fixed: https://issues.apache.org/jira/browse/SPARK-1676

Prior to this fix, each Spark task created and cached its own FileSystems due to a bug in how the FS cache handles UGIs. The big problem that arose was that these FileSystems were never closed, so they just kept piling up. There were two solutions we considered, with the following effects: (1) share the FS cache among all tasks, or (2) each task effectively gets its own FS cache, and closes all of its FSes after the task completes. We chose solution (1) for 3 reasons:

- It does not rely on the behavior of a bug in HDFS.
- It is the most performant option.
- It is most consistent with the semantics of the (albeit broken) FS cache.

Since this behavior was changed in 1.0, it could be considered a regression. We should consider the exact behavior we want out of the FS cache. For Spark's purposes, it seems fine to cache FileSystems across tasks, as Spark does not close FileSystems. The issue that comes up is that user code which uses FileSystem.get() but then closes the FileSystem can screw up Spark processes which were using that FileSystem. The workaround for users would be to use FileSystem.newInstance() if they want full control over the lifecycle of their FileSystems.
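[In user-code terms, the workaround described above looks roughly like this -- a sketch of mine, with an illustrative output path:]

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object UserCodeSketch {
      def main(args: Array[String]): Unit = {
        val conf   = new Configuration()
        val outDir = new Path("/tmp/app-output") // illustrative only

        // Risky: FileSystem.get returns the process-wide cached instance,
        // so the close() below also closes it for Spark.
        val shared = FileSystem.get(conf)
        shared.mkdirs(outDir)
        shared.close() // Spark's handle is now dead too

        // The workaround: a private instance whose lifecycle the
        // application fully controls.
        val mine = FileSystem.newInstance(conf)
        try mine.mkdirs(outDir) finally mine.close() // affects only this one
      }
    }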
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
Thank you, all! This is quite helpful. We have been arguing about how to handle this issue across a growing application. Unfortunately, the Hadoop FileSystem javadoc should say all this, but doesn't!

Kevin
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
Hey all,

On further testing, I came across a bug that breaks execution of pyspark scripts on YARN: https://issues.apache.org/jira/browse/SPARK-1900
This is a blocker and worth cutting a new RC.

We also found a fix for a known issue that prevents additional jar files from being specified through spark-submit on YARN: https://issues.apache.org/jira/browse/SPARK-1870
This has been fixed and will be in the next RC.

We are canceling this vote for now. We will post RC11 shortly. Thanks everyone for testing!

TD
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
On Thu, May 22, 2014 at 12:48 PM, Aaron Davidson ilike...@gmail.com wrote:

> In Spark 0.9.0 and 0.9.1, we stopped using the FileSystem cache
> correctly, and we just recently resumed using it in 1.0 (and in 0.9.2)
> when this issue was fixed:
> https://issues.apache.org/jira/browse/SPARK-1676

Interesting...

> Prior to this fix, each Spark task created and cached its own
> FileSystems due to a bug in how the FS cache handles UGIs. The big
> problem that arose was that these FileSystems were never closed, so
> they just kept piling up. There were two solutions we considered, with
> the following effects: (1) Share the FS cache among all tasks and (2)
> Each task effectively gets its own FS cache, and closes all of its
> FSes after the task completes.

Since the FS cache is in hadoop-common-project, it's not so much a bug in HDFS as a bug in Hadoop. So even if you're using, say, Lustre, you'll still get the same issues with org.apache.hadoop.fs.FileSystem and its global cache.

> We chose solution (1) for 3 reasons:
> - It does not rely on the behavior of a bug in HDFS.
> - It is the most performant option.
> - It is most consistent with the semantics of the (albeit broken) FS
>   cache.
>
> Since this behavior was changed in 1.0, it could be considered a
> regression. We should consider the exact behavior we want out of the
> FS cache. For Spark's purposes, it seems fine to cache FileSystems
> across tasks, as Spark does not close FileSystems. The issue that
> comes up is that user code which uses FileSystem.get() but then closes
> the FileSystem can screw up Spark processes which were using that
> FileSystem. The workaround for users would be to use
> FileSystem.newInstance() if they want full control over the lifecycle
> of their FileSystems.

The current solution seems reasonable, as long as Spark processes:

1. don't change the current working directory (doing so isn't thread-safe and will affect all other users of that FS object), and
2. don't close the FileSystem object.

Another solution would be to use newInstance and build your own FS cache, essentially. I don't think it would be that much code. This might be nicer because you could implement things like closing FileSystem objects that haven't been used in a while.

cheers,
Colin
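[A minimal sketch of the "build your own FS cache" idea above -- hypothetical names of mine; a real version would also track last-use times so it could close idle instances, as suggested:]

    import java.net.URI
    import scala.collection.mutable
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    // Every entry comes from newInstance, so this cache is the sole owner
    // of its FileSystem objects and can close them without affecting users
    // of the global FileSystem.get cache.
    object OwnFsCache {
      private val cache = mutable.Map.empty[URI, FileSystem]

      def get(uri: URI, conf: Configuration): FileSystem = synchronized {
        cache.getOrElseUpdate(uri, FileSystem.newInstance(uri, conf))
      }

      // Explicit eviction point, e.g. for instances idle too long.
      def evict(uri: URI): Unit = synchronized {
        cache.remove(uri).foreach(_.close())
      }
    }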
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
Looks like SPARK-1900 is a blocker for YARN, and we might as well add SPARK-1870 while at it.

TD or Patrick, could you kindly send out an email with [CANCEL] prefixed in the subject for the RC10 vote, to help people follow the active VOTE threads? The VOTE emails are getting a bit hard to follow.

- Henry
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
Right! Doing that.

TD
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
+1

On Tue, May 20, 2014 at 11:09 PM, Henry Saputra henry.sapu...@gmail.com wrote:

> Signature and hash for source look good
> No external executable package with source - good
> Compiled with git and maven - good
> Ran examples and sample programs locally and standalone - good
>
> +1
>
> - Henry
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
0

Abstaining because I'm not sure if my failures are due to Spark, configuration, or other factors...

Compiled and deployed RC10 for YARN, Hadoop 2.3, per the Spark 1.0.0 YARN documentation. No problems. Rebuilt applications against RC10 and Hadoop 2.3.0 (plain vanilla Apache release). Updated scripts for various applications. The application had successfully compiled and run against Spark 0.9.1 and Hadoop 2.3.0.

Ran in yarn-cluster mode. The application ran to conclusion, except that it ultimately failed because of an exception when Spark tried to clean up the staging directory. Also, where before YARN would report the running program as RUNNING, it only reported this application as ACCEPTED. It appeared to run two containers when the first instance never reported that it was RUNNING.

I will post a separate note to the USER list about the specifics.

Thanks
Kevin Markey
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
Hi Kevin,

Can you try https://issues.apache.org/jira/browse/SPARK-1898 to see if it fixes your issue? Running in YARN cluster mode, I had a similar issue where Spark was able to create a Driver and an Executor via YARN, but then it stopped making any progress.

Note: I was using a pre-release version of CDH5.1.0, not 2.3 like you were using.

best,
Colin
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
Has anyone tried pyspark on YARN and gotten it to work? I was having issues when I built Spark on Red Hat, but it worked when I built on my Mac; now, when I build it on my Mac, it also doesn't work.

Tom
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
I don't think Kevin's issue would be with an API change in YarnClientImpl, since in both cases he says he is using Hadoop 2.3.0. I'll take a look at his post in the user list.

Tom
[VOTE] Release Apache Spark 1.0.0 (RC10)
Please vote on releasing the following candidate as Apache Spark version 1.0.0!

This has a few bug fixes on top of rc9:
SPARK-1875: https://github.com/apache/spark/pull/824
SPARK-1876: https://github.com/apache/spark/pull/819
SPARK-1878: https://github.com/apache/spark/pull/822
SPARK-1879: https://github.com/apache/spark/pull/823

The tag to be voted on is v1.0.0-rc10 (commit d8070234):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=d807023479ce10aec28ef3c1ab646ddefc2e663c

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~tdas/spark-1.0.0-rc10/

The release artifacts are signed with the following key:
https://people.apache.org/keys/committer/tdas.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1018/

The documentation corresponding to this release can be found at:
http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/

The full list of changes in this release can be found at:
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=blob;f=CHANGES.txt;h=d21f0ace6326e099360975002797eb7cba9d5273;hb=d807023479ce10aec28ef3c1ab646ddefc2e663c

Please vote on releasing this package as Apache Spark 1.0.0! The vote is open until Friday, May 23, at 20:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.0.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

== API Changes ==
We welcome users to compile Spark applications against 1.0. There are a few API changes in this release. Here are links to the associated upgrade guides - user-facing changes have been kept as small as possible.

Changes to ML vector specification:
http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/mllib-guide.html#from-09-to-10

Changes to the Java API:
http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/java-programming-guide.html#upgrading-from-pre-10-versions-of-spark

Changes to the streaming API:
http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/streaming-programming-guide.html#migration-guide-from-091-or-below-to-1x

Changes to the GraphX API:
http://people.apache.org/~tdas/spark-1.0.0-rc10-docs/graphx-programming-guide.html#upgrade-guide-from-spark-091

Other changes:
coGroup and related functions now return Iterable[T] instead of Seq[T] ==> Call toSeq on the result to restore the old behavior
SparkContext.jarOfClass returns Option[String] instead of Seq[String] ==> Call toSeq on the result to restore old behavior
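[To illustrate the two "Other changes" above, a small sketch of the 0.9 -> 1.0 migration; the job itself is illustrative, not from this thread:]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._ // pair-RDD implicits in 1.0

    object MigrationSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setMaster("local").setAppName("migration"))

        val left  = sc.parallelize(Seq(1 -> "a", 1 -> "b", 2 -> "c"))
        val right = sc.parallelize(Seq(1 -> 10, 2 -> 20))

        // 1.0: cogroup yields (Iterable[String], Iterable[Int]) per key;
        // calling toSeq on each side restores the 0.9 Seq-based shape.
        val grouped = left.cogroup(right)
          .mapValues { case (as, bs) => (as.toSeq, bs.toSeq) }
        grouped.collect().foreach(println)

        // 1.0: jarOfClass returns Option[String]; toSeq restores a Seq.
        val jars: Seq[String] = SparkContext.jarOfClass(this.getClass).toSeq

        sc.stop()
      }
    }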
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
+1

On Tue, May 20, 2014 at 5:26 PM, Andrew Or and...@databricks.com wrote:

> +1
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
+1 (non-binding)

I have:
- checked signatures and checksums of the files
- built the code from the git repo using both sbt and mvn (against hadoop 2.3.0)
- ran a few simple jobs in local, yarn-client and yarn-cluster mode

Haven't explicitly tested any of the recent fixes, streaming nor sql.

-- 
Marcelo
Re: [VOTE] Release Apache Spark 1.0.0 (RC10)
+1

Tested it on both Windows and Mac OS X, with both Scala and Python. Confirmed that the issues in the previous RC were fixed.

Matei