It seems that GitHub branch-1.5 has already changed the version to 1.5.1-SNAPSHOT;
I am a bit confused: are we still on 1.5.0 RC3, or are we on 1.5.1?

Chester

On Mon, Aug 31, 2015 at 3:52 PM, Reynold Xin <r...@databricks.com> wrote:

> I'm going to -1 the release myself, since the issue @yhuai identified is
> pretty serious. It basically OOMs the driver when reading any files with a
> large number of partitions. Looks like the patch for that has already been
> merged.
>
> I'm going to cut rc3 momentarily.
>
> On Sun, Aug 30, 2015 at 11:30 AM, Sandy Ryza <sandy.r...@cloudera.com>
> wrote:
>
>> +1 (non-binding)
>> Built from source and ran some jobs against YARN.
>>
>> -Sandy
>>
>> On Sat, Aug 29, 2015 at 5:50 AM, vaquar khan <vaquar.k...@gmail.com>
>> wrote:
>>
>>> +1 (1.5.0 RC2). Compiled on Windows with YARN.
>>>
>>> Regards,
>>> Vaquar khan
>>>
>>> +1 (non-binding, of course)
>>>
>>> 1. Compiled OSX 10.10 (Yosemite) OK. Total time: 42:36 min
>>>    mvn clean package -Pyarn -Phadoop-2.6 -DskipTests
>>> 2. Tested pyspark, mllib
>>> 2.1. statistics (min, max, mean, Pearson, Spearman) OK
>>> 2.2. Linear/Ridge/Lasso Regression OK
>>> 2.3. Decision Tree, Naive Bayes OK
>>> 2.4. KMeans OK
>>>      Center And Scale OK
>>> 2.5. RDD operations OK
>>>      State of the Union Texts - MapReduce, Filter, sortByKey (word count)
>>> 2.6. Recommendation (Movielens medium dataset ~1M ratings) OK
>>>      Model evaluation/optimization (rank, numIter, lambda) with
>>>      itertools OK
>>> 3. Scala - MLlib
>>> 3.1. statistics (min, max, mean, Pearson, Spearman) OK
>>> 3.2. LinearRegressionWithSGD OK
>>> 3.3. Decision Tree OK
>>> 3.4. KMeans OK
>>> 3.5. Recommendation (Movielens medium dataset ~1M ratings) OK
>>> 3.6. saveAsParquetFile OK
>>> 3.7. Read and verify the 3.6 save (above) - sqlContext.parquetFile,
>>>      registerTempTable, sql OK
>>> 3.8. result = sqlContext.sql("SELECT OrderDetails.OrderID, ShipCountry,
>>>      UnitPrice, Qty, Discount FROM Orders INNER JOIN OrderDetails ON
>>>      Orders.OrderID = OrderDetails.OrderID") OK
>>> 4.0. Spark SQL from Python OK
>>> 4.1. result = sqlContext.sql("SELECT * FROM people WHERE State = 'WA'") OK
>>> 5.0. Packages
>>> 5.1. com.databricks.spark.csv - read/write OK (see the sketch after
>>>      this report)
>>>      (--packages com.databricks:spark-csv_2.11:1.2.0-s_2.11 didn't work,
>>>      but com.databricks:spark-csv_2.11:1.2.0 worked)
>>> 6.0. DataFrames
>>> 6.1. cast, dtypes OK
>>> 6.2. groupBy, avg, crosstab, corr, isNull, na.drop OK
>>> 6.3. joins, sql, set operations, udf OK
>>>
>>> Cheers
>>> <k/>
>>>
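(A minimal sketch of the spark-csv read/write check in 5.1 above, assuming
spark-shell was launched with --packages com.databricks:spark-csv_2.11:1.2.0;
the file paths and options are illustrative, not taken from the thread.)

    // spark-shell --packages com.databricks:spark-csv_2.11:1.2.0
    // `sqlContext` is provided by the shell.

    // Read a CSV file into a DataFrame through the spark-csv data source.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")      // first line holds the column names
      .option("inferSchema", "true") // derive column types from the data
      .load("people.csv")            // illustrative input path

    // Write the DataFrame back out in the same format.
    df.write
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .save("people_out")            // illustrative output directory
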
>>> On Tue, Aug 25, 2015 at 9:28 PM, Reynold Xin <r...@databricks.com>
>>> wrote:
>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 1.5.0. The vote is open until Friday, Aug 29, 2015 at 5:00 UTC and
>>>> passes if a majority of at least 3 +1 PMC votes are cast.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 1.5.0
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is v1.5.0-rc2:
>>>> https://github.com/apache/spark/tree/727771352855dbb780008c449a877f5aaa5fc27a
>>>>
>>>> The release files, including signatures, digests, etc., can be found at:
>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc2-bin/
>>>>
>>>> Release artifacts are signed with the following key:
>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>
>>>> The staging repository for this release (published as 1.5.0-rc2) can be
>>>> found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1141/
>>>>
>>>> The staging repository for this release (published as 1.5.0) can be
>>>> found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1140/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc2-docs/
>>>>
>>>> =======================================
>>>> How can I help test this release?
>>>> =======================================
>>>> If you are a Spark user, you can help us test this release by taking an
>>>> existing Spark workload, running it on this release candidate, and
>>>> reporting any regressions.
>>>>
>>>> ================================================
>>>> What justifies a -1 vote for this release?
>>>> ================================================
>>>> This vote is happening towards the end of the 1.5 QA period, so -1
>>>> votes should only occur for significant regressions from 1.4. Bugs already
>>>> present in 1.4, minor regressions, and bugs related to new features will
>>>> not block this release.
>>>>
>>>> ===============================================================
>>>> What should happen to JIRA tickets still targeting 1.5.0?
>>>> ===============================================================
>>>> 1. It is OK for documentation patches to target 1.5.0 and still go into
>>>> branch-1.5, since documentation will be packaged separately from the
>>>> release.
>>>> 2. New features for non-alpha modules should target 1.6+.
>>>> 3. Non-blocker bug fixes should target 1.5.1 or 1.6.0, or drop the
>>>> target version.
>>>>
>>>> ==================================================
>>>> Major changes to help you focus your testing
>>>> ==================================================
>>>> As of today, Spark 1.5 contains more than 1000 commits from 220+
>>>> contributors. I've curated a list of important changes for 1.5. For the
>>>> complete list, please refer to the Apache JIRA changelog.
>>>>
>>>> RDD/DataFrame/SQL APIs
>>>>
>>>> - New UDAF interface
>>>> - DataFrame hints for broadcast join
>>>> - expr function for turning a SQL expression into a DataFrame column
>>>> - Improved support for NaN values
>>>> - StructType now supports ordering
>>>> - TimestampType precision is reduced to 1us
>>>> - 100 new built-in expressions, including date/time, string, math
>>>> - Memory and local-disk-only checkpointing
>>>>
>>>> DataFrame/SQL Backend Execution
>>>>
>>>> - Code generation on by default
>>>> - Improved join, aggregation, shuffle, sorting with cache-friendly
>>>> algorithms and external algorithms
>>>> - Improved window function performance
>>>> - Better metrics instrumentation and reporting for DF/SQL execution
>>>> plans
>>>>
>>>> Data Sources, Hive, Hadoop, Mesos and Cluster Management
>>>>
>>>> - Dynamic allocation support in all resource managers (Mesos, YARN,
>>>> Standalone)
>>>> - Improved Mesos support (framework authentication, roles, dynamic
>>>> allocation, constraints)
>>>> - Improved YARN support (dynamic allocation with preferred locations)
>>>> - Improved Hive support (metastore partition pruning, metastore
>>>> connectivity to 0.13 through 1.2, internal Hive upgrade to 1.2)
>>>> - Support for persisting data in a Hive-compatible format in the
>>>> metastore
>>>> - Support for data partitioning in JSON data sources
>>>> - Parquet improvements (upgrade to 1.7, predicate pushdown, faster
>>>> metadata discovery and schema merging, support for reading non-standard
>>>> legacy Parquet files generated by other libraries)
>>>> - Faster and more robust dynamic partition insert
>>>> - DataSourceRegister interface for external data sources to specify
>>>> short names
>>>>
>>>> SparkR
>>>>
>>>> - YARN cluster mode in R
>>>> - GLMs with R formula, binomial/Gaussian families, and elastic-net
>>>> regularization
>>>> - Improved error messages
>>>> - Aliases to make DataFrame functions more R-like
>>>>
>>>> Streaming
>>>>
>>>> - Backpressure for handling bursty input streams
>>>> - Improved Python support for streaming sources (Kafka offsets,
>>>> Kinesis, MQTT, Flume)
>>>> - Improved Python streaming machine learning algorithms (K-Means,
>>>> linear regression, logistic regression)
>>>> - Native reliable Kinesis stream support
>>>> - Input metadata like Kafka offsets made visible in the batch details UI
>>>> - Better load balancing and scheduling of receivers across the cluster
>>>> - Streaming storage included in the web UI
>>>>
>>>> Machine Learning and Advanced Analytics
>>>>
>>>> - Feature transformers: CountVectorizer, Discrete Cosine Transformation,
>>>> MinMaxScaler, NGram, PCA, RFormula, StopWordsRemover, and VectorSlicer
>>>> - Estimators under the pipeline API: naive Bayes, k-means, and isotonic
>>>> regression
>>>> - Algorithms: multilayer perceptron classifier, PrefixSpan for
>>>> sequential pattern mining, association rule generation, 1-sample
>>>> Kolmogorov-Smirnov test
>>>> - Improvements to existing algorithms: LDA, trees/ensembles, GMMs
>>>> - More efficient Pregel API implementation for GraphX
>>>> - Model summary for linear and logistic regression
>>>> - Python API: distributed matrices, streaming k-means and linear
>>>> models, LDA, power iteration clustering, etc.
>>>> - Tuning and evaluation: train-validation split and multiclass
>>>> classification evaluator
>>>> - Documentation: document the release version of public API methods
>>>
>>
>
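(For anyone testing the new UDAF interface called out in the RDD/DataFrame/SQL
list above, a minimal sketch against the 1.5 API; the SumOfSquares aggregate
and its column names are invented for illustration, not from the thread.)

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    // Illustrative aggregate: sum of squares of a double column.
    class SumOfSquares extends UserDefinedAggregateFunction {
      def inputSchema: StructType  = StructType(StructField("value", DoubleType) :: Nil)
      def bufferSchema: StructType = StructType(StructField("acc", DoubleType) :: Nil)
      def dataType: DataType       = DoubleType
      def deterministic: Boolean   = true

      def initialize(buffer: MutableAggregationBuffer): Unit = buffer(0) = 0.0

      // Fold one input row into the running buffer.
      def update(buffer: MutableAggregationBuffer, input: Row): Unit =
        if (!input.isNullAt(0)) {
          val v = input.getDouble(0)
          buffer(0) = buffer.getDouble(0) + v * v
        }

      // Combine two partial buffers (e.g. from different partitions).
      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit =
        buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)

      def evaluate(buffer: Row): Any = buffer.getDouble(0)
    }

    // Usage, e.g. in spark-shell: df.groupBy("k").agg(new SumOfSquares()(df("v")))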
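(Similarly, a sketch of the expr function and the broadcast join hint from the
same list, assuming spark-shell's `sqlContext`; the sample data is invented.)

    import org.apache.spark.sql.functions.{broadcast, expr}
    import sqlContext.implicits._ // sqlContext is provided by spark-shell

    val people = Seq(("alice", "US", 100000.0), ("bob", "DE", 90000.0))
      .toDF("name", "country", "salary")

    // expr turns a SQL expression string into a DataFrame Column.
    val withBonus = people.select(expr("name"), expr("salary * 0.10 as bonus"))

    // broadcast hints the planner to broadcast the small side of a join.
    val codes = Seq(("US", "United States"), ("DE", "Germany")).toDF("country", "fullName")
    val joined = people.join(broadcast(codes), "country")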
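(And a sketch of enabling the new streaming backpressure noted under
Streaming; the app name and batch interval are illustrative.)

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("backpressure-demo") // illustrative
      // Let receivers adapt their ingestion rate to the current
      // processing speed instead of being overwhelmed by bursts.
      .set("spark.streaming.backpressure.enabled", "true")

    val ssc = new StreamingContext(conf, Seconds(2))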