Is anyone using HiveContext against secure Hive with Spark 1.5 and have it working?
We run a non-standard version of Hive, and when Spark pulls in our Hive jars it
fails to authenticate. It could be something in our Hive version, but I'm
wondering whether Spark isn't forwarding credentials properly.
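For context, a minimal sketch of what we're doing (the app name and query are placeholders; on YARN we submit with --principal/--keytab so delegation tokens should be obtained at launch):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// Minimal secure-Hive check; for us this fails when authenticating
// to the metastore.
val sc = new SparkContext(new SparkConf().setAppName("secure-hive-check"))
val hive = new HiveContext(sc)
hive.sql("SHOW TABLES").collect().foreach(println)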
Tom 


On Tuesday, August 25, 2015 1:56 PM, Tom Graves <tgraves...@yahoo.com.INVALID> wrote:

Is there a JIRA to update the SQL Hive docs ("Spark SQL and DataFrames - Spark
1.5.0 Documentation")?



It still says the default is 0.13.1, but the pom file builds with Hive 1.2.1-spark.
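For anyone double-checking, the setting in question is spark.sql.hive.metastore.version; a quick sketch of pinning it explicitly rather than relying on the documented default (the jar path below is a placeholder):

import org.apache.spark.SparkConf

// Pin the metastore client version instead of relying on the default,
// which the docs and the pom currently disagree on.
val conf = new SparkConf()
  .set("spark.sql.hive.metastore.version", "0.13.1")
  // Placeholder path; point at the matching Hive client jars.
  .set("spark.sql.hive.metastore.jars", "/opt/hive-0.13.1/lib/*")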
Tom 


On Monday, August 24, 2015 4:06 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:

I see that there's a 1.5.0-rc2 tag in GitHub now. Is that the official RC2
tag to start trying out?
-Sandy
On Mon, Aug 24, 2015 at 7:23 AM, Sean Owen <so...@cloudera.com> wrote:

PS Shixiong Zhu is correct that this one has to be fixed:
https://issues.apache.org/jira/browse/SPARK-10168

For example you can see assemblies like this are nearly empty:
https://repository.apache.org/content/repositories/orgapachespark-1137/org/apache/spark/spark-streaming-flume-assembly_2.10/1.5.0-rc1/

Just a publishing glitch but worth a few more eyes on.

On Fri, Aug 21, 2015 at 5:28 PM, Sean Owen <so...@cloudera.com> wrote:
> Signatures, license, etc. look good. I'm getting some fairly
> consistent failures using Java 7 + Ubuntu 15 + "-Pyarn -Phive
> -Phive-thriftserver -Phadoop-2.6" -- does anyone else see these? They
> are likely just test problems, but worth asking. Stack traces are at
> the end.
>
> There are currently 79 issues targeted for 1.5.0, of which 19 are
> bugs, of which 1 is a blocker. (1032 have been resolved for 1.5.0.)
> That's significantly better than at the last release. I presume a lot
> of what's still targeted is not critical and can now be
> untargeted/retargeted.
>
> It occurs to me that the flurry of planning that took place at the
> start of the 1.5 QA cycle a few weeks ago was quite helpful, and is
> the kind of thing that would be even more useful at the start of a
> release cycle. So it would be great to do this for 1.6 in a few weeks.
> Indeed there are already 267 issues targeted for 1.6.0 -- a decent
> roadmap already.
>
>
> Test failures:
>
> Core
>
> - Unpersisting TorrentBroadcast on executors and driver in distributed
> mode *** FAILED ***
>   java.util.concurrent.TimeoutException: Can't find 2 executors before
> 10000 milliseconds elapsed
>   at org.apache.spark.ui.jobs.JobProgressListener.waitUntilExecutorsUp(JobProgressListener.scala:561)
>   at org.apache.spark.broadcast.BroadcastSuite.testUnpersistBroadcast(BroadcastSuite.scala:313)
>   at org.apache.spark.broadcast.BroadcastSuite.org$apache$spark$broadcast$BroadcastSuite$$testUnpersistTorrentBroadcast(BroadcastSuite.scala:287)
>   at org.apache.spark.broadcast.BroadcastSuite$$anonfun$16.apply$mcV$sp(BroadcastSuite.scala:165)
>   at org.apache.spark.broadcast.BroadcastSuite$$anonfun$16.apply(BroadcastSuite.scala:165)
>   at org.apache.spark.broadcast.BroadcastSuite$$anonfun$16.apply(BroadcastSuite.scala:165)
>   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   ...
>
> Streaming
>
> - stop slow receiver gracefully *** FAILED ***
>   0 was not greater than 0 (StreamingContextSuite.scala:324)
>
> Kafka
>
> - offset recovery *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 191
> times over 10.043196973 seconds. Last failure message:
> strings.forall({
>     ((elem: Any) => DirectKafkaStreamSuite.collectedData.contains(elem))
>   }) was false. (DirectKafkaStreamSuite.scala:249)
>
> On Fri, Aug 21, 2015 at 5:37 AM, Reynold Xin <r...@databricks.com> wrote:
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.5.0!
>>
>> The vote is open until Monday, Aug 17, 2015 at 20:00 UTC and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.5.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>>
>> The tag to be voted on is v1.5.0-rc1:
>> https://github.com/apache/spark/tree/4c56ad772637615cc1f4f88d619fac6c372c8552
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc1-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1137/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-1.5.0-rc1-docs/
>>
>>
>> =======================================
>> == How can I help test this release? ==
>> =======================================
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running it on this release candidate, then
>> reporting any regressions.
>>
>>
>> ================================================
>> == What justifies a -1 vote for this release? ==
>> ================================================
>> This vote is happening towards the end of the 1.5 QA period, so -1 votes
>> should only occur for significant regressions from 1.4. Bugs already present
>> in 1.4, minor regressions, or bugs related to new features will not block
>> this release.
>>
>>
>> ===============================================================
>> == What should happen to JIRA tickets still targeting 1.5.0? ==
>> ===============================================================
>> 1. It is OK for documentation patches to target 1.5.0 and still go into
>> branch-1.5, since documentation will be packaged separately from the
>> release.
>> 2. New features for non-alpha-modules should target 1.6+.
>> 3. Non-blocker bug fixes should target 1.5.1 or 1.6.0, or drop the target
>> version.
>>
>>
>> ==================================================
>> == Major changes to help you focus your testing ==
>> ==================================================
>> As of today, Spark 1.5 contains more than 1000 commits from 220+
>> contributors. I've curated a list of important changes for 1.5. For the
>> complete list, please refer to Apache JIRA changelog.
>>
>> RDD/DataFrame/SQL APIs
>>
>> - New UDAF interface
>> - DataFrame hints for broadcast join
>> - expr function for turning a SQL expression into a DataFrame column (see
>> the sketch after this list)
>> - Improved support for NaN values
>> - StructType now supports ordering
>> - TimestampType precision is reduced to 1us
>> - 100 new built-in expressions, including date/time, string, math
>> - memory and local disk only checkpointing
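A quick illustrative sketch of expr and the broadcast join hint; the DataFrames and column names below are made up:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{broadcast, expr}

// Hypothetical inputs: `orders` with price/quantity columns and a small
// `countries` lookup table sharing a country_code column.
def enrich(orders: DataFrame, countries: DataFrame): DataFrame =
  orders
    .withColumn("total", expr("price * quantity")) // SQL expression -> Column
    .join(broadcast(countries), "country_code")    // hint: broadcast the small side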
>>
>> DataFrame/SQL Backend Execution
>>
>> - Code generation on by default
>> - Improved join, aggregation, shuffle, and sorting with cache-friendly
>> algorithms and external algorithms
>> - Improved window function performance
>> - Better metrics instrumentation and reporting for DF/SQL execution plans
>>
>> Data Sources, Hive, Hadoop, Mesos and Cluster Management
>>
>> - Dynamic allocation support in all resource managers (Mesos, YARN,
>> Standalone)
>> - Improved Mesos support (framework authentication, roles, dynamic
>> allocation, constraints)
>> - Improved YARN support (dynamic allocation with preferred locations)
>> - Improved Hive support (metastore partition pruning, metastore connectivity
>> to versions 0.13 through 1.2, internal Hive upgrade to 1.2)
>> - Support persisting data in Hive compatible format in metastore
>> - Support data partitioning for JSON data sources
>> - Parquet improvements (upgrade to 1.7, predicate pushdown, faster metadata
>> discovery and schema merging, support reading non-standard legacy Parquet
>> files generated by other libraries)
>> - Faster and more robust dynamic partition insert
>> - DataSourceRegister interface for external data sources to specify short
>> names (sketch after this list)
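A hedged sketch of implementing the new interface; the provider class and format name here are hypothetical:

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, DataSourceRegister, RelationProvider}

// Hypothetical provider: shortName() lets callers write .format("myformat")
// instead of the fully qualified class name.
class MyFormatProvider extends RelationProvider with DataSourceRegister {
  override def shortName(): String = "myformat"
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    ??? // a real implementation returns a BaseRelation here
  }
}

The short name is picked up via a META-INF/services/org.apache.spark.sql.sources.DataSourceRegister entry naming the provider class.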
>>
>> SparkR
>>
>> - YARN cluster mode in R
>> - GLMs with R formula, binomial/Gaussian families, and elastic-net
>> regularization
>> - Improved error messages
>> - Aliases to make DataFrame functions more R-like
>>
>> Streaming
>>
>> - Backpressure for handling bursty input streams (enabling sketch after
>> this list)
>> - Improved Python support for streaming sources (Kafka offsets, Kinesis,
>> MQTT, Flume)
>> - Improved Python streaming machine learning algorithms (K-Means, linear
>> regression, logistic regression)
>> - Native reliable Kinesis stream support
>> - Input metadata like Kafka offsets made visible in the batch details UI
>> - Better load balancing and scheduling of receivers across cluster
>> - Include streaming storage in web UI
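Backpressure is opt-in; a minimal sketch of enabling it (only the backpressure flag is a real setting, the app name is a placeholder):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Opt in to rate control that adapts ingestion to processing speed.
val conf = new SparkConf()
  .setAppName("backpressure-demo") // placeholder app name
  .set("spark.streaming.backpressure.enabled", "true")
val ssc = new StreamingContext(conf, Seconds(1))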
>>
>> Machine Learning and Advanced Analytics
>>
>> - Feature transformers: CountVectorizer, Discrete Cosine transformation,
>> MinMaxScaler, NGram, PCA, RFormula, StopWordsRemover, and VectorSlicer.
>> - Estimators under pipeline APIs: naive Bayes, k-means, and isotonic
>> regression.
>> - Algorithms: multilayer perceptron classifier, PrefixSpan for sequential
>> pattern mining, association rule generation, 1-sample Kolmogorov-Smirnov
>> test.
>> - Improvements to existing algorithms: LDA, trees/ensembles, GMMs
>> - More efficient Pregel API implementation for GraphX
>> - Model summary for linear and logistic regression.
>> - Python API: distributed matrices, streaming k-means and linear models,
>> LDA, power iteration clustering, etc.
>> - Tuning and evaluation: train-validation split and multiclass
>> classification evaluator (sketch after this list).
>> - Documentation: document the release version of public API methods
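A hedged sketch combining the two tuning additions; `training` is an assumed DataFrame with standard "label" and "features" columns:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}
import org.apache.spark.sql.DataFrame

// Fit over a small grid, holding out 20% of `training` for validation.
def tune(training: DataFrame) = {
  val lr = new LogisticRegression()
  val grid = new ParamGridBuilder()
    .addGrid(lr.regParam, Array(0.01, 0.1))
    .build()
  new TrainValidationSplit()
    .setEstimator(lr)
    .setEvaluator(new MulticlassClassificationEvaluator())
    .setEstimatorParamMaps(grid)
    .setTrainRatio(0.8)
    .fit(training)
}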
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org