Krishna, could you send me some code snippets for the issues you saw
in naive Bayes and k-means? -Xiangrui

On Sun, Nov 30, 2014 at 6:49 AM, Krishna Sankar <ksanka...@gmail.com> wrote:
> +1
> 1. Compiled on OS X 10.10 (Yosemite): mvn -Pyarn -Phadoop-2.4
> -Dhadoop.version=2.4.0 -DskipTests clean package - 16:46 min (on a
> slightly slower connection)
> 2. Tested pyspark, MLlib - ran the examples and compared results with 1.1.x
> 2.1. statistics OK
> 2.2. Linear/Ridge/Lasso Regression OK
>        Slight difference in the model object's print method (vs. 1.1.x)
> - it now includes a label & more details. This is good.
> 2.3. Decision Tree, Naive Bayes OK
>        Change in printing: print(model) is now
> print(model.toDebugString()) - OK
>        Some changes in NaiveBayes vs. my 1.1.x code - I had to flatten
> list structures, and zip requires the same number of elements in each
> partition. After those code changes it ran fine (sketch below).
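>        A minimal pyspark sketch of the pattern that worked for me;
> the file name and parse format are placeholders, not my actual script:
>
>          from pyspark.mllib.classification import NaiveBayes
>          from pyspark.mllib.regression import LabeledPoint
>
>          # each line looks like "<label>,<f1> <f2> ... <fn>"
>          def parse(line):
>              label, features = line.split(",")
>              return LabeledPoint(float(label),
>                                  [float(x) for x in features.split(" ")])
>
>          # sc is the SparkContext from the pyspark shell
>          parsed = sc.textFile("nb_data.txt").map(parse)
>          model = NaiveBayes.train(parsed, 1.0)
>
>          # predict with a single map over the original RDD instead of
>          # zipping two derived RDDs, so partition counts cannot disagree
>          labels_and_preds = parsed.map(
>              lambda p: (p.label, model.predict(p.features)))
>          accuracy = (labels_and_preds
>                      .filter(lambda lp: lp[0] == lp[1]).count()
>                      / float(parsed.count()))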
> 2.4. KMeans OK
>        zip occasionally fails with "org.apache.spark.SparkException:
> Can only zip RDDs with same number of elements in each partition"
>        Has https://issues.apache.org/jira/browse/SPARK-2251 reappeared?
>        Made it work with a different transformation, i.e. reusing the
> original RDD instead of zipping (sketch below).
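>        The failing vs. working pattern as a pyspark sketch (data file
> and k are placeholders from my test):
>
>          from pyspark.mllib.clustering import KMeans
>
>          points = (sc.textFile("kmeans_data.txt")
>                      .map(lambda line: [float(x) for x in line.split(" ")]))
>          model = KMeans.train(points, 3, maxIterations=10)
>
>          # intermittently fails - zips an RDD with an RDD derived from
>          # it, which assumes identical partitioning on re-evaluation:
>          # assignments = points.zip(points.map(lambda p: model.predict(p)))
>
>          # works - a single map over the original RDD:
>          assignments = points.map(lambda p: (p, model.predict(p)))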
> 2.5. RDD operations OK
>        State of the Union texts - map/reduceByKey, filter, sortByKey
> (word count), sketch below
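>        Roughly this pipeline (the input path is a placeholder):
>
>          counts = (sc.textFile("sotu/*.txt")
>                      .flatMap(lambda line: line.split())
>                      .map(lambda word: (word, 1))
>                      .reduceByKey(lambda a, b: a + b)
>                      .map(lambda wc: (wc[1], wc[0]))  # swap to (count, word)
>                      .sortByKey(ascending=False))     # most frequent first
>          counts.take(10)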
> 2.6. recommendation OK
> 2.7. Good work! In 1.1.x, a map + distinct over the MovieLens medium
> dataset never worked for me; it works fine in 1.2.0! (sketch below)
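>        A sketch of the recommendation test (MovieLens medium = ml-1m;
> the path and parameters are from my script, not canonical values):
>
>          from pyspark.mllib.recommendation import ALS, Rating
>
>          ratings = (sc.textFile("ml-1m/ratings.dat")
>                       .map(lambda l: l.split("::"))
>                       .map(lambda t: Rating(int(t[0]), int(t[1]),
>                                             float(t[2]))))
>          model = ALS.train(ratings, rank=10, iterations=10)
>
>          # the map + distinct that used to hang in 1.1.x now completes
>          n_users = ratings.map(lambda r: r.user).distinct().count()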
> 3. Scala MLlib - subset of the examples in #2 above, in Scala
> 3.1. statistics OK
> 3.2. Linear Regression OK
> 3.3. Decision Tree OK
> 3.4. KMeans OK
> Cheers
> <k/>
> P.S.: I plan to add RF (random forests) and the new .ml pipeline
> mechanics to this test bank
>
> On Fri, Nov 28, 2014 at 9:16 PM, Patrick Wendell <pwend...@gmail.com> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 1.2.0!
>>
>> The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
>>
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-1.2.0-rc1/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1048/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/
>>
>> Please vote on releasing this package as Apache Spark 1.2.0!
>>
>> The vote is open until Tuesday, December 02, at 05:15 UTC and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.2.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>>
>> == What justifies a -1 vote for this release? ==
>> This vote is happening very late into the QA period compared with
>> previous votes, so -1 votes should only occur for significant
>> regressions from 1.1.2. Bugs already present in 1.1.X, minor
>> regressions, or bugs related to new features will not block this
>> release.
>>
>> == What default changes should I be aware of? ==
>> 1. The default value of "spark.shuffle.blockTransferService" has been
>> changed to "netty".
>> --> Old behavior can be restored by setting it to "nio".
>>
>> 2. The default value of "spark.shuffle.manager" has been changed to "sort".
>> --> Old behavior can be restored by setting "spark.shuffle.manager" to
>> "hash".
>>
>> == Other notes ==
>> Because this vote is occurring over a weekend, I will likely extend
>> the vote if this RC survives until the end of the vote period.
>>
>> - Patrick
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
