Jcenter / bintray support for spark packages?

2015-06-10 Thread Hector Yee
Hi Spark devs, Is it possible to add jcenter or bintray support for Spark packages? I'm trying to add our artifact which is on jcenter https://bintray.com/airbnb/aerosolve but I noticed in Spark packages it only accepts Maven coordinates. -- Yee Yang Li Hector google.com/+HectorYee

Re: Spark/Mesos

2015-05-05 Thread Hector Yee
Speaking as a user of Spark on Mesos: yes, each app appears as a separate framework on the Mesos master. In fine-grained mode the number of executors goes up and down, whereas it is fixed in coarse mode. I would not run fine-grained mode on a large cluster as it can potentially spin up a lot of

Re: Storing large data for MLlib machine learning

2015-04-01 Thread Hector Yee
I use Thrift and then base64 encode the binary and save it as text file lines that are snappy or gzip encoded. It makes it very easy to copy small chunks locally and play with subsets of the data and not have dependencies on HDFS / hadoop for server stuff for example. On Thu, Mar 26, 2015 at
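A minimal sketch of the storage layout described above, assuming each record has already been serialized to bytes (for example by Thrift, which is not shown here). The function names `write_records` and `read_records` are hypothetical, and only gzip is used because snappy is not in the Python standard library.

```python
import base64
import gzip


def write_records(path, records):
    """Write each binary record as one base64 text line in a gzip file."""
    with gzip.open(path, "wt", encoding="ascii") as f:
        for blob in records:
            f.write(base64.b64encode(blob).decode("ascii") + "\n")


def read_records(path, limit=None):
    """Read records back; `limit` stops early to grab a small subset."""
    out = []
    with gzip.open(path, "rt", encoding="ascii") as f:
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break
            out.append(base64.b64decode(line.strip()))
    return out
```

Because base64 output never contains a newline, every line is exactly one record, so a few lines can be copied out and decoded locally without any HDFS or Hadoop dependency.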

Re: over 10000 commits!

2015-03-06 Thread Hector Yee
Congrats! On Thu, Mar 5, 2015 at 1:34 PM, shane knapp skn...@berkeley.edu wrote: WOOT! On Thu, Mar 5, 2015 at 1:26 PM, Reynold Xin r...@databricks.com wrote: We reached a new milestone today. https://github.com/apache/spark 10,001 commits now. Congratulations to Xiangrui for

Re: [ANNOUNCE] Spark 1.2.0 Release Preview Posted

2014-11-20 Thread Hector Yee
I'm getting a lot of lost tasks with this build in a large Mesos cluster. It happens with both hash and sort shuffles. 14/11/20 18:08:38 WARN TaskSetManager: Lost task 9.1 in stage 1.0 (TID 897, i-d4d6553a.inst.aws.airbnb.com): FetchFailed(null, shuffleId=1, mapId=-1, reduceId=9, message=

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-20 Thread Hector Yee
I'm still seeing the fetch failed error and updated https://issues.apache.org/jira/browse/SPARK-3633 On Thu, Nov 20, 2014 at 10:21 AM, Marcelo Vanzin van...@cloudera.com wrote: +1 (non-binding) . ran simple things on spark-shell . ran jobs in yarn client cluster modes, and standalone

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-20 Thread Hector Yee
I think it is a race condition caused by netty deactivating a channel while it is active. Switched to nio and it works fine --conf spark.shuffle.blockTransferService=nio On Thu, Nov 20, 2014 at 10:44 AM, Hector Yee hector@gmail.com wrote: I'm still seeing the fetch failed error and updated

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-20 Thread Hector Yee
This is whatever was in http://people.apache.org/~andrewor14/spark-1.1.1-rc2/ On Thu, Nov 20, 2014 at 11:48 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Hector, is this a comment on 1.1.1 or on the 1.2 preview? Matei On Nov 20, 2014, at 11:39 AM, Hector Yee hector@gmail.com wrote

Re: [VOTE] Release Apache Spark 1.1.1 (RC2)

2014-11-20 Thread Hector Yee
. But the spark.shuffle.blockTransferService property doesn't exist in 1.1 (AFAIK) -- what exactly are you doing to get this problem? Matei On Nov 20, 2014, at 11:50 AM, Hector Yee hector@gmail.com wrote: This is whatever was in http://people.apache.org/~andrewor14/spark-1.1.1-rc2/ On Thu, Nov 20, 2014 at 11:48

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-07-08 Thread Hector Yee
I would say for bigdata applications the most useful would be hierarchical k-means with back tracking and the ability to support k nearest centroids. On Tue, Jul 8, 2014 at 10:54 AM, RJ Nowling rnowl...@gmail.com wrote: Hi all, MLlib currently has one clustering algorithm implementation,

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-07-08 Thread Hector Yee
. On Tue, Jul 8, 2014 at 1:01 PM, Hector Yee hector@gmail.com wrote: I would say for bigdata applications the most useful would be hierarchical k-means with back tracking and the ability to support k nearest centroids. On Tue, Jul 8, 2014 at 10:54 AM, RJ Nowling rnowl...@gmail.com

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-07-08 Thread Hector Yee
interesting problem here is choosing k at each level. Kernel methods seem to be most promising. On Tue, Jul 8, 2014 at 1:31 PM, Hector Yee hector@gmail.com wrote: No idea, never looked it up. Always just implemented it as doing k-means again on each cluster. FWIW standard k-means

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-07-08 Thread Hector Yee
thinking Hector? On Tue, Jul 8, 2014 at 4:50 PM, Dmitriy Lyubimov dlie...@gmail.com wrote: sure. more interesting problem here is choosing k at each level. Kernel methods seem to be most promising. On Tue, Jul 8, 2014 at 1:31 PM, Hector Yee hector@gmail.com wrote: No idea, never looked

Re: Contributing to MLlib: Proposal for Clustering Algorithms

2014-07-08 Thread Hector Yee
. On Tue, Jul 8, 2014 at 1:31 PM, Hector Yee hector@gmail.com wrote: No idea, never looked it up. Always just implemented it as doing k-means again on each cluster. FWIW standard k-means with euclidean distance has problems too with some dimensionality
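The "k-means again on each cluster" approach mentioned in this thread could be sketched roughly as follows. This is an illustrative single-machine version in plain Python, not MLlib code: the names `kmeans`, `hierarchical_kmeans`, and `leaf_points`, and the `depth` and `min_size` parameters, are assumptions made for the example. It uses a fixed k at every level and does not implement backtracking or k-nearest-centroid lookup.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster went empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, clusters


def hierarchical_kmeans(points, k, depth, min_size=4):
    """Build a tree by running k-means again inside each cluster."""
    node = {"size": len(points), "children": []}
    if depth == 0 or len(points) < max(k, min_size):
        node["points"] = points  # leaf: stop splitting
        return node
    centroids, clusters = kmeans(points, k)
    for centroid, members in zip(centroids, clusters):
        if not members:
            continue
        child = hierarchical_kmeans(members, k, depth - 1, min_size)
        child["centroid"] = centroid
        node["children"].append(child)
    return node


def leaf_points(node):
    """Collect all points stored in the leaves under a node."""
    if "points" in node:
        return list(node["points"])
    out = []
    for child in node["children"]:
        out.extend(leaf_points(child))
    return out
```

For example, `hierarchical_kmeans(points, k=3, depth=2)` splits the data into up to 3 clusters and then splits each of those again. Choosing k per level, which the thread calls the more interesting problem, is left out of this sketch.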