Re: RFC: Remove "HBaseTest" from examples?

2016-08-18 Thread Ignacio Zendejas
I'm very late to this party and I get hbase-spark... what's the recommendation for pyspark + hbase? I realize this isn't necessarily a concern of the spark project, but it'd be nice to at least document it here with a very short and sweet response because I haven't found anything useful in the

createDataframe from s3 results in error

2015-06-02 Thread Ignacio Zendejas
I've run into an error when trying to create a dataframe. Here's the code:
--
    from pyspark import StorageLevel
    from pyspark.sql import Row

    table = 'blah'
    ssc = HiveContext(sc)
    data = sc.textFile('s3://bucket/some.tsv')

    def deserialize(s):
        p = s.strip().split('\t')
        p[-1] = float(p[-1])

Re: createDataframe from s3 results in error

2015-06-02 Thread Ignacio Zendejas
PM, Ignacio Zendejas i...@node.io wrote: I've run into an error when trying to create a dataframe. Here's the code:
--
    from pyspark import StorageLevel
    from pyspark.sql import Row

    table = 'blah'
    ssc = HiveContext(sc)
    data = sc.textFile('s3://bucket/some.tsv')

    def deserialize(s):
        p

Re: A Comparison of Platforms for Implementing and Running Very Large Scale Machine Learning Algorithms

2014-08-14 Thread Ignacio Zendejas
. stochastic gradient descent) that you can plug some functions into, without worrying about the communication. Matei On August 13, 2014 at 11:10:02 AM, Ignacio Zendejas ( ignacio.zendejas...@gmail.com) wrote: Has anyone had a chance to look at this paper (with title in subject)? http

A Comparison of Platforms for Implementing and Running Very Large Scale Machine Learning Algorithms

2014-08-13 Thread Ignacio Zendejas
Has anyone had a chance to look at this paper (with title in subject)? http://www.cs.rice.edu/~lp6/comparison.pdf Interesting that they chose to use Python alone. Do we know how much faster Scala is vs. Python in general, if at all? As with any and all benchmarks, I'm sure there are caveats, but

Re: feature selection and sparse vector support

2014-04-11 Thread Ignacio Zendejas
in branch-1.0 and master. You only need to provide an RDD of sparse vectors (created from Vectors.sparse). MLUtils.loadLibSVMData reads sparse features in LIBSVM format. Best, Xiangrui On Thu, Apr 10, 2014 at 5:18 PM, Ignacio Zendejas ignacio.zendejas...@gmail.com wrote: Hi
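
For readers unfamiliar with the LIBSVM text format mentioned above, here is a minimal plain-Python sketch of how one line maps to a sparse (indices, values) pair. It is an illustration only and does not use or reproduce Spark's MLUtils or Vectors code; the helper names parse_libsvm_line and to_dense are hypothetical. It assumes the usual LIBSVM layout of "label index:value index:value ..." with 1-based indices.

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM-format line into (label, indices, values).

    Assumes 1-based feature indices in the input and converts them
    to 0-based indices. Hypothetical helper, not part of Spark.
    """
    tokens = line.strip().split()
    label = float(tokens[0])
    indices = []
    values = []
    for token in tokens[1:]:
        index_str, value_str = token.split(":", 1)
        indices.append(int(index_str) - 1)  # 1-based -> 0-based
        values.append(float(value_str))
    return label, indices, values


def to_dense(size, indices, values):
    """Expand a sparse (indices, values) pair into a dense list of floats."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


if __name__ == "__main__":
    label, indices, values = parse_libsvm_line("1 3:0.5 10:2.0")
    print(label, indices, values)
    print(to_dense(10, indices, values))
```

Only the non-zero entries are stored, which is what makes the sparse form attractive for high-dimensional text features.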

Re: feature selection and sparse vector support

2014-04-11 Thread Ignacio Zendejas
Here's the JIRA: https://issues.apache.org/jira/browse/SPARK-1473 Future discussions should take place in its comments section. Thanks. On Fri, Apr 11, 2014 at 11:26 AM, Ignacio Zendejas ignacio.zendejas...@gmail.com wrote: Thanks for the response, Xiangrui. And sounds good, Héctor

feature selection and sparse vector support

2014-04-10 Thread Ignacio Zendejas
Hi, again - As part of the next step, I'd like to make a more substantive contribution and propose some initial work on feature selection, primarily as it relates to text classification. Specifically, I'd like to contribute very straightforward code to perform information gain feature
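
As a rough illustration of what information gain feature selection computes for text classification, here is a small plain-Python sketch. It is not the code proposed in the thread: the function names, the representation of documents as sets of tokens, and the toy data are all assumptions made for the example. It scores a term by how much knowing its presence or absence reduces the entropy of the class labels.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information gain of a binary 'term present' feature.

    docs is a list of token sets and labels the matching class labels
    (both hypothetical inputs chosen for this sketch).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


if __name__ == "__main__":
    docs = [{"spark", "fast"}, {"spark", "scala"},
            {"hbase", "slow"}, {"hbase", "java"}]
    labels = ["a", "a", "b", "b"]
    # Rank terms by information gain, highest first.
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(vocabulary,
                    key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    print(ranked)
```

Selecting features then amounts to keeping the top-scoring terms; on real data the counts would be computed in a distributed fashion rather than with in-memory lists.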

Re: minor optimizations to get my feet wet

2014-04-10 Thread Ignacio Zendejas
with you that the old way is more readable (although less idiomatic Scala). On Thu, Apr 10, 2014 at 1:48 PM, Ignacio Zendejas ignacio.zendejas...@gmail.com wrote: Hi, all - First off, I want to say that I love Spark and am very excited about MLBase. I'd love to contribute now that I