I'm very late to this party and I get hbase-spark... what's the
recommendation for PySpark + HBase? I realize this isn't necessarily a
concern of the Spark project, but it'd be nice to at least document it here
with a very short and sweet response because I haven't found anything
useful in the
I've run into an error when trying to create a dataframe. Here's the code:
--
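from pyspark import StorageLevel
from pyspark.sql import HiveContext, Row

table = 'blah'
# Assumes sc is an existing SparkContext (e.g. the one the pyspark shell provides)
ssc = HiveContext(sc)
data = sc.textFile('s3://bucket/some.tsv')

def deserialize(s):
    # Split one tab-separated line and convert the last field to a float
    p = s.strip().split('\t')
    p[-1] = float(p[-1])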
. stochastic
gradient descent) that you can plug some functions into, without worrying
about the communication.
Matei
On August 13, 2014 at 11:10:02 AM, Ignacio Zendejas (
ignacio.zendejas...@gmail.com) wrote:
Has anyone had a chance to look at this paper (with title in subject)?
http://www.cs.rice.edu/~lp6/comparison.pdf
Interesting that they chose to use Python alone. Do we know how much faster
Scala is vs. Python in general, if at all?
As with any and all benchmarks, I'm sure there are caveats, but
in
branch-1.0 and master. You only need to provide an RDD of sparse
vectors (created from Vectors.sparse).
MLUtils.loadLibSVMData reads sparse features in LIBSVM format.
Best,
Xiangrui
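
For readers unfamiliar with the LIBSVM format mentioned above: each line holds a label followed by index:value pairs, with one-based indices. The following is a minimal pure-Python sketch of how one such line maps to the (indices, values) pair a sparse vector is built from. The function name parse_libsvm_line is hypothetical and this is not the MLUtils implementation.

```python
def parse_libsvm_line(line):
    """Parse 'label index:value index:value ...' into (label, indices, values).

    Hypothetical helper for illustration only; not part of Spark.
    """
    parts = line.strip().split()
    label = float(parts[0])
    indices = []
    values = []
    for item in parts[1:]:
        idx, val = item.split(':')
        indices.append(int(idx) - 1)  # LIBSVM indices are one-based; shift to zero-based
        values.append(float(val))
    return label, indices, values


label, indices, values = parse_libsvm_line('1 3:0.5 10:2.0')
```

In PySpark, the resulting indices and values could presumably be passed to Vectors.sparse together with the total feature count, which the LIBSVM line itself does not record.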
On Thu, Apr 10, 2014 at 5:18 PM, Ignacio Zendejas
ignacio.zendejas...@gmail.com wrote:
Hi
Here's the JIRA:
https://issues.apache.org/jira/browse/SPARK-1473
Future discussions should take place in its comments section.
Thanks.
On Fri, Apr 11, 2014 at 11:26 AM, Ignacio Zendejas
ignacio.zendejas...@gmail.com wrote:
Thanks for the response, Xiangrui.
And sounds good, Héctor
Hi, again -
As part of the next step, I'd like to make a more substantive contribution
and propose some initial work on feature selection, primarily as it relates
to text classification.
Specifically, I'd like to contribute very straightforward code to perform
information gain feature
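
As an illustration of the kind of computation being proposed, information gain for a single term and a binary class is commonly taken as the class entropy minus the class entropy conditioned on whether the term is present. The sketch below is a plain-Python assumption of that definition, not the code offered in this thread; the function names and the four count arguments are hypothetical.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def information_gain(n11, n10, n01, n00):
    """Information gain of a term with respect to a binary class.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    total = n11 + n10 + n01 + n00
    h_class = entropy([n11 + n01, n10 + n00])
    with_term = n11 + n10
    without_term = n01 + n00
    h_conditional = 0.0
    if with_term:
        h_conditional += (with_term / total) * entropy([n11, n10])
    if without_term:
        h_conditional += (without_term / total) * entropy([n01, n00])
    return h_class - h_conditional
```

A term that perfectly separates two equally sized classes scores one bit, while a term spread evenly across both classes scores zero, so ranking terms by this value and keeping the top ones is one simple way to select features.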
with you that the old way is more readable (although
less idiomatic Scala).
On Thu, Apr 10, 2014 at 1:48 PM, Ignacio Zendejas
ignacio.zendejas...@gmail.com wrote:
Hi, all -
First off, I want to say that I love Spark and am very excited about
MLBase. I'd love to contribute now that I