Re: Remove / update version in spark-packages.org

2016-07-26 Thread Julio Antonio Soto de Vicente
lease. > > Best regards, > Burak > >> On Tue, Jul 26, 2016 at 3:51 AM, Julio Antonio Soto de Vicente >> <ju...@esbet.es> wrote: >> Hi all, >> >> Maybe I am missing something, but... Is there a way to update a package >> uploaded to spark-packages

Remove / update version in spark-packages.org

2016-07-26 Thread Julio Antonio Soto de Vicente
Hi all, Maybe I am missing something, but... Is there a way to update a package uploaded to spark-packages.org under the same version? Given a release called my_package 1.1.2, I would like to re-upload it because of a build failure, but I want the re-upload to be published as version 1.1.2 as well... Thank you.

Re: ImportError: No module named numpy

2016-06-01 Thread Julio Antonio Soto de Vicente
Try adding this to spark-env.sh (rename the file first if it still ends in .template): PYSPARK_PYTHON=/path/to/your/bin/python where bin/python belongs to the Python environment that actually has NumPy installed. > On 1 Jun 2016, at 20:16, Bhupendra Mishra > wrote:
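Before pointing PYSPARK_PYTHON at an interpreter, it can help to confirm that the interpreter really can import NumPy. The following is a minimal sketch, not part of the original reply; the helper name interpreter_can_import is hypothetical, and the path you pass in is whatever you plan to assign to PYSPARK_PYTHON.

```python
import subprocess
import sys


def interpreter_can_import(python_path, module="numpy"):
    """Return True if the interpreter at python_path can import module.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    try:
        result = subprocess.run(
            [python_path, "-c", "import " + module],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The path does not exist or is not executable.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Checks the interpreter running this script; substitute the path
    # you intend to assign to PYSPARK_PYTHON.
    print(interpreter_can_import(sys.executable))
```

If the check fails for the interpreter you had in mind, the ImportError on the executors is expected, and the fix is to install NumPy into that environment or to point PYSPARK_PYTHON elsewhere.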

Re: Cross Validator to work with K-Fold value of 1?

2016-05-02 Thread Julio Antonio Soto de Vicente
Hi, Same goes for the PolynomialExpansion in org.apache.spark.ml.feature. It would be nice to cross-validate a degree 1 polynomial expansion (that is, no expansion at all) against polynomial expansions of other degrees. Unfortunately, degree is forced to be >= 2. -- Julio > On 2 May 2016, at
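To illustrate why degree 1 amounts to "no expansion at all", here is a plain-Python sketch of polynomial feature expansion. It is not Spark's implementation: the function name polynomial_expansion is hypothetical, and the order of the output terms is an assumption that may differ from what org.apache.spark.ml.feature.PolynomialExpansion produces.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_expansion(features, degree):
    """Return every monomial of the features with total degree 1..degree.

    Illustrative sketch only; term ordering is not meant to match Spark.
    """
    if degree < 1:
        raise ValueError("degree must be >= 1")
    expanded = []
    for d in range(1, degree + 1):
        # Each combination with replacement is one monomial of degree d.
        for combo in combinations_with_replacement(features, d):
            expanded.append(prod(combo))
    return expanded


if __name__ == "__main__":
    print(polynomial_expansion([2.0, 3.0], 1))  # degree 1: input unchanged
    print(polynomial_expansion([2.0, 3.0], 2))  # adds x*x, x*y, y*y
```

With degree 1 the output equals the input vector, which is why it would serve as a natural baseline in a cross-validation grid.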

Re: Spark 1.6.0 + Hive + HBase

2016-01-28 Thread Julio Antonio Soto de Vicente
Hi, Indeed, Hive is not able to perform predicate pushdown through an HBase table. Neither Hive nor Impala can. Broadly speaking, if you need to query your HBase table through a field other than the rowkey: A) Try to "encode" as much info as possible in the rowkey field and use it as your

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Julio Antonio Soto de Vicente
Unfortunately, Koert is right. I've been on a couple of projects using Spark (banking industry) where CentOS + Python 2.6 is the toolbox available. That said, I believe it should not be a concern for Spark. Python 2.6 is old and busted, which is totally opposite to the Spark philosophy IMO.

Re: what should I know to implement twitter streaming for pyspark?

2015-11-24 Thread Julio Antonio Soto de Vicente
Hi Amir, I believe that the first step should be looking for a library that implements the streaming API. > On 24/11/2015, at 10:32, Amir Rahnama wrote: > > I wanna end the situation where python users of spark need to implement the > twitter source for

Re: Maintaining overall cumulative data in Spark Streaming

2015-10-29 Thread Julio Antonio Soto de Vicente
-dev +user Hi Sandeep, Perhaps (flat)mapping values and using an accumulator? > On 29/10/2015, at 23:08, Sandeep Giri wrote: > > Dear All, > > If a continuous stream of text is coming in and you have to keep publishing > the overall word count so far since