On Thu, Jul 28, 2016 at 2:13 AM, Timothy Potter <thelabd...@gmail.com>
> wrote:
>>
>> I'm not looking for a one-off solution for a specific query that can
>> be solved on the client side as you suggest, but rather a generic
>> solution that can be implemented within the Da
ery? You can also cache it, if multiple queries on the
> same inner query are requested.
>
>
> On Wednesday, July 27, 2016, Timothy Potter <thelabd...@gmail.com>
> wrote:
Take this simple join:

SELECT m.title AS title, solr.aggCount AS aggCount
FROM movies m
INNER JOIN (
  SELECT movie_id, COUNT(*) AS aggCount
  FROM ratings
  WHERE rating >= 4
  GROUP BY movie_id
  ORDER BY aggCount DESC
  LIMIT 10
) AS solr ON solr.movie_id = m.movie_id
ORDER BY aggCount DESC
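To make the intent of the query concrete, here is a minimal sketch in plain Python of what the inner aggregation and outer join compute. The sample rows and titles are made up for illustration; they are not from the actual movies/ratings data set:

```python
from collections import Counter

# Hypothetical sample data standing in for the ratings and movies tables.
ratings = [
    {"movie_id": 1, "rating": 5},
    {"movie_id": 1, "rating": 4},
    {"movie_id": 2, "rating": 3},
    {"movie_id": 2, "rating": 5},
    {"movie_id": 3, "rating": 2},
]
movies = {1: "The Matrix", 2: "Alien", 3: "Gigli"}

# Inner query: COUNT(*) of ratings >= 4, grouped by movie_id, top 10.
agg = Counter(r["movie_id"] for r in ratings if r["rating"] >= 4)
top10 = agg.most_common(10)

# Outer query: join in the title, ordered by aggCount descending.
result = [(movies[mid], cnt) for mid, cnt in top10]
print(result)  # [('The Matrix', 2), ('Alien', 1)]
```

The point of the question is that the inner query is exactly the kind of top-N aggregation Solr can answer efficiently on its own, so pushing it down (rather than materializing all ratings) is the win being asked about.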
I would like
I'm using the Spark Thrift server to execute SQL queries over JDBC.
I'm wondering if it's possible to plug in a class to do some
pre-processing on the SQL statement before it gets passed to the
SQLContext for actual execution? I scanned over the code and it
doesn't look like this is supported, but I
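The kind of hook being asked about can be sketched as a plain string-level rewriter that sits in front of the SQL engine. This is a hypothetical illustration of the idea, not an existing Spark Thrift server API; `preprocess_sql` and its rewrite rule are made up:

```python
import re

def preprocess_sql(sql: str) -> str:
    """Hypothetical pre-processor: rewrite a SQL statement before
    handing it to the engine. As a stand-in for a real rewrite, this
    just normalizes whitespace and upper-cases the leading keyword."""
    sql = re.sub(r"\s+", " ", sql).strip()
    head, _, tail = sql.partition(" ")
    return head.upper() + " " + tail

print(preprocess_sql("select  m.title\nfrom movies m"))
# prints: SELECT m.title from movies m
```

A real implementation would need to intercept the statement inside the Thrift server before it reaches `SQLContext.sql`, which is exactly the extension point the question says does not appear to exist.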
FWIW - I synchronized access to the transformer and the problem went
away, so this looks like some type of concurrent access issue when
dealing with UDFs.
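The workaround described above can be sketched as guarding a non-thread-safe object with a lock so concurrent callers are serialized. This is a generic illustration in Python with a made-up `Tokenizer` class, not the actual Spark transformer involved:

```python
import threading

class Tokenizer:
    """Stand-in for a transformer that is not thread-safe:
    it mutates shared internal state on every call."""
    def __init__(self):
        self._buf = []

    def transform(self, text):
        self._buf.clear()              # shared mutable state
        self._buf.extend(text.split())
        return list(self._buf)

tokenizer = Tokenizer()
lock = threading.Lock()

def safe_transform(text):
    # Serialize access, mirroring the "synchronized" workaround.
    with lock:
        return tokenizer.transform(text)

results = []
threads = [threading.Thread(target=lambda t=t: results.append(safe_transform(t)))
           for t in ["a b", "c d", "e f"]]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(sorted(results))  # [['a', 'b'], ['c', 'd'], ['e', 'f']]
```

Without the lock, two threads could interleave `clear()` and `extend()` on the shared buffer and corrupt each other's output, which is the shape of failure the synchronization made go away.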
On Tue, Mar 29, 2016 at 9:19 AM, Timothy Potter <thelabd...@gmail.com> wrote:
> It's a local Spark master, no cluster. I'm not sure
Follow me at https://twitter.com/jaceklaskowski
>
>
> On Mon, Mar 28, 2016 at 7:11 PM, Timothy Potter <thelabd...@gmail.com> wrote:
I'm seeing the following error when trying to generate a prediction
from a very simple ML pipeline-based model. I've verified that the raw
data sent to the tokenizer is valid (not null). It seems like this is
some sort of weird classpath or class-loading issue. Any help you
can provide in
I'm using Spark 1.4.1 and am doing the following with spark-shell:
val solr = sqlContext.read.format("solr")
  .option("zkhost", "localhost:2181")
  .option("collection", "spark")
  .load()

solr.select("id").count()
The Solr DataSource implements PrunedFilteredScan, so I expected the
buildScan method to get
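The contract behind PrunedFilteredScan is that the engine hands the data source only the columns (and simple filters) the query needs. A minimal sketch of that idea in plain Python, with made-up rows and a simplified equality-only filter model rather than Spark's actual Filter classes:

```python
# Hypothetical in-memory "data source" illustrating column pruning:
# the scan receives only the columns the query actually needs.
ROWS = [
    {"id": "1", "title": "The Matrix", "year": 1999},
    {"id": "2", "title": "Alien", "year": 1979},
]

def build_scan(required_columns, filters=()):
    """Return rows projected to required_columns, after applying
    simple (column, value) equality filters."""
    rows = [r for r in ROWS if all(r[c] == v for c, v in filters)]
    return [{c: r[c] for c in required_columns} for r in rows]

print(build_scan(["id"]))                      # [{'id': '1'}, {'id': '2'}]
print(build_scan(["title"], [("year", 1999)])) # [{'title': 'The Matrix'}]
```

For the `solr.select("id").count()` query above, the expectation is that the equivalent of `required_columns` would be just `["id"]`, so only that field is fetched from Solr.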
Forgot to mention that I've tested that SerIntWritable and
PipelineDocumentWritable are serializable by serializing /
deserializing to/from a byte array in memory.
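The in-memory round-trip check described above can be sketched in Python, with `pickle` standing in for the Hadoop Writable serialization and a made-up document in place of PipelineDocumentWritable:

```python
import io
import pickle

def roundtrip(obj):
    """Serialize obj to an in-memory byte buffer and read it back,
    mirroring the sanity check described for the Writable classes."""
    buf = io.BytesIO()
    pickle.dump(obj, buf)
    buf.seek(0)
    return pickle.load(buf)

doc = {"id": 42, "tokens": ["spark", "solr"]}
assert roundtrip(doc) == doc
```

Passing a round-trip like this in isolation is exactly why the failure points at the cluster's serialization path (classloaders, registered serializers) rather than at the classes themselves.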
On Wed, Oct 1, 2014 at 1:43 PM, Timothy Potter <thelabd...@gmail.com> wrote:
I'm running into the following deserialization issue when