Re: Possible to push sub-queries down into the DataSource impl?

2016-08-01 Thread Timothy Potter
…On Thu, Jul 28, 2016 at 2:13 AM, Timothy Potter <thelabd...@gmail.com> wrote: > I'm not looking for a one-off solution for a specific query that can be solved on the client side as you suggest, but rather a generic solution that can be implemented within the Da…

Re: Possible to push sub-queries down into the DataSource impl?

2016-07-27 Thread Timothy Potter
…ery? You can also cache it, if multiple queries over the same inner query are requested. On Wednesday, 27 July 2016, Timothy Potter <thelabd...@gmail.com> wrote: > Take this simple join: > SELECT m.title as title, solr.aggCount…
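A minimal sketch of the caching suggestion, in Scala to match the spark-shell code elsewhere in this thread. It assumes results of an inner query can be memoized by the whitespace-normalized text of its SQL, so repeated requests for the same inner query execute it only once. `InnerQueryCache` is a hypothetical name, and its `compute` function stands in for whatever actually runs the inner query.

```scala
import scala.collection.mutable

// Hypothetical memoizing wrapper: `compute` stands in for whatever runs the inner query.
final class InnerQueryCache[R](compute: String => R) {
  private val cache = mutable.HashMap.empty[String, R]
  private var computeCalls = 0

  // Collapse runs of whitespace so trivially reformatted copies of the same SQL share one entry.
  private def normalize(sql: String): String = sql.trim.replaceAll("\\s+", " ")

  // Returns the cached result, running `compute` only on the first request for a given query.
  def get(sql: String): R = synchronized {
    cache.getOrElseUpdate(normalize(sql), { computeCalls += 1; compute(sql) })
  }

  // Number of times the inner query was actually executed.
  def misses: Int = synchronized(computeCalls)
}
```

Normalizing only whitespace is a deliberately conservative choice; lower-casing the text as well would also merge queries whose string literals differ only in case.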

Possible to push sub-queries down into the DataSource impl?

2016-07-27 Thread Timothy Potter
Take this simple join:

    SELECT m.title as title, solr.aggCount as aggCount
    FROM movies m
    INNER JOIN (SELECT movie_id, COUNT(*) as aggCount
                FROM ratings
                WHERE rating >= 4
                GROUP BY movie_id
                ORDER BY aggCount desc
                LIMIT 10) as solr
      ON solr.movie_id = m.movie_id
    ORDER BY aggCount DESC

I would like…
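The rewrite being asked about can be modeled with a toy logical plan. This is a hedged sketch, not Spark's Catalyst API: the case classes, `collapse`, and `pushDown` are all hypothetical. It looks for a join side of the form Limit(Sort(Aggregate(Filter*(Scan)))) over a table the data source owns and, if it finds one, replaces that whole sub-tree with a single `PushedQuery` that the source could answer natively.

```scala
// Hypothetical, heavily simplified logical-plan nodes (NOT Spark's Catalyst classes).
sealed trait Plan
final case class Scan(table: String) extends Plan
final case class Filter(condition: String, child: Plan) extends Plan
final case class Aggregate(groupBy: Seq[String], aggs: Seq[String], child: Plan) extends Plan
final case class Sort(by: String, desc: Boolean, child: Plan) extends Plan
final case class Limit(n: Int, child: Plan) extends Plan
final case class Join(left: Plan, right: Plan, on: String) extends Plan

// Everything the data source would need to answer the sub-query on its own.
final case class PushedQuery(table: String, filters: Seq[String], groupBy: Seq[String],
                             aggs: Seq[String], sort: Option[(String, Boolean)], limit: Option[Int])
final case class SourceQuery(pushed: PushedQuery) extends Plan

// Walks a chain of Filters down to a Scan, collecting the filter conditions.
def collectFilters(p: Plan): Option[(String, Seq[String])] = p match {
  case Scan(t)          => Some((t, Nil))
  case Filter(c, child) => collectFilters(child).map { case (t, fs) => (t, fs :+ c) }
  case _                => None
}

// Collapses [Limit] [Sort] Aggregate(Filter* Scan) into one PushedQuery if the table is pushable.
def collapse(plan: Plan, pushable: String => Boolean): Option[PushedQuery] = {
  val (limit, afterLimit) = plan match {
    case Limit(n, c) => (Some(n), c)
    case other       => (None, other)
  }
  val (sort, afterSort) = afterLimit match {
    case Sort(by, desc, c) => (Some((by, desc)), c)
    case other             => (None, other)
  }
  afterSort match {
    case Aggregate(gb, aggs, child) =>
      collectFilters(child).collect {
        case (table, filters) if pushable(table) => PushedQuery(table, filters, gb, aggs, sort, limit)
      }
    case _ => None
  }
}

// Replaces any collapsible join side with a SourceQuery; other nodes are rebuilt unchanged.
def pushDown(plan: Plan, pushable: String => Boolean): Plan = {
  def side(p: Plan): Plan = collapse(p, pushable).map(SourceQuery(_)).getOrElse(pushDown(p, pushable))
  plan match {
    case Join(l, r, on)     => Join(side(l), side(r), on)
    case Sort(by, d, child) => Sort(by, d, pushDown(child, pushable))
    case Limit(n, child)    => Limit(n, pushDown(child, pushable))
    case other              => other
  }
}

// The query from the message: a top-10 aggregate over `ratings` joined to `movies`.
val inner = Limit(10, Sort("aggCount", desc = true,
  Aggregate(Seq("movie_id"), Seq("COUNT(*) AS aggCount"), Filter("rating >= 4", Scan("ratings")))))
val query = Sort("aggCount", desc = true, Join(Scan("movies"), inner, "solr.movie_id = m.movie_id"))
val rewritten = pushDown(query, _ == "ratings")
```

After the rewrite, the `movies` side is still scanned normally, while the whole `ratings` sub-query becomes one request to the source; when `pushable` rejects the table, the plan is returned unchanged.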

How to do some pre-processing of the SQL in the Thrift server?

2016-06-21 Thread Timothy Potter
I'm using the Spark Thrift server to execute SQL queries over JDBC. I'm wondering if it's possible to plug in a class to do some pre-processing on the SQL statement before it gets passed to the SQLContext for actual execution? I scanned over the code and it doesn't look like this is supported but I…
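Assuming no such plug-in point is available, the idea can still be sketched as a chain of string-to-string rewrites applied before the statement reaches the SQL engine. Everything below is hypothetical: `SqlPreprocessor`, the two sample steps, and `PreprocessingExecutor`, whose `execute` function stands in for whatever actually runs the statement (for example `sqlContext.sql`).

```scala
// Hypothetical extension point; not an existing Spark Thrift server API.
trait SqlPreprocessor {
  def process(sql: String): String
}

// Drops a trailing semicolon, if present.
object StripTrailingSemicolon extends SqlPreprocessor {
  def process(sql: String): String = sql.trim.stripSuffix(";").trim
}

// Expands a hypothetical ${alias} placeholder into a concrete table name.
final class ExpandTableAlias(alias: String, table: String) extends SqlPreprocessor {
  def process(sql: String): String = sql.replace("${" + alias + "}", table)
}

// Applies each step in order, then hands the final text to `execute`
// (a stand-in for whatever actually runs the statement).
final class PreprocessingExecutor[R](steps: Seq[SqlPreprocessor], execute: String => R) {
  def rewrite(sql: String): String = steps.foldLeft(sql)((s, step) => step.process(s))
  def run(sql: String): R = execute(rewrite(sql))
}
```

Keeping each rewrite as a small, independent step makes the order of transformations explicit and lets each one be tested without a running server.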

Re: Strange ML pipeline errors from HashingTF using v1.6.1

2016-03-29 Thread Timothy Potter
FWIW, I synchronized access to the transformer and the problem went away, so this looks like some type of concurrent-access issue when dealing with UDFs. On Tue, Mar 29, 2016 at 9:19 AM, Timothy Potter <thelabd...@gmail.com> wrote: > It's a local Spark master, no cluster. I'm not sure…
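The workaround of serializing calls to a shared transformer can be sketched without Spark. `UnsafeTokenizer` is a hypothetical stand-in for a transformer that is unsafe to call from several threads at once, because it reuses one mutable buffer; it is not how `HashingTF` is implemented. `SynchronizedTransformer` (also hypothetical) wraps any function so that only one thread runs it at a time.

```scala
import java.util.concurrent.{Callable, Executors}

// Hypothetical stand-in for a transformer that is unsafe to share across threads:
// every call reuses the same mutable buffer, so concurrent calls can corrupt each other.
final class UnsafeTokenizer {
  private val buf = new StringBuilder
  def transform(text: String): Seq[String] = {
    buf.clear()
    buf.append(text.toLowerCase)
    buf.toString.split("\\s+").filter(_.nonEmpty).toSeq
  }
}

// Serializes access to any function, mirroring the "synchronize access to the transformer" workaround.
final class SynchronizedTransformer[A, B](f: A => B) {
  private val lock = new Object
  def apply(a: A): B = lock.synchronized(f(a))
}

// Runs `f` over the inputs on a small thread pool and returns the results in input order.
def runConcurrently[B](inputs: Seq[String], f: String => B): Seq[B] = {
  val pool = Executors.newFixedThreadPool(8)
  try {
    val futures = inputs.map(in => pool.submit(new Callable[B] { def call(): B = f(in) }))
    futures.map(_.get())
  } finally pool.shutdown()
}
```

A single lock trades throughput for correctness; giving each thread its own transformer instance would be an alternative if the object is cheap to create.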

Re: Strange ML pipeline errors from HashingTF using v1.6.1

2016-03-29 Thread Timothy Potter
…w me at https://twitter.com/jaceklaskowski On Mon, Mar 28, 2016 at 7:11 PM, Timothy Potter <thelabd...@gmail.com> wrote: > I'm seeing the following error when trying to generate a prediction from a very simple ML-pipeline-based model. I've verified that the…

Strange ML pipeline errors from HashingTF using v1.6.1

2016-03-28 Thread Timothy Potter
I'm seeing the following error when trying to generate a prediction from a very simple ML-pipeline-based model. I've verified that the raw data sent to the tokenizer is valid (not null). It seems like this is some sort of weird classpath or class-loading issue. Any help you can provide in…

selected field not getting pushed down into my DataSource?

2015-09-17 Thread Timothy Potter
I'm using Spark 1.4.1 and am doing the following with spark-shell:

    val solr = sqlContext.read.format("solr").option("zkhost", "localhost:2181").option("collection", "spark").load()
    solr.select("id").count()

The Solr DataSource implements PrunedFilteredScan, so I expected the buildScan method to get…
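`PrunedFilteredScan.buildScan` receives the names of the columns the query needs plus the filters Spark could translate for the source. The Spark-free sketch below only mimics that contract, under the assumption that rows are simple maps: `InMemorySource`, `SimpleFilter`, `EqualTo`, and `GreaterThanOrEqual` are hypothetical stand-ins, not Spark's classes. It returns only the requested columns, in the requested order, and still returns one (empty) row per matching record when no columns are requested, so a row count stays correct.

```scala
// Hypothetical, Spark-free stand-ins for Spark's sources.Filter classes.
sealed trait SimpleFilter
final case class EqualTo(attribute: String, value: Any) extends SimpleFilter
final case class GreaterThanOrEqual(attribute: String, value: Int) extends SimpleFilter

// Hypothetical in-memory source whose buildScan mirrors the shape of
// PrunedFilteredScan.buildScan(requiredColumns, filters).
final class InMemorySource(rows: Seq[Map[String, Any]]) {
  def buildScan(requiredColumns: Array[String], filters: Array[SimpleFilter]): Seq[Seq[Any]] = {
    // A row is kept only if it satisfies every pushed filter.
    def keep(row: Map[String, Any]): Boolean = filters.forall {
      case EqualTo(a, v) => row.get(a).contains(v)
      case GreaterThanOrEqual(a, v) =>
        row.get(a).exists { case i: Int => i >= v; case _ => false }
    }
    // Project each surviving row onto the requested columns, in the requested order.
    rows.filter(keep).map(row => requiredColumns.toSeq.map(row))
  }
}
```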

Re: Task deserialization problem using 1.1.0 for Hadoop 2.4

2014-10-01 Thread Timothy Potter
Forgot to mention that I've verified that SerIntWritable and PipelineDocumentWritable are serializable by serializing them to, and deserializing them from, a byte array in memory. On Wed, Oct 1, 2014 at 1:43 PM, Timothy Potter <thelabd...@gmail.com> wrote: > I'm running into the following deserialization issue when…
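The in-memory serialization check can be sketched with plain Java serialization. `roundTrip` writes a value to a byte array and reads it back; `IntHolder` and `Wrapper` are hypothetical example classes, not the `SerIntWritable` or `PipelineDocumentWritable` from the message. `Wrapper` illustrates that a class marked `Serializable` still fails if one of its fields holds a non-serializable object.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// Writes a value with Java serialization into an in-memory byte array and reads it back.
def roundTrip[T <: java.io.Serializable](value: T): T = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  try out.writeObject(value) finally out.close()
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
  try in.readObject().asInstanceOf[T] finally in.close()
}

// Hypothetical example type; case classes are Serializable by default.
final case class IntHolder(value: Int)

// Hypothetical: marked Serializable, but its field may hold something that is not.
final class Wrapper(val payload: Any) extends java.io.Serializable
```

A round trip like this only exercises the current JVM and class loader, so it will not surface problems that appear only when a task is deserialized elsewhere.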