Re: Spark sql not pushing down timestamp range queries

2016-04-15 Thread Kiran Chitturi
Mich,

I am curious as well about how Spark casts between the different filter types.

For example, the conversion happens implicitly for the 'EqualTo' filter:

scala> sqlContext.sql("SELECT * from events WHERE `registration` =
> '2015-05-28'").explain()
>
> 16/04/15 11:44:15 INFO ParseDriver: Parsing command: SELECT * from events
> WHERE `registration` = '2015-05-28'
>
> 16/04/15 11:44:15 INFO ParseDriver: Parse Completed
>
> 16/04/15 11:44:15 INFO SolrRelation: Constructed SolrQuery:
> q=*:*=1000=demo=registration:"2015-05-28T07:00:00.000Z"
>
> == Physical Plan ==
>
> Filter (registration#17 = 14327964)
> +- Scan 
> com.lucidworks.spark.SolrRelation@483f70bf[method#0,author#1,location#2,song#3,timestamp#4,auth#5,page#6,userAgent#7,lastName#8,firstName#9,id#10,itemInSession#11L,artist#12,status#13L,sessionId#14L,length#15,_version_#16L,registration#17,userId#18L,title#19,gender#20,level#21]
> PushedFilters: [EqualTo(registration,2015-05-28 00:00:00.0)]


whereas for the other filters ('GreaterThan', 'GreaterThanOrEqual', 'LessThan',
'LessThanOrEqual'), the values are cast to String:

scala> sqlContext.sql("SELECT * from events WHERE `registration` >=
> '2015-05-28 00.00.00'").explain()
> 16/04/15 11:45:13 INFO ParseDriver: Parsing command: SELECT * from events
> WHERE `registration` >= '2015-05-28 00.00.00'
> 16/04/15 11:45:13 INFO ParseDriver: Parse Completed
> == Physical Plan ==
> Filter (cast(registration#17 as string) >= 2015-05-28 00.00.00)
> +- Scan com.lucidworks.spark.SolrRelation@483f70bf
> [method#0,author#1,location#2,song#3,timestamp#4,auth#5,page#6,userAgent#7,lastName#8,firstName#9,id#10,itemInSession#11L,artist#12,status#13L,sessionId#14L,length#15,_version_#16L,registration#17,userId#18L,title#19,gender#20,level#21]
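
A possible query-side workaround (a sketch of mine, not verified against
spark-solr, and the pushdown behavior may differ across Spark versions) is
to type the literal explicitly so the cast lands on the literal rather than
on the column:

// Hypothetical workaround sketch: cast the literal, not the column, so the
// comparison stays TimestampType >= TimestampType and no Cast wraps the
// attribute.
scala> sqlContext.sql("SELECT * from events WHERE `registration` >= " +
     |   "CAST('2015-05-28 00:00:00' AS TIMESTAMP)").explain()
// If the filter now reaches the source, the plan should show something like
// PushedFilters: [GreaterThanOrEqual(registration,2015-05-28 00:00:00.0)]
// instead of Filter (cast(registration#17 as string) >= ...).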



How does Hive work with date/timestamp queries? Can you share any docs
regarding this?

I have found this in the Hive docs:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select#LanguageManualSelect-PartitionBasedQueries

SELECT page_views.*
> FROM page_views
> WHERE page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31'


In the above SQL query, is there any implicit casting to String, or does
Hive try to convert it to a Date/Timestamp and then do the comparison?

 Thanks,

On Fri, Apr 15, 2016 at 1:53 AM, Mich Talebzadeh 
wrote:

> Thanks Takeshi,
>
> I did check it. I believe you are referring to this statement
>
> "This is likely because we cast this expression weirdly to be compatible
> with Hive. Specifically I think this turns into, CAST(c_date AS STRING)
> >= "2016-01-01", and we don't push down casts down into data sources.
>
> The reason for casting this way is because users currently expect the
> following to work c_date >= "2016". "
>
> There are two issues here:
>
>    1. The CAST expression is not pushed down.
>    2. I still don't trust the string comparison of dates. It may or may
>    not work. I recall this as an issue in Hive.
>
> '2012-11-23' is not a DATE; it is a string. In general, from my
> experience, one should not compare a DATE with a string. The results
> will depend on several factors, some related to the tool, some related to
> the session.
>
> For example, the following will convert a date in string format into a DATE
> type in Hive:
>
>
> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(TransactionDate,'dd/MM/yyyy'),'yyyy-MM-dd'))
>
> HTH
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 15 April 2016 at 06:58, Takeshi Yamamuro  wrote:
>
>> Hi, Mich
>>
>> Did you check the URL Josh referred to?
>> The cast for string comparisons is needed for accepting `c_date >=
>> "2016"`.
>>
>> // maropu
>>
>>
>> On Fri, Apr 15, 2016 at 10:30 AM, Hyukjin Kwon 
>> wrote:
>>
>>> Hi,
>>>
>>>
>>> String comparison itself is pushed down fine, but the problem is dealing
>>> with Cast.
>>>
>>>
>>> It was pushed down before but it was reverted (
>>> https://github.com/apache/spark/pull/8049).
>>>
>>> Several fixes were tried, e.g.
>>> https://github.com/apache/spark/pull/11005, but none of them made it in.
>>>
>>>
>>> To cut it short, it is not being pushed down because it is unsafe to
>>> resolve the cast (e.g. long to integer).
>>>
>>> As a workaround, the implementation of the Solr data source could be
>>> changed to one using CatalystScan, which takes all the filters.
>>>
>>> But CatalystScan is not designed to be binary compatible across
>>> releases; however, it looks like some think it is stable now, as mentioned
>>> here: https://github.com/apache/spark/pull/10750#issuecomment-175400704.
>>>
>>>
>>> Thanks!
>>>
>>>
>>> 2016-04-15 3:30 GMT+09:00 Mich Talebzadeh :
>>>
 Hi Josh,

 Can you please clarify whether date comparisons as two strings work at
 all?

Re: Spark sql not pushing down timestamp range queries

2016-04-15 Thread Kiran Chitturi
Thanks Hyukjin for the suggestion. I will take a look at implementing the
Solr data source with CatalystScan.
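
For anyone searching the archives later, here is a rough sketch of the shape
such a relation could take (all names and bodies below are illustrative, not
the actual spark-solr code; CatalystScan's buildScan receives the raw
Catalyst expressions, Cast nodes included):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, Expression, GreaterThanOrEqual, Literal}
import org.apache.spark.sql.sources.{BaseRelation, CatalystScan}
import org.apache.spark.sql.types.StructType

// Hypothetical CatalystScan-based relation: unlike PrunedFilteredScan, it is
// handed the raw Catalyst expressions, so a filter such as
// cast(registration as string) >= '2015-05-28' arrives intact and could be
// translated to a Solr range query instead of being dropped.
class SolrCatalystRelation(override val sqlContext: SQLContext,
                           solrSchema: StructType)
    extends BaseRelation with CatalystScan {

  override def schema: StructType = solrSchema

  override def buildScan(requiredColumns: Seq[Attribute],
                         filters: Seq[Expression]): RDD[Row] = {
    filters.foreach {
      case GreaterThanOrEqual(Cast(attr: Attribute, _), Literal(value, _)) =>
        println(s"could push to Solr: ${attr.name}:[$value TO *]")
      case other =>
        println(s"filter left for Spark to evaluate: $other")
    }
    sqlContext.sparkContext.emptyRDD[Row] // placeholder; real code queries Solr
  }
}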




Re: Spark sql not pushing down timestamp range queries

2016-04-15 Thread Mich Talebzadeh
Thanks Takeshi,

I did check it. I believe you are referring to this statement

"This is likely because we cast this expression weirdly to be compatible
with Hive. Specifically I think this turns into, CAST(c_date AS STRING) >=
"2016-01-01", and we don't push down casts down into data sources.

The reason for casting this way is because users currently expect the
following to work c_date >= "2016". "

There are two issues here:

   1. The CAST expression is not pushed down.
   2. I still don't trust the string comparison of dates. It may or may not
   work. I recall this as an issue in Hive.

'2012-11-23' is not a DATE; it is a string. In general, from my
experience, one should not compare a DATE with a string. The results
will depend on several factors, some related to the tool, some related to
the session.

For example, the following will convert a date in string format into a DATE
type in Hive:

TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(TransactionDate,'dd/MM/yyyy'),'yyyy-MM-dd'))
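
For instance (an added illustration, assuming a HiveContext and a table
`sales` with a string column TransactionDate holding values like
'28/05/2015'):

scala> hiveContext.sql(
     |   "SELECT TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(TransactionDate," +
     |   "'dd/MM/yyyy'),'yyyy-MM-dd')) AS tx_date FROM sales").show()
// '28/05/2015' comes back as 2015-05-28, which can then be compared
// against proper DATE/TIMESTAMP values instead of raw strings.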

HTH


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 15 April 2016 at 06:58, Takeshi Yamamuro  wrote:

> Hi, Mich
>
> Did you check the URL Josh referred to?
> The cast for string comparisons is needed for accepting `c_date >= "2016"`.
>
> // maropu
>
>
> On Fri, Apr 15, 2016 at 10:30 AM, Hyukjin Kwon 
> wrote:
>
>> Hi,
>>
>>
>> String comparison itself is pushed down fine, but the problem is dealing
>> with Cast.
>>
>>
>> It was pushed down before but it was reverted (
>> https://github.com/apache/spark/pull/8049).
>>
>> Several fixes were tried, e.g. https://github.com/apache/spark/pull/11005,
>> but none of them made it in.
>>
>>
>> To cut it short, it is not being pushed down because it is unsafe to
>> resolve the cast (e.g. long to integer).
>>
>> As a workaround, the implementation of the Solr data source could be
>> changed to one using CatalystScan, which takes all the filters.
>>
>> But CatalystScan is not designed to be binary compatible across releases;
>> however, it looks like some think it is stable now, as mentioned here:
>> https://github.com/apache/spark/pull/10750#issuecomment-175400704.
>>
>>
>> Thanks!
>>
>>
>> 2016-04-15 3:30 GMT+09:00 Mich Talebzadeh :
>>
>>> Hi Josh,
>>>
>>> Can you please clarify whether date comparisons as two strings work at
>>> all?
>>>
>>> I was under the impression that with string comparison only the first
>>> characters are compared?
>>>
>>> Thanks
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 14 April 2016 at 19:26, Josh Rosen  wrote:
>>>
 AFAIK this is not being pushed down because it involves an implicit
 cast and we currently don't push casts into data sources or scans; see
 https://github.com/databricks/spark-redshift/issues/155 for a
 possibly-related discussion.

 On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh <
 mich.talebza...@gmail.com> wrote:

> Are you comparing strings in here or timestamp?
>
> Filter ((cast(registration#37 as string) >= 2015-05-28) &&
> (cast(registration#37 as string) <= 2015-05-29))
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 14 April 2016 at 18:04, Kiran Chitturi <
> kiran.chitt...@lucidworks.com> wrote:
>
>> Hi,
>>
>> Timestamp range filter queries in SQL are not getting pushed down to
>> the PrunedFilteredScan instances. The filtering is happening at the Spark
>> layer.
>>
>> The physical plan for timestamp range queries does not show the
>> pushed filters, whereas range queries on other types work fine and the
>> physical plan shows the pushed filters.
>>
>> Please see below for code and examples.
>>
>> *Example:*
>>
>> *1.* Range filter queries on Timestamp types
>>
>>*code: *
>>
>>> sqlContext.sql("SELECT * from events WHERE `registration` >=
>>> '2015-05-28' AND `registration` <= '2015-05-29' ")
>>
>>*Full example*:
>> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>> *plan*:
>> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql

Re: Spark sql not pushing down timestamp range queries

2016-04-14 Thread Takeshi Yamamuro
Hi, Mich

Did you check the URL Josh referred to?
The cast for string comparisons is needed for accepting `c_date >= "2016"`.
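
A quick check of what that buys (an added example, not from the original
message): with the column cast to string, the prefix comparison behaves
lexicographically:

// "2016-01-15" >= "2016" holds lexicographically (equal prefix, longer
// string), which is what the cast-to-string comparison preserves.
println("2016-01-15" >= "2016")  // true
println("2015-12-31" >= "2016")  // false: '5' < '6' at the fourth character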

// maropu


On Fri, Apr 15, 2016 at 10:30 AM, Hyukjin Kwon  wrote:

> Hi,
>
>
> String comparison itself is pushed down fine, but the problem is dealing
> with Cast.
>
>
> It was pushed down before but it was reverted (
> https://github.com/apache/spark/pull/8049).
>
> Several fixes were tried, e.g. https://github.com/apache/spark/pull/11005,
> but none of them made it in.
>
>
> To cut it short, it is not being pushed down because it is unsafe to
> resolve the cast (e.g. long to integer).
>
> As a workaround, the implementation of the Solr data source could be
> changed to one using CatalystScan, which takes all the filters.
>
> But CatalystScan is not designed to be binary compatible across releases;
> however, it looks like some think it is stable now, as mentioned here:
> https://github.com/apache/spark/pull/10750#issuecomment-175400704.
>
>
> Thanks!
>
>
> 2016-04-15 3:30 GMT+09:00 Mich Talebzadeh :
>
>> Hi Josh,
>>
>> Can you please clarify whether date comparisons as two strings work at
>> all?
>>
>> I was under the impression that with string comparison only the first
>> characters are compared?
>>
>> Thanks
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 14 April 2016 at 19:26, Josh Rosen  wrote:
>>
>>> AFAIK this is not being pushed down because it involves an implicit cast
>>> and we currently don't push casts into data sources or scans; see
>>> https://github.com/databricks/spark-redshift/issues/155 for a
>>> possibly-related discussion.
>>>
>>> On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 Are you comparing strings in here or timestamp?

 Filter ((cast(registration#37 as string) >= 2015-05-28) &&
 (cast(registration#37 as string) <= 2015-05-29))


 Dr Mich Talebzadeh



 LinkedIn * 
 https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
 *



 http://talebzadehmich.wordpress.com



 On 14 April 2016 at 18:04, Kiran Chitturi <
 kiran.chitt...@lucidworks.com> wrote:

> Hi,
>
> Timestamp range filter queries in SQL are not getting pushed down to
> the PrunedFilteredScan instances. The filtering is happening at the Spark
> layer.
>
> The physical plan for timestamp range queries does not show the
> pushed filters, whereas range queries on other types work fine and the
> physical plan shows the pushed filters.
>
> Please see below for code and examples.
>
> *Example:*
>
> *1.* Range filter queries on Timestamp types
>
>*code: *
>
>> sqlContext.sql("SELECT * from events WHERE `registration` >=
>> '2015-05-28' AND `registration` <= '2015-05-29' ")
>
>*Full example*:
> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
> *plan*:
> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql
>
> *2. * Range filter queries on Long types
>
> *code*:
>
>> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and
>> `length` <= '1000'")
>
> *Full example*:
> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
> *plan*:
> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql
>
> The SolrRelation class we use extends
> the PrunedFilteredScan.
>
> Since Solr supports date ranges, I would like for the timestamp
> filters to be pushed down to the Solr query.
>
> Are there limitations on the type of filters that are passed down with
> Timestamp types?
> Is there something that I should do in my code to fix this?
>
> Thanks,
> --
> Kiran Chitturi
>
>

>>
>


-- 
---
Takeshi Yamamuro


Re: Spark sql not pushing down timestamp range queries

2016-04-14 Thread Hyukjin Kwon
Hi,


String comparison itself is pushed down fine, but the problem is dealing
with Cast.


It was pushed down before but it was reverted (
https://github.com/apache/spark/pull/8049).

Several fixes were tried, e.g. https://github.com/apache/spark/pull/11005,
but none of them made it in.


To cut it short, it is not being pushed down because it is unsafe to
resolve the cast (e.g. long to integer).
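
To make that concrete, a small illustration (not from the original
message): string and numeric orderings can disagree, so handing the raw
comparison to a data source could return different rows than Spark's
post-cast comparison:

// Lexicographic (string) and numeric orderings disagree for "9" vs "10":
println("9" >= "10")  // true  -- string comparison, '9' > '1'
println(9L >= 10L)    // false -- numeric comparison
// Rewriting cast(longCol as string) >= "10" into longCol >= 10 before
// handing it to a data source would therefore change the result.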

As a workaround, the implementation of the Solr data source could be
changed to one using CatalystScan, which takes all the filters.

But CatalystScan is not designed to be binary compatible across releases;
however, it looks like some think it is stable now, as mentioned here:
https://github.com/apache/spark/pull/10750#issuecomment-175400704.


Thanks!


2016-04-15 3:30 GMT+09:00 Mich Talebzadeh :

> Hi Josh,
>
> Can you please clarify whether date comparisons as two strings work at all?
>
> I was under the impression that with string comparison only the first
> characters are compared?
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 14 April 2016 at 19:26, Josh Rosen  wrote:
>
>> AFAIK this is not being pushed down because it involves an implicit cast
>> and we currently don't push casts into data sources or scans; see
>> https://github.com/databricks/spark-redshift/issues/155 for a
>> possibly-related discussion.
>>
>> On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>> Are you comparing strings in here or timestamp?
>>>
>>> Filter ((cast(registration#37 as string) >= 2015-05-28) &&
>>> (cast(registration#37 as string) <= 2015-05-29))
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * 
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 14 April 2016 at 18:04, Kiran Chitturi wrote:
>>>
 Hi,

 Timestamp range filter queries in SQL are not getting pushed down to
 the PrunedFilteredScan instances. The filtering is happening at the Spark
 layer.

 The physical plan for timestamp range queries does not show the pushed
 filters, whereas range queries on other types work fine and the
 physical plan shows the pushed filters.

 Please see below for code and examples.

 *Example:*

 *1.* Range filter queries on Timestamp types

*code: *

> sqlContext.sql("SELECT * from events WHERE `registration` >=
> '2015-05-28' AND `registration` <= '2015-05-29' ")

*Full example*:
 https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
 *plan*:
 https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql

 *2. * Range filter queries on Long types

 *code*:

> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and
> `length` <= '1000'")

 *Full example*:
 https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
 *plan*:
 https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql

 The SolrRelation class we use extends
 the PrunedFilteredScan.

 Since Solr supports date ranges, I would like for the timestamp filters
 to be pushed down to the Solr query.

 Are there limitations on the type of filters that are passed down with
 Timestamp types?
 Is there something that I should do in my code to fix this?

 Thanks,
 --
 Kiran Chitturi


>>>
>


Re: Spark sql not pushing down timestamp range queries

2016-04-14 Thread Mich Talebzadeh
Hi Josh,

Can you please clarify whether date comparisons as two strings work at all?

I was under the impression that with string comparison only the first
characters are compared?
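
As a side check (an added example, assuming fixed-width ISO-8601 date
strings): character-by-character comparison does coincide with
chronological order when the format is zero-padded and
most-significant-first:

println("2015-05-28" < "2015-05-29")  // true, chronologically correct
println("2015-05-28" < "2015-11-02")  // true: '0' < '1' at the month digit
// The two orders diverge only for non-fixed-width values, e.g. "9" vs "10".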

Thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 14 April 2016 at 19:26, Josh Rosen  wrote:

> AFAIK this is not being pushed down because it involves an implicit cast
> and we currently don't push casts into data sources or scans; see
> https://github.com/databricks/spark-redshift/issues/155 for a
> possibly-related discussion.
>
> On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Are you comparing strings in here or timestamp?
>>
>> Filter ((cast(registration#37 as string) >= 2015-05-28) &&
>> (cast(registration#37 as string) <= 2015-05-29))
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * 
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> *
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 14 April 2016 at 18:04, Kiran Chitturi 
>> wrote:
>>
>>> Hi,
>>>
>>> Timestamp range filter queries in SQL are not getting pushed down to the
>>> PrunedFilteredScan instances. The filtering is happening at the Spark layer.
>>>
>>> The physical plan for timestamp range queries does not show the pushed
>>> filters, whereas range queries on other types work fine and the
>>> physical plan shows the pushed filters.
>>>
>>> Please see below for code and examples.
>>>
>>> *Example:*
>>>
>>> *1.* Range filter queries on Timestamp types
>>>
>>>*code: *
>>>
 sqlContext.sql("SELECT * from events WHERE `registration` >=
 '2015-05-28' AND `registration` <= '2015-05-29' ")
>>>
>>>*Full example*:
>>> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>> *plan*:
>>> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql
>>>
>>> *2. * Range filter queries on Long types
>>>
>>> *code*:
>>>
 sqlContext.sql("SELECT * from events WHERE `length` >= '700' and
 `length` <= '1000'")
>>>
>>> *Full example*:
>>> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>>> *plan*:
>>> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql
>>>
>>> The SolrRelation class we use extends
>>> the PrunedFilteredScan.
>>>
>>> Since Solr supports date ranges, I would like for the timestamp filters
>>> to be pushed down to the Solr query.
>>>
>>> Are there limitations on the type of filters that are passed down with
>>> Timestamp types?
>>> Is there something that I should do in my code to fix this?
>>>
>>> Thanks,
>>> --
>>> Kiran Chitturi
>>>
>>>
>>


Re: Spark sql not pushing down timestamp range queries

2016-04-14 Thread Josh Rosen
AFAIK this is not being pushed down because it involves an implicit cast
and we currently don't push casts into data sources or scans; see
https://github.com/databricks/spark-redshift/issues/155 for a
possibly-related discussion.

On Thu, Apr 14, 2016 at 10:27 AM Mich Talebzadeh 
wrote:

> Are you comparing strings in here or timestamp?
>
> Filter ((cast(registration#37 as string) >= 2015-05-28) &&
> (cast(registration#37 as string) <= 2015-05-29))
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 14 April 2016 at 18:04, Kiran Chitturi 
> wrote:
>
>> Hi,
>>
>> Timestamp range filter queries in SQL are not getting pushed down to the
>> PrunedFilteredScan instances. The filtering is happening at the Spark layer.
>>
>> The physical plan for timestamp range queries does not show the pushed
>> filters, whereas range queries on other types work fine and the
>> physical plan shows the pushed filters.
>>
>> Please see below for code and examples.
>>
>> *Example:*
>>
>> *1.* Range filter queries on Timestamp types
>>
>>*code: *
>>
>>> sqlContext.sql("SELECT * from events WHERE `registration` >=
>>> '2015-05-28' AND `registration` <= '2015-05-29' ")
>>
>>*Full example*:
>> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>> *plan*:
>> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql
>>
>> *2. * Range filter queries on Long types
>>
>> *code*:
>>
>>> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and
>>> `length` <= '1000'")
>>
>> *Full example*:
>> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
>> *plan*:
>> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql
>>
>> The SolrRelation class we use extends
>> the PrunedFilteredScan.
>>
>> Since Solr supports date ranges, I would like for the timestamp filters
>> to be pushed down to the Solr query.
>>
>> Are there limitations on the type of filters that are passed down with
>> Timestamp types?
>> Is there something that I should do in my code to fix this?
>>
>> Thanks,
>> --
>> Kiran Chitturi
>>
>>
>


Re: Spark sql not pushing down timestamp range queries

2016-04-14 Thread Mich Talebzadeh
Are you comparing strings in here or timestamp?

Filter ((cast(registration#37 as string) >= 2015-05-28) &&
(cast(registration#37 as string) <= 2015-05-29))


Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com



On 14 April 2016 at 18:04, Kiran Chitturi 
wrote:

> Hi,
>
> Timestamp range filter queries in SQL are not getting pushed down to the
> PrunedFilteredScan instances. The filtering is happening at the Spark layer.
>
> The physical plan for timestamp range queries does not show the pushed
> filters, whereas range queries on other types work fine and the
> physical plan shows the pushed filters.
>
> Please see below for code and examples.
>
> *Example:*
>
> *1.* Range filter queries on Timestamp types
>
>*code: *
>
>> sqlContext.sql("SELECT * from events WHERE `registration` >=
>> '2015-05-28' AND `registration` <= '2015-05-29' ")
>
>*Full example*:
> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
> *plan*:
> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql
>
> *2. * Range filter queries on Long types
>
> *code*:
>
>> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and
>> `length` <= '1000'")
>
> *Full example*:
> https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
> *plan*:
> https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql
>
> The SolrRelation class we use extends
> the PrunedFilteredScan.
>
> Since Solr supports date ranges, I would like for the timestamp filters to
> be pushed down to the Solr query.
>
> Are there limitations on the type of filters that are passed down with
> Timestamp types?
> Is there something that I should do in my code to fix this?
>
> Thanks,
> --
> Kiran Chitturi
>
>


Spark sql not pushing down timestamp range queries

2016-04-14 Thread Kiran Chitturi
Hi,

Timestamp range filter queries in SQL are not getting pushed down to the
PrunedFilteredScan instances. The filtering is happening at the Spark layer.

The physical plan for timestamp range queries does not show the pushed
filters, whereas range queries on other types work fine and the
physical plan shows the pushed filters.

Please see below for code and examples.

*Example:*

*1.* Range filter queries on Timestamp types

   *code: *

> sqlContext.sql("SELECT * from events WHERE `registration` >= '2015-05-28'
> AND `registration` <= '2015-05-29' ")

   *Full example*:
https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
*plan*:
https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-time-range-sql

*2. * Range filter queries on Long types

*code*:

> sqlContext.sql("SELECT * from events WHERE `length` >= '700' and `length`
> <= '1000'")

*Full example*:
https://github.com/lucidworks/spark-solr/blob/master/src/test/scala/com/lucidworks/spark/EventsimTestSuite.scala#L151
*plan*:
https://gist.github.com/kiranchitturi/4a52688c9f0abe3d4b2bd8b938044421#file-length-range-sql

The SolrRelation class we use extends
the PrunedFilteredScan.

Since Solr supports date ranges, I would like for the timestamp filters to
be pushed down to the Solr query.
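
If the filters did arrive, the translation could look roughly like this (a
sketch with invented helper names, not our actual SolrRelation code; it
assumes the values already carry Solr-friendly ISO-8601 formatting):

import org.apache.spark.sql.sources.{Filter, GreaterThanOrEqual, LessThanOrEqual}

// Hypothetical mapping from pushed-down source filters to Solr fq clauses.
def toSolrFq(filters: Array[Filter]): Seq[String] = filters.toSeq.collect {
  case GreaterThanOrEqual(attr, value) => s"$attr:[$value TO *]"
  case LessThanOrEqual(attr, value)    => s"$attr:[* TO $value]"
}

// e.g. GreaterThanOrEqual("registration", "2015-05-28T00:00:00Z") would
// translate to registration:[2015-05-28T00:00:00Z TO *]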

Are there limitations on the type of filters that are passed down with
Timestamp types?
Is there something that I should do in my code to fix this?

Thanks,
-- 
Kiran Chitturi