I see that in one place hashTagId is quoted and in another it is not: in the 
statement from your stack trace, the SELECT list quotes "hashTagId" but the 
WHERE clause does not. So whoever is generating the SQL (Spark?) is not being 
consistent, which is a worry.

In the default lexical convention, unquoted identifiers are converted to upper 
case, but your column name is mixed case. So you need to either fix Spark to 
generate appropriate quoting, or use lex=JAVA in Calcite so that unquoted 
identifiers keep their case.
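
Something like this (an untested sketch; I'm reusing the model-path placeholder, 
table name and 'spark' session from your mail) would pass lex=JAVA when reading 
through Spark's JDBC data source:

    // untested sketch: lex=JAVA so that unquoted identifiers keep their case
    val df_yyy = spark.read
      .format("jdbc")
      .option("driver", "org.apache.calcite.jdbc.Driver")
      .option("url", "jdbc:calcite:model=<path to schema json>;lex=JAVA")
      .option("dbtable", "yyy")
      .load()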

I’m glad to see that "value" and "count" are quoted. They are SQL reserved 
words, so they have to be. But if you can change those column names to 
something else, it will make your life easier.
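
As an aside, here is a rough sketch (plain JDBC from Scala; the model path is 
the same placeholder as in your mail) of a hand-written query against that 
table with the reserved-word columns quoted; whatever generates SQL for you has 
to quote them the same way:

    import java.sql.DriverManager

    // model path is a placeholder; default lexical convention
    val conn = DriverManager.getConnection("jdbc:calcite:model=<path to schema json>")
    val stmt = conn.createStatement()
    // "value" and "count" are reserved words, and "hashTagId" is mixed case,
    // so all three need double quotes here
    val rs = stmt.executeQuery(
      "SELECT \"hashTagId\", \"value\", \"count\" FROM \"yyy\" " +
      "WHERE \"hashTagId\" = 'hashTag_01'")
    while (rs.next()) {
      println(rs.getString("value") + " " + rs.getLong("count"))
    }
    conn.close()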

Julian

> On Nov 15, 2016, at 10:33 AM, herman...@teeupdata.com wrote:
> 
> Played around with Calcite JDBC settings, especially the lexical ones. Some 
> settings return an empty result set (the default, with caseSensitive=false) 
> when there is a filter/join; some just fail at the parsing phase (e.g. lex=JAVA):
> 
> java.sql.SQLException: Error while preparing statement [SELECT 
> "timestamp","commentorId","hashTagId","value","count" FROM yyy WHERE 
> (hashTagId IS NOT NULL) AND (hashTagId = 'hashTag_01')]
>       at org.apache.calcite.avatica.Helper.createException(Helper.java:56)
>       at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
>       at 
> org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement_(CalciteConnectionImpl.java:204)
>       at 
> org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement(CalciteConnectionImpl.java:186)
>       at 
> org.apache.calcite.jdbc.CalciteConnectionImpl.prepareStatement(CalciteConnectionImpl.java:87)
>       at 
> org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:264)
> 
> With the default setting, after removing "(hashTagId IS NOT NULL) AND" from the 
> where clause, the correct result set is returned. So it does seem to me that this 
> is a Calcite configuration issue. Does anybody have any experience using Calcite 
> JDBC with Spark?
> 
> thanks
> Herman.
> 
> 
>> On Nov 15, 2016, at 10:45, herman...@teeupdata.com wrote:
>> 
>> Hi everyone,
>> 
>> When accessing Druid from Spark through the Calcite JDBC driver and Druid 
>> adapter, I have been experiencing strange results. 
>> 
>> Druid data schema is defined as:
>> 
>>   {
>>     "type": "custom",
>>     "name": "xxx",
>>     "factory": "org.apache.calcite.adapter.druid.DruidSchemaFactory",
>>     "operand": {
>>       "url": "http://<host>:8082",
>>       "coordinatorUrl": "http://<host>:8081"
>>     },
>>     "tables": [
>>       {
>>         "name": "yyy",
>>         "factory": "org.apache.calcite.adapter.druid.DruidTableFactory",
>>         "operand": {
>>           "dataSource": "<DruidDataSource>",
>>           "interval": "2016-11-08T00:00:00.000Z/2016-12-31T00:00:00.000Z",
>>           "timestampColumn": "timestamp",
>>           "dimensions": [
>>             "commentorId",
>>             "hashTagId",
>>             "value"
>>           ],
>>           "metrics": [
>>             {
>>               "type": "count",
>>               "name": "count"
>>             }
>>           ]
>>         }
>>       }
>>     ]
>>   }
>> 
>> With a JDBC client, queries like these work fine:
>> 1. select * from yyy where hashTagId='hashTag_01'
>> 2. select badcount.hashTagId, badcount.bad, totalcount.total,
>>           badcount.bad/totalcount.total*100 as bad_pct
>>    from
>>        (select hashTagId, cast(count(*) as double) as bad from yyy
>>         where value='bad' group by hashTagId) as badcount
>>    join
>>        (select hashTagId, cast(count(*) as double) as total from yyy
>>         group by hashTagId) as totalcount
>>    on
>>        (badcount.hashTagId=totalcount.hashTagId)
>> 
>> However, in Spark 2.0, the behavior is strange:
>> 1. val df_yyy = spark.read.format("jdbc").option("url", "jdbc:calcite:model=<path 
>> to schema json>;caseSensitive=false")…
>> 2. df_yyy.show() —— works fine, returns all records
>> 3. df_yyy.filter($"hashTagId" === "hashTag_01").count() — returns the correct 
>> number of records
>> 4. df_yyy.filter($"hashTagId" === "hashTag_01").show() — returns an empty result set
>> 5. df_yyy.join(<another mysql based data frame>, <condition>).show() —— 
>> returns an empty result set (any join returns an empty result set)
>> 
>> I suspect there is a conflict between how Spark handles SQL and how Calcite 
>> JDBC does. Are there special properties to set in the JDBC connection string 
>> to make it work with Spark? Is there a Calcite JDBC log file that I can dig 
>> through? I did some googling and don't see similar usage of 
>> Spark/Calcite/Druid; is this the right way to access Druid from Spark? (Maybe 
>> this is a question better suited for the Spark/Druid community…)
>> 
>> Thanks.
>> Herman.
>> 
> 
