[jira] [Commented] (DRILL-6994) TIMESTAMP type DOB column in Spark parquet is treated as VARBINARY in Drill

2019-01-22 Thread Khurram Faraaz (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749568#comment-16749568
 ] 

Khurram Faraaz commented on DRILL-6994:
---

[~kkhatua] here are the parquet schema details:
{noformat}
[test@md123-45 parquet]# ./parquet-schema 
infer_schema_example.parquet/part-00000-53f066b2-ca90-499e-a976-e5282d1b59ac-c000.snappy.parquet
message spark_schema {
optional binary Name (UTF8);
optional binary Department (UTF8);
optional int32 years_of_experience;
optional int96 DOB;
}{noformat}
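
Note that DOB is written as Parquet INT96, Spark's legacy timestamp encoding, 
which Drill does not decode as a timestamp by default. A possible workaround 
(a sketch, assuming Drill 1.10+, where both the session option and the 
conversion function exist):
{code:sql}
-- Ask the Parquet reader to decode INT96 values as TIMESTAMP:
alter session set `store.parquet.reader.int96_as_timestamp` = true;

-- Or convert explicitly, column by column:
select convert_from(DOB, 'TIMESTAMP_IMPALA') as DOB
from dfs.`/apps/infer_schema_example.parquet`;
{code}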
 

> TIMESTAMP type DOB column in Spark parquet is treated as VARBINARY in Drill
> ---
>
> Key: DRILL-6994
> URL: https://issues.apache.org/jira/browse/DRILL-6994
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Priority: Major
>
> A timestamp type column in a parquet file created from Spark is treated as 
> VARBINARY by Drill 1.14.0. Trying to cast the DOB column to DATE results in 
> an Exception, although the monthOfYear field is in the allowed range.
> Data used in the test
> {noformat}
> [test@md123 spark_data]# cat inferSchema_example.csv
> Name,Department,years_of_experience,DOB
> Sam,Software,5,1990-10-10
> Alex,Data Analytics,3,1992-10-10
> {noformat}
> Create the parquet file using the above CSV file
> {noformat}
> [test@md123 bin]# ./spark-shell
> 19/01/22 21:21:34 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Spark context Web UI available at http://md123.qa.lab:4040
> Spark context available as 'sc' (master = local[*], app id = 
> local-1548192099796).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.1-mapr-SNAPSHOT
>       /_/
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> import org.apache.spark.sql.{DataFrame, SQLContext}
> import org.apache.spark.sql.{DataFrame, SQLContext}
> scala> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.{SparkConf, SparkContext}
> scala> val sqlContext: SQLContext = new SQLContext(sc)
> warning: there was one deprecation warning; re-run with -deprecation for 
> details
> sqlContext: org.apache.spark.sql.SQLContext = 
> org.apache.spark.sql.SQLContext@2e0163cb
> scala> val df = 
> sqlContext.read.format("com.databricks.spark.csv").option("header", 
> "true").option("inferSchema", "true").load("/apps/inferSchema_example.csv")
> df: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 2 
> more fields]
> scala> df.printSchema
> root
>  |-- Name: string (nullable = true)
>  |-- Department: string (nullable = true)
>  |-- years_of_experience: integer (nullable = true)
>  |-- DOB: timestamp (nullable = true)
> scala> df.write.parquet("/apps/infer_schema_example.parquet")
> // Read the parquet file
> scala> val data = 
> sqlContext.read.parquet("/apps/infer_schema_example.parquet")
> data: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 
> 2 more fields]
> // Print the schema of the parquet file from Spark
> scala> data.printSchema
> root
>  |-- Name: string (nullable = true)
>  |-- Department: string (nullable = true)
>  |-- years_of_experience: integer (nullable = true)
>  |-- DOB: timestamp (nullable = true)
> // Display the contents of parquet file on spark-shell
> // register temp table and do a show on all records,to display.
> scala> data.registerTempTable("employee")
> warning: there was one deprecation warning; re-run with -deprecation for 
> details
> scala> val allrecords = sqlContext.sql("SELeCT * FROM employee")
> allrecords: org.apache.spark.sql.DataFrame = [Name: string, Department: 
> string ... 2 more fields]
> scala> allrecords.show()
> +----+--------------+-------------------+-------------------+
> |Name|    Department|years_of_experience|                DOB|
> +----+--------------+-------------------+-------------------+
> | Sam|      Software|                  5|1990-10-10 00:00:00|
> |Alex|Data Analytics|                  3|1992-10-10 00:00:00|
> +----+--------------+-------------------+-------------------+
> {noformat}
> Querying the parquet file from Drill 1.14.0-mapr, results in the DOB column 
> (timestamp type in Spark) being treated as VARBINARY.
> {noformat}
> apache drill 1.14.0-mapr
> "a little sql for your nosql"
> 0: jdbc:drill:schema=dfs.tmp> select * from 
> dfs.`/apps/infer_schema_example.parquet`;
> +-------+-----------------+----------------------+--------------+
> | Name  |   Department    | years_of_experience  |     DOB      |
> +-------+-----------------+----------------------+--------------+
> | Sam   | Software        | 

[jira] [Commented] (DRILL-6827) Apache Drill 1.14 on a kerberized Cloudera cluster (CDH 5.14).

2019-01-22 Thread Sorabh Hamirwasia (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749564#comment-16749564
 ] 

Sorabh Hamirwasia commented on DRILL-6827:
--

Your configuration is incorrect. Please refer to the documentation here for 
WebServer configuration:
https://drill.apache.org/docs/configuring-drill-to-use-spnego-for-http-authentication/

Also, I see that you have both SSL and SASL encryption enabled, which is 
overkill.


{code:java}
user.encryption.sasl.enabled: true,
user.encryption.sasl.max_wrapped_size: 65536
},
security.user.encryption.ssl: {
  enabled: true,
  keyPassword: "X",
  handshakeTimeout: 1,
  provider: "JDK"
},
ssl: {
  keyStorePath: "X",
  keyStorePassword: "X",
  trustStorePath: "X",
  trustStorePassword: "X"
}
{code}
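
For comparison, a minimal WebServer SPNEGO block per that documentation page 
would look roughly like this (a sketch; the principal and keytab values are 
placeholders):
{code:java}
drill.exec.http: {
  auth.enabled: true,
  auth.mechanisms: ["SPNEGO"],
  auth.spnego.principal: "HTTP/<FQDN>@<REALM>",
  auth.spnego.keytab: "/path/to/http.keytab"
}
{code}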


> Apache Drill 1.14 on a kerberized Cloudera cluster (CDH 5.14).
> --
>
> Key: DRILL-6827
> URL: https://issues.apache.org/jira/browse/DRILL-6827
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Security
>Affects Versions: 1.14.0
> Environment: * Apache Drill 1.14
>  * Cloudera CDH 5.14
>Reporter: Ibrahim Safieddine
>Priority: Critical
>
> Hello,
>  
> I am using Apache Drill 1.14 on a kerberized Cloudera cluster (CDH 5.14).
>  
> When I activate Kerberos authentication, the Drill server refuses to start 
> with the error:
> {color:#ff0000}_org.apache.drill.exec.exception.DrillbitStartupException: 
> Authentication is enabled for WebServer but none of the security mechanism 
> was configured properly. Please verify the configurations and try 
> again._{color}
>  
> I can see in the logs that the Kerberos authentication is OK: 
> [main] INFO  o.a.d.exec.server.BootStrapContext - Process user name: 'root' 
> and logged in successfully as 'tata/xx.yy...@xx.yy'
>  
> Can you help me please?
>  
> Based on the Apache Drill documentation, here is my conf/drill-override.conf:
>  
> drill.exec: {
>   cluster-id: "drillbits1",
>   zk.connect: "xx.yy.zz:2181",
>   service_name: "service1",
>   impersonation: {
>     enabled: true,
>     max_chained_user_hops: 3
>   },
>   security: {
>     user.auth.enabled:true,
>     auth.mechanisms:["KERBEROS"],
>     auth.principal:"tata/xx.yy...@xx.yy",
>     auth.keytab:"keytab1.keytab",
>     drill.exec.security.auth.auth_to_local:hive,
>     auth.realm: "XX.YY",
>     user.encryption.sasl.enabled: true,
>     user.encryption.sasl.max_wrapped_size: 65536
>   },
>   security.user.encryption.ssl: {
>     enabled: true,
>     keyPassword: "X",
>     handshakeTimeout: 1,
>     provider: "JDK"
>   },
>   ssl: {
>     keyStorePath: "X",
>     keyStorePassword: "X",
>     trustStorePath: "X",
>     trustStorePassword: "X"
>   },
>   http: {
>     enabled: true,
>     auth.enabled: false,
>     auth.mechanisms: ["KERBEROS"],
>     ssl_enabled: true,
>     port: 8047
>     session_max_idle_secs: 3600, # Default value 1hr
>     cors: {
>       enabled: false,
>       allowedOrigins: ["null"],
>       allowedMethods: ["GET", "POST", "HEAD", "OPTIONS"],
>       allowedHeaders: ["X-Requested-With", "Content-Type", "Accept", 
> "Origin"],
>       credentials: true
>     }
>   }
> }
>  Thank you
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6994) TIMESTAMP type DOB column in Spark parquet is treated as VARBINARY in Drill

2019-01-22 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749539#comment-16749539
 ] 

Kunal Khatua commented on DRILL-6994:
-

[~khfaraaz] what does the schema look like according to the {{parquet-tools}} 
utility? 

> TIMESTAMP type DOB column in Spark parquet is treated as VARBINARY in Drill
> ---
>
> Key: DRILL-6994
> URL: https://issues.apache.org/jira/browse/DRILL-6994
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Priority: Major
>
> A timestamp type column in a parquet file created from Spark is treated as 
> VARBINARY by Drill 1.14.0. Trying to cast the DOB column to DATE results in 
> an Exception, although the monthOfYear field is in the allowed range.
> Data used in the test
> {noformat}
> [test@md123 spark_data]# cat inferSchema_example.csv
> Name,Department,years_of_experience,DOB
> Sam,Software,5,1990-10-10
> Alex,Data Analytics,3,1992-10-10
> {noformat}
> Create the parquet file using the above CSV file
> {noformat}
> [test@md123 bin]# ./spark-shell
> 19/01/22 21:21:34 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Spark context Web UI available at http://md123.qa.lab:4040
> Spark context available as 'sc' (master = local[*], app id = 
> local-1548192099796).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.1-mapr-SNAPSHOT
>       /_/
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> import org.apache.spark.sql.{DataFrame, SQLContext}
> import org.apache.spark.sql.{DataFrame, SQLContext}
> scala> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.{SparkConf, SparkContext}
> scala> val sqlContext: SQLContext = new SQLContext(sc)
> warning: there was one deprecation warning; re-run with -deprecation for 
> details
> sqlContext: org.apache.spark.sql.SQLContext = 
> org.apache.spark.sql.SQLContext@2e0163cb
> scala> val df = 
> sqlContext.read.format("com.databricks.spark.csv").option("header", 
> "true").option("inferSchema", "true").load("/apps/inferSchema_example.csv")
> df: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 2 
> more fields]
> scala> df.printSchema
> root
>  |-- Name: string (nullable = true)
>  |-- Department: string (nullable = true)
>  |-- years_of_experience: integer (nullable = true)
>  |-- DOB: timestamp (nullable = true)
> scala> df.write.parquet("/apps/infer_schema_example.parquet")
> // Read the parquet file
> scala> val data = 
> sqlContext.read.parquet("/apps/infer_schema_example.parquet")
> data: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 
> 2 more fields]
> // Print the schema of the parquet file from Spark
> scala> data.printSchema
> root
>  |-- Name: string (nullable = true)
>  |-- Department: string (nullable = true)
>  |-- years_of_experience: integer (nullable = true)
>  |-- DOB: timestamp (nullable = true)
> // Display the contents of parquet file on spark-shell
> // register temp table and do a show on all records,to display.
> scala> data.registerTempTable("employee")
> warning: there was one deprecation warning; re-run with -deprecation for 
> details
> scala> val allrecords = sqlContext.sql("SELeCT * FROM employee")
> allrecords: org.apache.spark.sql.DataFrame = [Name: string, Department: 
> string ... 2 more fields]
> scala> allrecords.show()
> +----+--------------+-------------------+-------------------+
> |Name|    Department|years_of_experience|                DOB|
> +----+--------------+-------------------+-------------------+
> | Sam|      Software|                  5|1990-10-10 00:00:00|
> |Alex|Data Analytics|                  3|1992-10-10 00:00:00|
> +----+--------------+-------------------+-------------------+
> {noformat}
> Querying the parquet file from Drill 1.14.0-mapr, results in the DOB column 
> (timestamp type in Spark) being treated as VARBINARY.
> {noformat}
> apache drill 1.14.0-mapr
> "a little sql for your nosql"
> 0: jdbc:drill:schema=dfs.tmp> select * from 
> dfs.`/apps/infer_schema_example.parquet`;
> +-------+-----------------+----------------------+--------------+
> | Name  |   Department    | years_of_experience  |     DOB      |
> +-------+-----------------+----------------------+--------------+
> | Sam   | Software        | 5                    | [B@2bef51f2  |
> | Alex  | Data Analytics  | 3                    | [B@650eab8   |
> +-------+-----------------+----------------------+--------------+
> 2 rows selected (0.229 seconds)
> // typeof(DOB) column returns a VARBINARY type, whereas the parquet schema in 
> Spark for DOB: 

[jira] [Commented] (DRILL-5807) ambiguous error

2019-01-22 Thread Kunal Khatua (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-5807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749538#comment-16749538
 ] 

Kunal Khatua commented on DRILL-5807:
-

I'm wondering if moving the aliases inside the parentheses might resolve the 
issue.
e.g. 
{code}... FROM "dws_tb_crm_u2_itm_base_df" d0  ...{code}



> ambiguous error
> ---
>
> Key: DRILL-5807
> URL: https://issues.apache.org/jira/browse/DRILL-5807
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - JDBC
>Affects Versions: 1.11.0
> Environment: Linux
>Reporter: XiaHang
>Priority: Critical
>
> if the final plan like below , JdbcFilter is below a JdbcJoin and above 
> another JdbcJoin . 
> JdbcProject(order_id=[$0], mord_id=[$6], item_id=[$2], div_pay_amt=[$5], 
> item_quantity=[$4], slr_id=[$11]): rowcount = 5625.0, cumulative cost = 
> {12540.0 rows, 29763.0 cpu, 0.0 io}, id = 327
> JdbcJoin(condition=[=($3, $11)], joinType=[left]): rowcount = 5625.0, 
> cumulative cost = {8040.0 rows, 2763.0 cpu, 0.0 io}, id = 325
>   JdbcFilter(condition=[OR(AND(OR(IS NOT NULL($7), >($5, 0)), =($1, 2), 
> OR(AND(=($10, '箱包皮具/热销女包/男包'), >(/($5, $4), 1000)), AND(OR(=($10, '家装主材'), 
> =($10, '大家电')), >(/($5, $4), 1000)), AND(OR(=($10, '珠宝/钻石/翡翠/黄金'), =($10, 
> '饰品/流行首饰/时尚饰品新')), >(/($5, $4), 2000)), AND(>(/($5, $4), 500), <>($10, 
> '箱包皮具/热销女包/男包'), <>($10, '家装主材'), <>($10, '大家电'), <>($10, '珠宝/钻石/翡翠/黄金'), 
> <>($10, '饰品/流行首饰/时尚饰品新'))), <>($10, '成人用品/情趣用品'), <>($10, '鲜花速递/花卉仿真/绿植园艺'), 
> <>($10, '水产肉类/新鲜蔬果/熟食')), AND(<=(-(EXTRACT(FLAG(EPOCH), CURRENT_TIMESTAMP), 
> EXTRACT(FLAG(EPOCH), CAST($8):TIMESTAMP(0))), *(*(*(14, 24), 60), 60)), 
> OR(AND(OR(=($10, '箱包皮具/热销女包/男包'), =($10, '家装主材'), =($10, '大家电'), =($10, 
> '珠宝/钻石/翡翠/黄金'), =($10, '饰品/流行首饰/时尚饰品新')), >(/($5, $4), 2000)), AND(OR(=($10, 
> '男装'), =($10, '女装/女士精品'), =($10, '办公设备/耗材/相关服务')), >(/($5, $4), 1000)), 
> AND(OR(=($10, '流行男鞋'), =($10, '女鞋')), >(/($5, $4), 1500))), IS NOT NULL($8)), 
> AND(>=(-(EXTRACT(FLAG(EPOCH), CURRENT_TIMESTAMP), EXTRACT(FLAG(EPOCH), 
> CAST($8):TIMESTAMP(0))), *(*(*(15, 24), 60), 60)), <=(-(EXTRACT(FLAG(EPOCH), 
> CURRENT_TIMESTAMP), EXTRACT(FLAG(EPOCH), CAST($8):TIMESTAMP(0))), *(*(*(60, 
> 24), 60), 60)), OR(AND(OR(=($10, '箱包皮具/热销女包/男包'), =($10, '珠宝/钻石/翡翠/黄金'), 
> =($10, '饰品/流行首饰/时尚饰品新')), >(/($5, $4), 5000)), AND(OR(=($10, '男装'), =($10, 
> '女装/女士精品')), >(/($5, $4), 3000)), AND(OR(=($10, '流行男鞋'), =($10, '女鞋')), 
> >(/($5, $4), 2500)), AND(=($10, '办公设备/耗材/相关服务'), >(/($5, $4), 2000))), IS NOT 
> NULL($8)))]): rowcount = 375.0, cumulative cost = {2235.0 rows, 2582.0 cpu, 
> 0.0 io}, id = 320
> JdbcJoin(condition=[=($2, $9)], joinType=[left]): rowcount = 1500.0, 
> cumulative cost = {1860.0 rows, 1082.0 cpu, 0.0 io}, id = 318
>   JdbcProject(order_id=[$0], pay_status=[$2], item_id=[$3], 
> seller_id=[$5], item_quantity=[$7], div_pay_amt=[$20], mord_id=[$1], 
> pay_time=[$19], succ_time=[$52]): rowcount = 100.0, cumulative cost = {180.0 
> rows, 821.0 cpu, 0.0 io}, id = 313
> JdbcTableScan(table=[[public, dws_tb_crm_u2_ord_base_df]]): 
> rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 29
>   JdbcProject(item_id=[$0], cate_level1_name=[$47]): rowcount = 
> 100.0, cumulative cost = {180.0 rows, 261.0 cpu, 0.0 io}, id = 316
> JdbcTableScan(table=[[public, dws_tb_crm_u2_itm_base_df]]): 
> rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 46
>   JdbcProject(slr_id=[$3]): rowcount = 100.0, cumulative cost = {180.0 
> rows, 181.0 cpu, 0.0 io}, id = 323
> JdbcTableScan(table=[[public, dws_tb_crm_u2_slr_base]]): rowcount = 
> 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 68
> the sql is converted to 
> SELECT "t1"."order_id", "t1"."mord_id", "t1"."item_id", "t1"."div_pay_amt", 
> "t1"."item_quantity", "t2"."slr_id"
> FROM (SELECT *
> FROM (SELECT "order_id", "pay_status", "item_id", "seller_id", 
> "item_quantity", "div_pay_amt", "mord_id", "pay_time", "succ_time"
> FROM "dws_tb_crm_u2_ord_base_df") AS "t"
> LEFT JOIN (SELECT "item_id", "cate_level1_name"
> FROM "dws_tb_crm_u2_itm_base_df") AS "t0" ON "t"."item_id" = "t0"."item_id"
> WHERE ("t"."pay_time" IS NOT NULL OR "t"."div_pay_amt" > 0) AND 
> "t"."pay_status" = 2 AND ("t0"."cate_level1_name" = '箱包皮具/热销女包/男包' AND 
> "t"."div_pay_amt" / "t"."item_quantity" > 1000 OR ("t0"."cate_level1_name" = 
> '家装主材' OR "t0"."cate_level1_name" = '大家电') AND "t"."div_pay_amt" / 
> "t"."item_quantity" > 1000 OR ("t0"."cate_level1_name" = '珠宝/钻石/翡翠/黄金' OR 
> "t0"."cate_level1_name" = '饰品/流行首饰/时尚饰品新') AND "t"."div_pay_amt" / 
> "t"."item_quantity" > 2000 OR "t"."div_pay_amt" / "t"."item_quantity" > 500 
> AND "t0"."cate_level1_name" <> '箱包皮具/热销女包/男包' AND "t0"."cate_level1_name" <> 
> '家装主材' AND "t0"."cate_level1_name" <> '大家电' AND 

[jira] [Commented] (DRILL-6982) Affected rows count is not returned by Drill if return_result_set_for_ddl is false

2019-01-22 Thread Bridget Bevens (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749392#comment-16749392
 ] 

Bridget Bevens commented on DRILL-6982:
---

Hi [~kkhatua],

Based on https://issues.apache.org/jira/browse/DRILL-6834, the Drill doc was 
reviewed and approved. The Drill documentation states the following about the 
exec.return_result_set_for_ddl option:

_“When set to false, Drill returns the affected row count, and the result set 
is null.”_

So, based on this JIRA, it seems that Drill does not show the expected 
behavior. I'm guessing this JIRA is asking for Drill to return the affected 
row count for DDL statements when the exec.return_result_set_for_ddl option is 
set to false.

If you set the option to false and run the following create table command:

CREATE TABLE tmp.`name_key2` (N_NAME, N_NATIONKEY) AS SELECT N_NATIONKEY, 
N_NAME FROM 
dfs.`/root/drill-1.15/apache-drill-1.15.0-SNAPSHOT/sample-data/nation.parquet`;

Drill creates the table, but returns the following instead of the affected row 
count:

No rows affected (2.714 seconds)
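
(For reference, once the count is returned it would surface through the 
standard JDBC update-count path; a minimal sketch against a Drillbit, where 
the connection URL and target table name are placeholders:)

{code:java}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AffectedRowsCheck {
    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 DriverManager.getConnection("jdbc:drill:drillbit=localhost");
             Statement stmt = conn.createStatement()) {
            stmt.execute(
                "ALTER SESSION SET `exec.query.return_result_set_for_ddl` = false");
            // executeUpdate() reports the affected row count for non-SELECT
            // statements; per this JIRA it should be 25 for nation.parquet.
            int rows = stmt.executeUpdate(
                "CREATE TABLE dfs.tmp.`nation_copy` AS "
                    + "SELECT * FROM cp.`tpch/nation.parquet`");
            System.out.println("affected rows: " + rows);
        }
    }
}
{code}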

Thanks,
 Bridget

 

> Affected rows count is not returned by Drill if return_result_set_for_ddl is 
> false
> --
>
> Key: DRILL-6982
> URL: https://issues.apache.org/jira/browse/DRILL-6982
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Priority: Major
>
> *Prerequisites:*
> {code:sql}
> set `exec.query.return_result_set_for_ddl`= false;
> {code}
> *Query:*
> {code:sql}
> create table dfs.tmp.`nation` as select * from cp.`tpch/nation.parquet`;
> {code}
> *Expected result:*
> Drill should return the number of affected rows (25 in this case)
> *Actual Result:*
> The table was created, but the affected row count wasn't returned:
> {noformat}
> No rows affected (1.755 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6994) TIMESTAMP type DOB column in Spark parquet is treated as VARBINARY in Drill

2019-01-22 Thread Khurram Faraaz (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz updated DRILL-6994:
--
Component/s: Execution - Data Types

> TIMESTAMP type DOB column in Spark parquet is treated as VARBINARY in Drill
> ---
>
> Key: DRILL-6994
> URL: https://issues.apache.org/jira/browse/DRILL-6994
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Data Types
>Affects Versions: 1.14.0
>Reporter: Khurram Faraaz
>Priority: Major
>
> A timestamp type column in a parquet file created from Spark is treated as 
> VARBINARY by Drill 1.14.0. Trying to cast the DOB column to DATE results in 
> an Exception, although the monthOfYear field is in the allowed range.
> Data used in the test
> {noformat}
> [test@md123 spark_data]# cat inferSchema_example.csv
> Name,Department,years_of_experience,DOB
> Sam,Software,5,1990-10-10
> Alex,Data Analytics,3,1992-10-10
> {noformat}
> Create the parquet file using the above CSV file
> {noformat}
> [test@md123 bin]# ./spark-shell
> 19/01/22 21:21:34 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> Spark context Web UI available at http://md123.qa.lab:4040
> Spark context available as 'sc' (master = local[*], app id = 
> local-1548192099796).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/ '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.1-mapr-SNAPSHOT
>       /_/
> Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> import org.apache.spark.sql.{DataFrame, SQLContext}
> import org.apache.spark.sql.{DataFrame, SQLContext}
> scala> import org.apache.spark.{SparkConf, SparkContext}
> import org.apache.spark.{SparkConf, SparkContext}
> scala> val sqlContext: SQLContext = new SQLContext(sc)
> warning: there was one deprecation warning; re-run with -deprecation for 
> details
> sqlContext: org.apache.spark.sql.SQLContext = 
> org.apache.spark.sql.SQLContext@2e0163cb
> scala> val df = 
> sqlContext.read.format("com.databricks.spark.csv").option("header", 
> "true").option("inferSchema", "true").load("/apps/inferSchema_example.csv")
> df: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 2 
> more fields]
> scala> df.printSchema
> root
>  |-- Name: string (nullable = true)
>  |-- Department: string (nullable = true)
>  |-- years_of_experience: integer (nullable = true)
>  |-- DOB: timestamp (nullable = true)
> scala> df.write.parquet("/apps/infer_schema_example.parquet")
> // Read the parquet file
> scala> val data = 
> sqlContext.read.parquet("/apps/infer_schema_example.parquet")
> data: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 
> 2 more fields]
> // Print the schema of the parquet file from Spark
> scala> data.printSchema
> root
>  |-- Name: string (nullable = true)
>  |-- Department: string (nullable = true)
>  |-- years_of_experience: integer (nullable = true)
>  |-- DOB: timestamp (nullable = true)
> // Display the contents of parquet file on spark-shell
> // register temp table and do a show on all records,to display.
> scala> data.registerTempTable("employee")
> warning: there was one deprecation warning; re-run with -deprecation for 
> details
> scala> val allrecords = sqlContext.sql("SELeCT * FROM employee")
> allrecords: org.apache.spark.sql.DataFrame = [Name: string, Department: 
> string ... 2 more fields]
> scala> allrecords.show()
> +----+--------------+-------------------+-------------------+
> |Name|    Department|years_of_experience|                DOB|
> +----+--------------+-------------------+-------------------+
> | Sam|      Software|                  5|1990-10-10 00:00:00|
> |Alex|Data Analytics|                  3|1992-10-10 00:00:00|
> +----+--------------+-------------------+-------------------+
> {noformat}
> Querying the parquet file from Drill 1.14.0-mapr, results in the DOB column 
> (timestamp type in Spark) being treated as VARBINARY.
> {noformat}
> apache drill 1.14.0-mapr
> "a little sql for your nosql"
> 0: jdbc:drill:schema=dfs.tmp> select * from 
> dfs.`/apps/infer_schema_example.parquet`;
> +-------+-----------------+----------------------+--------------+
> | Name  |   Department    | years_of_experience  |     DOB      |
> +-------+-----------------+----------------------+--------------+
> | Sam   | Software        | 5                    | [B@2bef51f2  |
> | Alex  | Data Analytics  | 3                    | [B@650eab8   |
> +-------+-----------------+----------------------+--------------+
> 2 rows selected (0.229 seconds)
> // typeof(DOB) column returns a VARBINARY type, whereas the parquet schema in 
> Spark for DOB: timestamp (nullable = true)
> 0: jdbc:drill:schema=dfs.tmp> select typeof(DOB) from 
> 

[jira] [Created] (DRILL-6994) TIMESTAMP type DOB column in Spark parquet is treated as VARBINARY in Drill

2019-01-22 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-6994:
-

 Summary: TIMESTAMP type DOB column in Spark parquet is treated as 
VARBINARY in Drill
 Key: DRILL-6994
 URL: https://issues.apache.org/jira/browse/DRILL-6994
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Khurram Faraaz


A timestamp type column in a parquet file created from Spark is treated as 
VARBINARY by Drill 1.14.0. Trying to cast the DOB column to DATE results in 
an Exception, although the monthOfYear field is in the allowed range.

Data used in the test
{noformat}
[test@md123 spark_data]# cat inferSchema_example.csv
Name,Department,years_of_experience,DOB
Sam,Software,5,1990-10-10
Alex,Data Analytics,3,1992-10-10
{noformat}

Create the parquet file using the above CSV file
{noformat}
[test@md123 bin]# ./spark-shell
19/01/22 21:21:34 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://md123.qa.lab:4040
Spark context available as 'sc' (master = local[*], app id = 
local-1548192099796).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.1-mapr-SNAPSHOT
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_191)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

scala> import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.{SparkConf, SparkContext}

scala> val sqlContext: SQLContext = new SQLContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.SQLContext = 
org.apache.spark.sql.SQLContext@2e0163cb

scala> val df = 
sqlContext.read.format("com.databricks.spark.csv").option("header", 
"true").option("inferSchema", "true").load("/apps/inferSchema_example.csv")
df: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 2 
more fields]

scala> df.printSchema
root
 |-- Name: string (nullable = true)
 |-- Department: string (nullable = true)
 |-- years_of_experience: integer (nullable = true)
 |-- DOB: timestamp (nullable = true)

scala> df.write.parquet("/apps/infer_schema_example.parquet")

// Read the parquet file
scala> val data = sqlContext.read.parquet("/apps/infer_schema_example.parquet")
data: org.apache.spark.sql.DataFrame = [Name: string, Department: string ... 2 
more fields]

// Print the schema of the parquet file from Spark
scala> data.printSchema
root
 |-- Name: string (nullable = true)
 |-- Department: string (nullable = true)
 |-- years_of_experience: integer (nullable = true)
 |-- DOB: timestamp (nullable = true)

// Display the contents of parquet file on spark-shell
// register temp table and do a show on all records,to display.

scala> data.registerTempTable("employee")
warning: there was one deprecation warning; re-run with -deprecation for details

scala> val allrecords = sqlContext.sql("SELeCT * FROM employee")
allrecords: org.apache.spark.sql.DataFrame = [Name: string, Department: string 
... 2 more fields]

scala> allrecords.show()
+----+--------------+-------------------+-------------------+
|Name|    Department|years_of_experience|                DOB|
+----+--------------+-------------------+-------------------+
| Sam|      Software|                  5|1990-10-10 00:00:00|
|Alex|Data Analytics|                  3|1992-10-10 00:00:00|
+----+--------------+-------------------+-------------------+
{noformat}

Querying the parquet file from Drill 1.14.0-mapr, results in the DOB column 
(timestamp type in Spark) being treated as VARBINARY.

{noformat}
apache drill 1.14.0-mapr
"a little sql for your nosql"
0: jdbc:drill:schema=dfs.tmp> select * from 
dfs.`/apps/infer_schema_example.parquet`;
+-------+-----------------+----------------------+--------------+
| Name  |   Department    | years_of_experience  |     DOB      |
+-------+-----------------+----------------------+--------------+
| Sam   | Software        | 5                    | [B@2bef51f2  |
| Alex  | Data Analytics  | 3                    | [B@650eab8   |
+-------+-----------------+----------------------+--------------+
2 rows selected (0.229 seconds)

// typeof(DOB) column returns a VARBINARY type, whereas the parquet schema in 
Spark for DOB: timestamp (nullable = true)

0: jdbc:drill:schema=dfs.tmp> select typeof(DOB) from 
dfs.`/apps/infer_schema_example.parquet`;
+------------+
|   EXPR$0   |
+------------+
| VARBINARY  |
| VARBINARY  |
+------------+
2 rows selected (0.199 seconds)
{noformat}

// CAST to DATE type results in Exception, though the monthOfYear is in the 
range [1,12]

{noformat}
0: jdbc:drill:schema=dfs.tmp> select cast(DOB as DATE) from 
dfs.`/apps/infer_schema_example.parquet`;
Error: SYSTEM ERROR: IllegalFieldValueException: Value 0 for monthOfYear must 
be in the range [1,12]

Fragment 

[jira] [Updated] (DRILL-6910) A filtering column remains in scan when filter pruning happens

2019-01-22 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6910:
---
Fix Version/s: 1.16.0

> A filtering column remains in scan when filter pruning happens
> --
>
> Key: DRILL-6910
> URL: https://issues.apache.org/jira/browse/DRILL-6910
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Anton Gozhiy
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> *Data:*
> {code:sql}
> create table dfs.tmp.`nation` as select * from cp.`tpch/nation.parquet`;
> {code}
> *Query:*
> {code:sql}
> explain plan for select n_nationkey from dfs.tmp.`nation` where n_regionkey < 
> 10;
> {code}
> *Expected result:*
>  The filtering column (n_regionkey) should not be present in scan operator.
> *Actual result:*
>  It remains in scan in spite of filter pruning.
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY n_nationkey): rowcount = 25.0, 
> cumulative cost = {52.5 rows, 77.5 cpu, 50.0 io, 0.0 network, 0.0 memory}, id 
> = 112988
> 00-01      Project(n_nationkey=[$1]) : rowType = RecordType(ANY n_nationkey): 
> rowcount = 25.0, cumulative cost = {50.0 rows, 75.0 cpu, 50.0 io, 0.0 
> network, 0.0 memory}, id = 112987
> 00-02        Scan(table=[[dfs, tmp, nation]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/nation]], 
> selectionRoot=maprfs:/tmp/nation, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`n_regionkey`, `n_nationkey`]]]) : rowType = 
> RecordType(ANY n_regionkey, ANY n_nationkey): rowcount = 25.0, cumulative 
> cost = {25.0 rows, 50.0 cpu, 50.0 io, 0.0 network, 0.0 memory}, id = 112986
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6910) A filtering column remains in scan when filter pruning happens

2019-01-22 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6910:
---
Description: 
*Data:*
{code:sql}
create table dfs.tmp.`nation` as select * from cp.`tpch/nation.parquet` where 
n_regionkey < 10;
{code}
*Query:*
{code:sql}
explain plan for select n_nationkey from dfs.tmp.`nation` where n_regionkey < 
10;
{code}
*Expected result:*
The filtering column (n_regionkey) should not be present in scan operator.

*Actual result:*
It remains in scan in spite of filter pruning.
{noformat}
00-00    Screen : rowType = RecordType(ANY n_nationkey): rowcount = 25.0, 
cumulative cost = {52.5 rows, 77.5 cpu, 50.0 io, 0.0 network, 0.0 memory}, id = 
112988
00-01      Project(n_nationkey=[$1]) : rowType = RecordType(ANY n_nationkey): 
rowcount = 25.0, cumulative cost = {50.0 rows, 75.0 cpu, 50.0 io, 0.0 network, 
0.0 memory}, id = 112987
00-02        Scan(table=[[dfs, tmp, nation]], groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///tmp/nation]], 
selectionRoot=maprfs:/tmp/nation, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`n_regionkey`, `n_nationkey`]]]) : rowType = 
RecordType(ANY n_regionkey, ANY n_nationkey): rowcount = 25.0, cumulative cost 
= {25.0 rows, 50.0 cpu, 50.0 io, 0.0 network, 0.0 memory}, id = 112986
{noformat}

  was:
*Data:*
{code:sql}
create table dfs.tmp.`nation` as select * from cp.`tpch/nation.parquet`;
{code}
*Query:*
{code:sql}
explain plan for select n_nationkey from dfs.tmp.`nation` where n_regionkey < 
10;
{code}
*Expected result:*
 The filtering column (n_regionkey) should not be present in scan operator.

*Actual result:*
 It remains in scan in spite of filter pruning.
{noformat}
00-00    Screen : rowType = RecordType(ANY n_nationkey): rowcount = 25.0, 
cumulative cost = {52.5 rows, 77.5 cpu, 50.0 io, 0.0 network, 0.0 memory}, id = 
112988
00-01      Project(n_nationkey=[$1]) : rowType = RecordType(ANY n_nationkey): 
rowcount = 25.0, cumulative cost = {50.0 rows, 75.0 cpu, 50.0 io, 0.0 network, 
0.0 memory}, id = 112987
00-02        Scan(table=[[dfs, tmp, nation]], groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///tmp/nation]], 
selectionRoot=maprfs:/tmp/nation, numFiles=1, numRowGroups=1, 
usedMetadataFile=false, columns=[`n_regionkey`, `n_nationkey`]]]) : rowType = 
RecordType(ANY n_regionkey, ANY n_nationkey): rowcount = 25.0, cumulative cost 
= {25.0 rows, 50.0 cpu, 50.0 io, 0.0 network, 0.0 memory}, id = 112986
{noformat}


> A filtering column remains in scan when filter pruning happens
> --
>
> Key: DRILL-6910
> URL: https://issues.apache.org/jira/browse/DRILL-6910
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Anton Gozhiy
>Assignee: Volodymyr Vysotskyi
>Priority: Major
> Fix For: 1.16.0
>
>
> *Data:*
> {code:sql}
> create table dfs.tmp.`nation` as select * from cp.`tpch/nation.parquet` where 
> n_regionkey < 10;
> {code}
> *Query:*
> {code:sql}
> explain plan for select n_nationkey from dfs.tmp.`nation` where n_regionkey < 
> 10;
> {code}
> *Expected result:*
> The filtering column (n_regionkey) should not be present in scan operator.
> *Actual result:*
> It remains in scan in spite of filter pruning.
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY n_nationkey): rowcount = 25.0, 
> cumulative cost = {52.5 rows, 77.5 cpu, 50.0 io, 0.0 network, 0.0 memory}, id 
> = 112988
> 00-01      Project(n_nationkey=[$1]) : rowType = RecordType(ANY n_nationkey): 
> rowcount = 25.0, cumulative cost = {50.0 rows, 75.0 cpu, 50.0 io, 0.0 
> network, 0.0 memory}, id = 112987
> 00-02        Scan(table=[[dfs, tmp, nation]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/nation]], 
> selectionRoot=maprfs:/tmp/nation, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`n_regionkey`, `n_nationkey`]]]) : rowType = 
> RecordType(ANY n_regionkey, ANY n_nationkey): rowcount = 25.0, cumulative 
> cost = {25.0 rows, 50.0 cpu, 50.0 io, 0.0 network, 0.0 memory}, id = 112986
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (DRILL-6910) A filtering column remains in scan when filter pruning happens

2019-01-22 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi reassigned DRILL-6910:
--

Assignee: Volodymyr Vysotskyi

> A filtering column remains in scan when filter pruning happens
> --
>
> Key: DRILL-6910
> URL: https://issues.apache.org/jira/browse/DRILL-6910
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.14.0
>Reporter: Anton Gozhiy
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>
> *Data:*
> {code:sql}
> create table dfs.tmp.`nation` as select * from cp.`tpch/nation.parquet`;
> {code}
> *Query:*
> {code:sql}
> explain plan for select n_nationkey from dfs.tmp.`nation` where n_regionkey < 
> 10;
> {code}
> *Expected result:*
>  The filtering column (n_regionkey) should not be present in scan operator.
> *Actual result:*
>  It remains in scan in spite of filter pruning.
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY n_nationkey): rowcount = 25.0, 
> cumulative cost = {52.5 rows, 77.5 cpu, 50.0 io, 0.0 network, 0.0 memory}, id 
> = 112988
> 00-01      Project(n_nationkey=[$1]) : rowType = RecordType(ANY n_nationkey): 
> rowcount = 25.0, cumulative cost = {50.0 rows, 75.0 cpu, 50.0 io, 0.0 
> network, 0.0 memory}, id = 112987
> 00-02        Scan(table=[[dfs, tmp, nation]], groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:///tmp/nation]], 
> selectionRoot=maprfs:/tmp/nation, numFiles=1, numRowGroups=1, 
> usedMetadataFile=false, columns=[`n_regionkey`, `n_nationkey`]]]) : rowType = 
> RecordType(ANY n_regionkey, ANY n_nationkey): rowcount = 25.0, cumulative 
> cost = {25.0 rows, 50.0 cpu, 50.0 io, 0.0 network, 0.0 memory}, id = 112986
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6991) Kerberos ticket is being dumped in the log if log level is "debug" for stdout

2019-01-22 Thread Sorabh Hamirwasia (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749039#comment-16749039
 ] 

Sorabh Hamirwasia commented on DRILL-6991:
--

Why is this a problem? The TGT is encrypted with the TGS secret key, so only 
the TGS can decrypt it.
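
That said, if the hex dump is unwanted on stdout, raising the level for just 
that logger in conf/logback.xml should suppress it (a sketch using standard 
logback configuration; the appender name is a placeholder):
{code:xml}
<!-- Keep debug logging elsewhere but silence the Hadoop security hex dumps -->
<logger name="org.apache.hadoop.security" additivity="false">
  <level value="info"/>
  <appender-ref ref="STDOUT"/>
</logger>
{code}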

> Kerberos ticket is being dumped in the log if log level is "debug" for stdout 
> --
>
> Key: DRILL-6991
> URL: https://issues.apache.org/jira/browse/DRILL-6991
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.15.0
>Reporter: Anton Gozhiy
>Priority: Major
>
> *Prerequisites:*
>  # Drill is installed on a cluster with Kerberos security
>  # In conf/logback.xml, set the following log level:
> {code:xml}
>   <logger name="org.apache.hadoop.security" additivity="false">
>     <level value="debug"/>
>     <appender-ref ref="STDOUT"/>
>   </logger>
> {code}
> *Steps:*
> # Start Drill
> # Connect using sqlline using the following string:
> {noformat}
> bin/sqlline -u "jdbc:drill:zk=<zk_host:port>;principal=<drill_principal>"
> {noformat}
> *Expected result:*
> No sensitive information should be displayed
> *Actual result:*
> Kerberos ticket and session key are being dumped into the console output:
> {noformat}
> 14:35:38.806 [TGT Renewer for mapr/node1.cluster.com@NODE1] DEBUG 
> o.a.h.security.UserGroupInformation - Found tgt Ticket (hex) = 
> 0000: 61 82 01 3D 30 82 01 39   A0 03 02 01 05 A1 07 1B  a..=0..9
> 0010: 05 4E 4F 44 45 31 A2 1A   30 18 A0 03 02 01 02 A1  .NODE1..0...
> 0020: 11 30 0F 1B 06 6B 72 62   74 67 74 1B 05 4E 4F 44  .0...krbtgt..NOD
> 0030: 45 31 A3 82 01 0B 30 82   01 07 A0 03 02 01 12 A1  E10.
> 0040: 03 02 01 01 A2 81 FA 04   81 F7 03 8D A9 FA 7D 89  
> 0050: 1B DF 37 B7 4D E6 6C 99   3E 8F FA 48 D9 9A 79 F3  ..7.M.l.>..H..y.
> 0060: 92 34 7F BF 67 1E 77 4A   2F C9 AF 82 93 4E 46 1D  .4..g.wJ/NF.
> 0070: 41 74 B0 AF 41 A8 8B 02   71 83 CC 14 51 72 60 EE  At..A...q...Qr`.
> 0080: 29 67 14 F0 A6 33 63 07   41 AA 8D DC 7B 5B 41 F3  )g...3c.A[A.
> 0090: 83 48 8B 2A 0B 4D 6D 57   9A 6E CF 6B DC 0B C0 D1  .H.*.MmW.n.k
> 00A0: 83 BB 27 40 88 7E 9F 2B   D1 FD A8 6A E1 BF F6 CC  ..'@...+...j
> 00B0: 0E 0C FB 93 5D 69 9A 8B   11 88 0C F2 7C E1 FD 04  ]i..
> 00C0: F5 AB 66 0C A4 A4 7B 30   D1 7F F1 2D D6 A1 52 D1  ..f0...-..R.
> 00D0: 79 59 F2 06 CB 65 FB 73   63 1D 5B E9 4F 28 73 EB  yY...e.sc.[.O(s.
> 00E0: 72 7F 04 46 34 56 F4 40   6C C0 2C 39 C0 5B C6 25  r..F4V.@l.,9.[.%
> 00F0: ED EF 64 07 CE ED 35 9D   D7 91 6C 8F C9 CE 16 F5  ..d...5...l.
> 0100: CA 5E 6F DE 08 D2 68 30   C7 03 97 E7 C0 FF D9 52  .^o...h0...R
> 0110: F8 1D 2F DB 63 6D 12 4A   CD 60 AD D0 BA FA 4B CF  ../.cm.J.`K.
> 0120: 2C B9 8C CA 5A E6 EC 10   5A 0A 1F 84 B0 80 BD 39  ,...Z...Z..9
> 0130: 42 2C 33 EB C0 AA 0D 44   F0 F4 E9 87 24 43 BB 9A  B,3D$C..
> 0140: 52 R
> Client Principal = mapr/node1.cluster.com@NODE1
> Server Principal = krbtgt/NODE1@NODE1
> Session Key = EncryptionKey: keyType=18 keyBytes (hex dump)=
> 0000: 50 DA D1 D7 91 D3 64 BE   45 7B D8 02 25 81 18 25  P.d.E...%..%
> 0010: DA 59 4F BA 76 67 BB 39   9C F7 17 46 A7 C5 00 E2  .YO.vg.9...F
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6993) VARBINARY length is ignored on cast

2019-01-22 Thread Bohdan Kazydub (JIRA)
Bohdan Kazydub created DRILL-6993:
-

 Summary: VARBINARY length is ignored on cast
 Key: DRILL-6993
 URL: https://issues.apache.org/jira/browse/DRILL-6993
 Project: Apache Drill
  Issue Type: Bug
Reporter: Bohdan Kazydub
Assignee: Bohdan Kazydub


{{VARBINARY}} precision is not set when casting to {{VARBINARY}} with specified 
length.
For example, the test case
{code}
  String query = "select cast(r_name as varbinary(31)) as vb from cp.`tpch/region.parquet`";
  MaterializedField field = new ColumnBuilder("vb", TypeProtos.MinorType.VARBINARY)
      .setMode(TypeProtos.DataMode.OPTIONAL)
      .setWidth(31)
      .build();
  BatchSchema expectedSchema = new SchemaBuilder()
      .add(field)
      .build();

  // Validate schema
  testBuilder()
      .sqlQuery(query)
      .schemaBaseLine(expectedSchema)
      .go();
{code}
will fail with
{code}
java.lang.Exception: Schema path or type mismatch for column #0:
Expected schema path: vb
Actual   schema path: vb
Expected type: MajorType[minor_type: VARBINARY mode: OPTIONAL precision: 31 
scale: 0]
Actual   type: MajorType[minor_type: VARBINARY mode: OPTIONAL]
{code}
while for other types, like {{VARCHAR}}, it seems to work.
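
A quick way to poke at this from sqlline (a sketch, assuming Drill 1.14+, 
where {{sqltypeof()}} is available; whether the declared precision shows up is 
exactly what this bug affects):
{code:sql}
-- Compare the reported types of the two casts side by side:
select sqltypeof(cast(r_name as varchar(31)))   as vc_type,
       sqltypeof(cast(r_name as varbinary(31))) as vb_type
from cp.`tpch/region.parquet` limit 1;
{code}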



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6992) Support column histogram statistics

2019-01-22 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-6992:
-

 Summary: Support column histogram statistics
 Key: DRILL-6992
 URL: https://issues.apache.org/jira/browse/DRILL-6992
 Project: Apache Drill
  Issue Type: New Feature
  Components: Query Planning & Optimization
Affects Versions: 1.15.0
Reporter: Aman Sinha
Assignee: Aman Sinha


As a follow-up to [DRILL-1328|https://issues.apache.org/jira/browse/DRILL-1328], 
which is adding NDV (number of distinct values) support and creating the 
framework for statistics, we also need histograms. These are needed for 
selectivity estimation of range predicates, as well as of equality predicates 
when the data distribution is non-uniform.
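
A sketch of how this might surface once the DRILL-1328 framework lands 
(syntax per that proposal, subject to change):
{code:sql}
-- Collect statistics (NDV, and eventually histograms) for a table:
ANALYZE TABLE dfs.tmp.`nation` COMPUTE STATISTICS;

-- The planner could then estimate the selectivity of a range predicate
-- such as this one from histogram buckets instead of a fixed default:
explain plan for select * from dfs.tmp.`nation` where n_regionkey < 3;
{code}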



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6950) Row set-based scan framework

2019-01-22 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6950:

Reviewer: Arina Ielchiieva

> Row set-based scan framework
> 
>
> Key: DRILL-6950
> URL: https://issues.apache.org/jira/browse/DRILL-6950
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.15.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.16.0
>
>
> Next step in the ongoing "Result Set Loader" saga. Merge the enhanced scan 
> operator framework into master. Includes:
> * Projection add-ons for the "columns" column and file metadata columns.
> * Mechanisms to orchestrate the result set loader, projection framework, scan 
> node and reader code.
> * Extensive unit tests.
> Given the number of test files to review, the following was pushed to the next PR:
> * Extension to the "Easy" reader framework to handle the new structure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6977) Improve Hive tests configuration

2019-01-22 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6977:

Affects Version/s: 1.15.0

> Improve Hive tests configuration
> 
>
> Key: DRILL-6977
> URL: https://issues.apache.org/jira/browse/DRILL-6977
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Class HiveTestDataGenerator is responsible for initialization of the Hive 
> metadata service and configuration of the Hive storage plugin for the tested 
> drillbit. Originally it was supposed to be initialized once before all tests 
> in the hive module, but actually it's initialized for every test class, and 
> such initialization takes a lot of time, so it's worth spending some effort 
> to accelerate the Hive tests.
> This task has two main aims: 
>  # Use HiveTestDataGenerator once for all test classes 
>  # Provide flexible configuration of Hive tests that can be used with 
> ClusterFixture for autonomous (not bound to HiveTestBase) test classes 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6977) Improve Hive tests configuration

2019-01-22 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva updated DRILL-6977:

Labels: ready-to-commit  (was: )

> Improve Hive tests configuration
> 
>
> Key: DRILL-6977
> URL: https://issues.apache.org/jira/browse/DRILL-6977
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Tools, Build & Test
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
>  Labels: ready-to-commit
> Fix For: 1.16.0
>
>
> Class HiveTestDataGenerator is responsible for initialization of the Hive 
> metadata service and configuration of the Hive storage plugin for the tested 
> drillbit. Originally it was supposed to be initialized once before all tests 
> in the hive module, but actually it's initialized for every test class, and 
> such initialization takes a lot of time, so it's worth spending some effort 
> to accelerate the Hive tests.
> This task has two main aims: 
>  # Use HiveTestDataGenerator once for all test classes 
>  # Provide flexible configuration of Hive tests that can be used with 
> ClusterFixture for autonomous (not bound to HiveTestBase) test classes 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6862) Update Calcite to 1.18.0

2019-01-22 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6862:
---
Fix Version/s: 1.16.0

> Update Calcite to 1.18.0 
> -
>
> Key: DRILL-6862
> URL: https://issues.apache.org/jira/browse/DRILL-6862
> Project: Apache Drill
>  Issue Type: Task
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> After the ongoing release of the new Calcite version, we will update our 
> dependency.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6991) Kerberos ticket is being dumped in the log if log level is "debug" for stdout

2019-01-22 Thread Anton Gozhiy (JIRA)
Anton Gozhiy created DRILL-6991:
---

 Summary: Kerberos ticket is being dumped in the log if log level 
is "debug" for stdout 
 Key: DRILL-6991
 URL: https://issues.apache.org/jira/browse/DRILL-6991
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.15.0
Reporter: Anton Gozhiy


*Prerequisites:*
 # Drill is installed on a cluster with Kerberos security
 # In conf/logback.xml, set the following log level:
{code:xml}
  <logger name="org.apache.hadoop.security" additivity="false">
    <level value="debug"/>
    <appender-ref ref="STDOUT"/>
  </logger>
{code}

*Steps:*
# Start Drill
# Connect using sqlline using the following string:
{noformat}
bin/sqlline -u "jdbc:drill:zk=<zk_host:port>;principal=<drill_principal>"
{noformat}

*Expected result:*
No sensitive information should be displayed

*Actual result:*
Kerberos ticket and session key are being dumped into the console output:
{noformat}
14:35:38.806 [TGT Renewer for mapr/node1.cluster.com@NODE1] DEBUG 
o.a.h.security.UserGroupInformation - Found tgt Ticket (hex) = 
0000: 61 82 01 3D 30 82 01 39   A0 03 02 01 05 A1 07 1B  a..=0..9
0010: 05 4E 4F 44 45 31 A2 1A   30 18 A0 03 02 01 02 A1  .NODE1..0...
0020: 11 30 0F 1B 06 6B 72 62   74 67 74 1B 05 4E 4F 44  .0...krbtgt..NOD
0030: 45 31 A3 82 01 0B 30 82   01 07 A0 03 02 01 12 A1  E10.
0040: 03 02 01 01 A2 81 FA 04   81 F7 03 8D A9 FA 7D 89  
0050: 1B DF 37 B7 4D E6 6C 99   3E 8F FA 48 D9 9A 79 F3  ..7.M.l.>..H..y.
0060: 92 34 7F BF 67 1E 77 4A   2F C9 AF 82 93 4E 46 1D  .4..g.wJ/NF.
0070: 41 74 B0 AF 41 A8 8B 02   71 83 CC 14 51 72 60 EE  At..A...q...Qr`.
0080: 29 67 14 F0 A6 33 63 07   41 AA 8D DC 7B 5B 41 F3  )g...3c.A[A.
0090: 83 48 8B 2A 0B 4D 6D 57   9A 6E CF 6B DC 0B C0 D1  .H.*.MmW.n.k
00A0: 83 BB 27 40 88 7E 9F 2B   D1 FD A8 6A E1 BF F6 CC  ..'@...+...j
00B0: 0E 0C FB 93 5D 69 9A 8B   11 88 0C F2 7C E1 FD 04  ]i..
00C0: F5 AB 66 0C A4 A4 7B 30   D1 7F F1 2D D6 A1 52 D1  ..f0...-..R.
00D0: 79 59 F2 06 CB 65 FB 73   63 1D 5B E9 4F 28 73 EB  yY...e.sc.[.O(s.
00E0: 72 7F 04 46 34 56 F4 40   6C C0 2C 39 C0 5B C6 25  r..F4V.@l.,9.[.%
00F0: ED EF 64 07 CE ED 35 9D   D7 91 6C 8F C9 CE 16 F5  ..d...5...l.
0100: CA 5E 6F DE 08 D2 68 30   C7 03 97 E7 C0 FF D9 52  .^o...h0...R
0110: F8 1D 2F DB 63 6D 12 4A   CD 60 AD D0 BA FA 4B CF  ../.cm.J.`K.
0120: 2C B9 8C CA 5A E6 EC 10   5A 0A 1F 84 B0 80 BD 39  ,...Z...Z..9
0130: 42 2C 33 EB C0 AA 0D 44   F0 F4 E9 87 24 43 BB 9A  B,3D$C..
0140: 52 R

Client Principal = mapr/node1.cluster.com@NODE1
Server Principal = krbtgt/NODE1@NODE1
Session Key = EncryptionKey: keyType=18 keyBytes (hex dump)=
0000: 50 DA D1 D7 91 D3 64 BE   45 7B D8 02 25 81 18 25  P.d.E...%..%
0010: DA 59 4F BA 76 67 BB 39   9C F7 17 46 A7 C5 00 E2  .YO.vg.9...F
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6862) Update Calcite to 1.18.0

2019-01-22 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6862:
---
Affects Version/s: 1.15.0

> Update Calcite to 1.18.0 
> -
>
> Key: DRILL-6862
> URL: https://issues.apache.org/jira/browse/DRILL-6862
> Project: Apache Drill
>  Issue Type: Task
>Affects Versions: 1.15.0
>Reporter: Igor Guzenko
>Assignee: Igor Guzenko
>Priority: Major
> Fix For: 1.16.0
>
>
> After the ongoing release of the new Calcite version, we will update our 
> dependency.  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (DRILL-6980) --run regression - error when empty line or command at end of the file

2019-01-22 Thread Volodymyr Vysotskyi (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748033#comment-16748033
 ] 

Volodymyr Vysotskyi edited comment on DRILL-6980 at 1/22/19 10:22 AM:
--

[~benj641], the problem is in SqlLine: 
[https://github.com/julianhyde/sqlline/issues/232]. It was fixed there and it 
will be fixed for Drill after upgrade to SqlLine 1.7.0


was (Author: vvysotskyi):
[~benj641], the problem is in SqlLine: 
[https://github.com/julianhyde/sqlline/issues/232.] It was fixed there and it 
will be fixed for Drill after upgrade to SqlLine 1.7.0

> --run regression - error when empty line or command at end of the file
> --
>
> Key: DRILL-6980
> URL: https://issues.apache.org/jira/browse/DRILL-6980
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Affects Versions: 1.15.0
>Reporter: benj
>Priority: Minor
>
> When using --run like
> {code:java}
> bin/drill-embedded --run="myfile.req"
> {code}
> If "myfile.req" contains extra lines (empty or comment) after the last SQL 
> DRILL request, an error appear.
> {code:java}
> Error: PARSE ERROR: Encountered "<EOF>" at line 1, column 4.
> {code}
> Note that empty lines or comment lines in the middle of the file (i.e. between 
> 2 DRILL requests or at the beginning of the file) don't cause any problem.
> This problem appeared in 1.15.0; it did not exist in 1.14.0.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-6987) sqlline - simple semicolon ";" with no command produce error

2019-01-22 Thread benj (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-6987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16748571#comment-16748571
 ] 

benj commented on DRILL-6987:
-

OK. So, for monitoring, I created a related issue on 
https://github.com/julianhyde/sqlline/issues/267

> sqlline - simple semicolon ";" with no command produce error
> 
>
> Key: DRILL-6987
> URL: https://issues.apache.org/jira/browse/DRILL-6987
> Project: Apache Drill
>  Issue Type: Bug
>  Components: SQL Parser
>Affects Versions: 1.15.0
>Reporter: benj
>Priority: Minor
>
> In sqlline, a single semicolon (or space(s) followed by semicolon) - "*[ 
> ]*;*" - produce an error
> {code:java}
> ;
> Error: PARSE ERROR: Encountered "<EOF>" at line 0, column 0.
> {code}
> It's not necessarily a real bug, but I think it can produce "too many" 
> errors for nothing.
> We don't need/want an error for an empty command.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-6969) Inconsistent results when reading MaprDB JSON tables using hive plugin when native reader is enabled

2019-01-22 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6969:
---
Description: 
Steps to reproduce:
0. Set PST timezone.
1. Create the table in MaprDB shell:
{code:java}
create /tmp/testtimestamp
insert /tmp/testtimestamp --value '{"_id":"1","datestring":"2018-01-01 
12:12:12.123","datetimestamp":{"$date":"2018-01-01T20:12:12.123Z"}}'
insert /tmp/testtimestamp --value '{"_id":"2","datestring":"-12-31 
23:59:59.999","datetimestamp":{"$date":"1-01-01T07:59:59.999Z"}}'
{code}
2. Create a hive table:
{code:sql}
create external table `testtimestamp` (`_id` string, datestring string, 
datetimestamp timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES ( 'maprdb.column.id'='_id', 
'maprdb.table.name'='/tmp/testtimestamp');
{code}
3. Disable native reader and run the query on the table from Drill using hive 
plugin:
{code:sql}
alter session set store.hive.maprdb_json.optimize_scan_with_native_reader=false;
select * from hive.testtimestamp;
{code}
It returns:
{noformat}
+------+--------------------------+--------------------------+
| _id  |        datestring        |      datetimestamp       |
+------+--------------------------+--------------------------+
| 1    | 2018-01-01 12:12:12.123  | 2018-01-01 12:12:12.123  |
| 2    | 9999-12-31 23:59:59.999  | 9999-12-31 23:59:59.999  |
+------+--------------------------+--------------------------+
{noformat}
4. Enable native reader and run the query on the same table:
{code:sql}
alter session set store.hive.maprdb_json.optimize_scan_with_native_reader=true;
select * from hive.testtimestamp;
{code}
It returns:
{noformat}
+------+--------------------------+---------------------------+
| _id  |        datestring        |       datetimestamp       |
+------+--------------------------+---------------------------+
| 1    | 2018-01-01 12:12:12.123  | 2018-01-01 20:12:12.123   |
| 2    | 9999-12-31 23:59:59.999  | 10000-01-01 07:59:59.999  |
+------+--------------------------+---------------------------+
{noformat}
h2. For documentation:

Added the following Mapr-DB Format Setting:
||Option||Description||Value||
|readTimestampWithZoneOffset|When enabled, Drill converts timestamp values read 
from MapR Database from UTC to local timezone. Disabled by default.|true\|false|

Added the following configuration option:
||Name||Default||Description||
|store.hive.maprdb_json.read_timestamp_with_timezone_offset|FALSE|Enables Drill 
to read timestamp values with timezone offset when hive plugin is used and 
Drill native MaprDB JSON reader usage is enabled. (Drill 1.16+)|
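
For example, to opt in on the hive plugin path (names taken from the tables 
above):
{code:sql}
alter session set `store.hive.maprdb_json.read_timestamp_with_timezone_offset` = true;
select * from hive.testtimestamp;
{code}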

  was:
Steps to reproduce:
0. Set PST timezone.
1. Create the table in MaprDB shell:
{code}
create /tmp/testtimestamp
insert /tmp/testtimestamp --value '{"_id":"1","datestring":"2018-01-01 
12:12:12.123","datetimestamp":{"$date":"2018-01-01T20:12:12.123Z"}}'
insert /tmp/testtimestamp --value '{"_id":"2","datestring":"-12-31 
23:59:59.999","datetimestamp":{"$date":"1-01-01T07:59:59.999Z"}}'
{code}
2. Create a hive table:
{code:sql}
create external table `testtimestamp` (`_id` string, datestring string, 
datetimestamp timestamp)
ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'
STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
TBLPROPERTIES ( 'maprdb.column.id'='_id', 
'maprdb.table.name'='/tmp/testtimestamp');
{code}
3. Disable native reader and run the query on the table from Drill using hive 
plugin:
{code:sql}
alter session set store.hive.maprdb_json.optimize_scan_with_native_reader=false;
select * from hive.testtimestamp;
{code}
It returns:
{noformat}
+------+--------------------------+--------------------------+
| _id  |        datestring        |      datetimestamp       |
+------+--------------------------+--------------------------+
| 1    | 2018-01-01 12:12:12.123  | 2018-01-01 12:12:12.123  |
| 2    | 9999-12-31 23:59:59.999  | 9999-12-31 23:59:59.999  |
+------+--------------------------+--------------------------+
{noformat}
4. Enable native reader and run the query on the same table:
{code:sql}
alter session set store.hive.maprdb_json.optimize_scan_with_native_reader=true;
select * from hive.testtimestamp;
{code}
It returns:
{noformat}
+------+--------------------------+---------------------------+
| _id  |        datestring        |       datetimestamp       |
+------+--------------------------+---------------------------+
| 1    | 2018-01-01 12:12:12.123  | 2018-01-01 20:12:12.123   |
| 2    | 9999-12-31 23:59:59.999  | 10000-01-01 07:59:59.999  |
+------+--------------------------+---------------------------+
{noformat}


> Inconsistent results when reading MaprDB JSON tables using hive plugin when 
> native reader is enabled
> 

[jira] [Updated] (DRILL-6969) Inconsistent results when reading MaprDB JSON tables using hive plugin when native reader is enabled

2019-01-22 Thread Volodymyr Vysotskyi (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Volodymyr Vysotskyi updated DRILL-6969:
---
Labels: doc-impacting ready-to-commit  (was: ready-to-commit)

> Inconsistent results when reading MaprDB JSON tables using hive plugin when 
> native reader is enabled
> 
>
> Key: DRILL-6969
> URL: https://issues.apache.org/jira/browse/DRILL-6969
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.14.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: doc-impacting, ready-to-commit
> Fix For: 1.16.0
>
>
> Steps to reproduce:
> 0. Set PST timezone.
> 1. Create the table in MaprDB shell:
> {code:java}
> create /tmp/testtimestamp
> insert /tmp/testtimestamp --value '{"_id":"1","datestring":"2018-01-01 
> 12:12:12.123","datetimestamp":{"$date":"2018-01-01T20:12:12.123Z"}}'
> insert /tmp/testtimestamp --value '{"_id":"2","datestring":"-12-31 
> 23:59:59.999","datetimestamp":{"$date":"1-01-01T07:59:59.999Z"}}'
> {code}
> 2. Create a hive table:
> {code:sql}
> create external table `testtimestamp` (`_id` string, datestring string, 
> datetimestamp timestamp)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.maprdb.json.serde.MapRDBSerDe'
> STORED BY 'org.apache.hadoop.hive.maprdb.json.MapRDBJsonStorageHandler'
> TBLPROPERTIES ( 'maprdb.column.id'='_id', 
> 'maprdb.table.name'='/tmp/testtimestamp');
> {code}
> 3. Disable native reader and run the query on the table from Drill using hive 
> plugin:
> {code:sql}
> alter session set 
> store.hive.maprdb_json.optimize_scan_with_native_reader=false;
> select * from hive.testtimestamp;
> {code}
> It returns:
> {noformat}
> +------+--------------------------+--------------------------+
> | _id  |        datestring        |      datetimestamp       |
> +------+--------------------------+--------------------------+
> | 1    | 2018-01-01 12:12:12.123  | 2018-01-01 12:12:12.123  |
> | 2    | 9999-12-31 23:59:59.999  | 9999-12-31 23:59:59.999  |
> +------+--------------------------+--------------------------+
> {noformat}
> 4. Enable native reader and run the query on the same table:
> {code:sql}
> alter session set 
> store.hive.maprdb_json.optimize_scan_with_native_reader=true;
> select * from hive.testtimestamp;
> {code}
> It returns:
> {noformat}
> +------+--------------------------+---------------------------+
> | _id  |        datestring        |       datetimestamp       |
> +------+--------------------------+---------------------------+
> | 1    | 2018-01-01 12:12:12.123  | 2018-01-01 20:12:12.123   |
> | 2    | 9999-12-31 23:59:59.999  | 10000-01-01 07:59:59.999  |
> +------+--------------------------+---------------------------+
> {noformat}
> h2. For documentation:
> Added the following Mapr-DB Format Setting:
> ||Option||Description||Value||
> |readTimestampWithZoneOffset|When enabled, Drill converts timestamp values 
> read from MapR Database from UTC to local timezone. Disabled by 
> default.|true\|false|
> Added the following configuration option:
> ||Name||Default||Description||
> |store.hive.maprdb_json.read_timestamp_with_timezone_offset|FALSE|Enables 
> Drill to read timestamp values with timezone offset when hive plugin is used 
> and Drill native MaprDB JSON reader usage is enabled. (Drill 1.16+)|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)