[jira] [Commented] (SPARK-27036) Even Broadcast thread is timed out, BroadCast Job is not aborted.
[ https://issues.apache.org/jira/browse/SPARK-27036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782840#comment-16782840 ] Sujith commented on SPARK-27036:

The problem area seems to be BroadcastExchangeExec in the driver, where a job is fired inside a Future and the collected data is then broadcast. The system submits the job and its stages/tasks through the DAGScheduler, whose scheduler thread schedules the respective events. When the Future in BroadcastExchangeExec times out, the corresponding exception is thrown, but the jobs/tasks already scheduled by the DAGScheduler for the action called inside the Future are not cancelled. I think we should cancel the respective job so it does not keep running in the background after the Future times out; this would terminate the job promptly when the TimeoutException happens and save the additional resources that would otherwise be consumed after the timeout is thrown from the driver. I want to attempt a fix for this issue; any comments or suggestions are welcome. cc [~sro...@scient.com] [~b...@cloudera.com] [~hvanhovell]

> Even Broadcast thread is timed out, BroadCast Job is not aborted.
> -----------------------------------------------------------------
>
> Key: SPARK-27036
> URL: https://issues.apache.org/jira/browse/SPARK-27036
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.2
> Reporter: Babulal
> Priority: Minor
> Attachments: image-2019-03-04-00-38-52-401.png, image-2019-03-04-00-39-12-210.png, image-2019-03-04-00-39-38-779.png
>
> During broadcast table job execution, if a broadcast timeout (spark.sql.broadcastTimeout) happens, the broadcast job still continues till completion, whereas it should abort on the broadcast timeout.
> The exception is thrown in the console but the Spark job still continues.
>
> !image-2019-03-04-00-39-38-779.png!
> !image-2019-03-04-00-39-12-210.png!
>
> wait for some time
> !image-2019-03-04-00-38-52-401.png!
> !image-2019-03-04-00-34-47-884.png!
> How to Reproduce the Issue
>
> Option 1, using SQL:
>
> Create table csv_2 (big table, 1M records):
>
> val rdd1 = spark.sparkContext.parallelize(1 to 1000000, 100).map(x => ("name_"+x, x%3, x))
> val df = rdd1.toDF.selectExpr("_1 as name","_2 as age","_3 as sal","_1 as c1","_1 as c2","_1 as c3","_1 as c4","_1 as c5","_1 as c6","_1 as c7","_1 as c8","_1 as c9","_1 as c10","_1 as c11","_1 as c12","_1 as c13","_1 as c14","_1 as c15","_1 as c16","_1 as c17","_1 as c18","_1 as c19","_1 as c20","_1 as c21","_1 as c22","_1 as c23","_1 as c24","_1 as c25","_1 as c26","_1 as c27","_1 as c28","_1 as c29","_1 as c30")
> df.write.csv("D:/data/par1/t4");
> spark.sql("create table csv_2 using csv options('path'='D:/data/par1/t4')");
>
> Create table csv_1 (small table, 100K records):
>
> val rdd1 = spark.sparkContext.parallelize(1 to 100000, 100).map(x => ("name_"+x, x%3, x))
> val df = rdd1.toDF.selectExpr("_1 as name","_2 as age","_3 as sal","_1 as c1","_1 as c2","_1 as c3","_1 as c4","_1 as c5","_1 as c6","_1 as c7","_1 as c8","_1 as c9","_1 as c10","_1 as c11","_1 as c12","_1 as c13","_1 as c14","_1 as c15","_1 as c16","_1 as c17","_1 as c18","_1 as c19","_1 as c20","_1 as c21","_1 as c22","_1 as c23","_1 as c24","_1 as c25","_1 as c26","_1 as c27","_1 as c28","_1 as c29","_1 as c30")
> df.write.csv("D:/data/par1/t5");
> spark.sql("create table csv_1 using csv options('path'='D:/data/par1/t5')");
>
> spark.sql("set spark.sql.autoBroadcastJoinThreshold=73400320").show(false)
> spark.sql("set spark.sql.broadcastTimeout=2").show(false)
>
> Run the below query:
>
> spark.sql("create table s using parquet as select t1.* from csv_2 as t1, csv_1 as t2 where t1._c3=t2._c3")
>
> Option 2: use an external DataSource, add a delay in #buildScan, and use the datasource for the query.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
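The behavior described above can be illustrated outside Spark with a plain thread-pool future (a hedged Python sketch, not Spark code): a timed-out future does not stop the underlying work by itself, so the job must be cancelled explicitly, analogous to cancelling the broadcast job when spark.sql.broadcastTimeout fires.

```python
import concurrent.futures
import threading
import time

cancel_flag = threading.Event()
progress = []

def long_running_job():
    # Stands in for the broadcast collect job: it keeps working
    # unless it is explicitly told to stop.
    for step in range(100):
        if cancel_flag.is_set():
            return "cancelled"
        progress.append(step)
        time.sleep(0.01)
    return "done"

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(long_running_job)

try:
    # Analogous to awaiting the broadcast future for spark.sql.broadcastTimeout.
    future.result(timeout=0.05)
except concurrent.futures.TimeoutError:
    # Without this explicit cancellation signal, the job would keep
    # running to completion in the background after the timeout.
    cancel_flag.set()

result = future.result()
executor.shutdown()
```

Here the timeout fires while the job is still early in its loop and the explicit flag stops it; dropping the `cancel_flag.set()` line would let the loop run all 100 steps despite the timeout, which is exactly the resource leak the comment describes.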
[jira] [Commented] (SPARK-26969) [Spark] Using ODBC not able to see the data in table when datatype is decimal
[ https://issues.apache.org/jira/browse/SPARK-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774994#comment-16774994 ] Sujith commented on SPARK-26969:

I will analyze the issue further and raise a PR if required. Thanks.

> [Spark] Using ODBC not able to see the data in table when datatype is decimal
> -----------------------------------------------------------------------------
>
> Key: SPARK-26969
> URL: https://issues.apache.org/jira/browse/SPARK-26969
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Affects Versions: 2.4.0
> Reporter: ABHISHEK KUMAR GUPTA
> Priority: Major
>
> # Using the odbc rpm file, install odbc
> # Connect to odbc using isql -v spark2xsingle
> # SQL> create table t1_t(id decimal(15,2));
> # SQL> insert into t1_t values(15);
> # SQL> select * from t1_t;
> +-----+
> | id  |
> +-----+
> +-----+
> Actual output is empty.
> Note: when creating a table of int data type, select gives the result as below:
> SQL> create table test_t1(id int);
> SQL> insert into test_t1 values(10);
> SQL> select * from test_t1;
> +-----+
> | id  |
> +-----+
> | 10  |
> +-----+
> The decimal case needs to be handled.
[jira] [Commented] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772921#comment-16772921 ] Sujith commented on SPARK-22601:

*[gatorsmile|https://github.com/gatorsmile] [~srowen] please assign this JIRA to me, as the corresponding PR has already been merged. Thanks*

> Data load is getting displayed successful on providing non existing hdfs file path
> ----------------------------------------------------------------------------------
>
> Key: SPARK-22601
> URL: https://issues.apache.org/jira/browse/SPARK-22601
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: Sujith
> Priority: Minor
> Fix For: 2.2.1
>
> Data load is reported as successful when a non-existing HDFS file path is provided, whereas for a local path a proper error message is displayed.
> create table tb2 (a string, b int);
> load data inpath 'hdfs://hacluster/data1.csv' into table tb2
> Note: data1.csv does not exist in HDFS
> When a local non-existing file path is given, the below error message is displayed:
> "LOAD DATA input path does not exist"
> Attached snapshots of the behaviour in the spark 2.1 and spark 2.2 versions.
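A minimal sketch (hypothetical, not Spark's actual code path) of the behavior the issue asks for: validate the input path up front and fail the load, matching the error already shown for local paths.

```python
import os

def load_data_into_table(table, input_path):
    # Fail fast when the input path does not exist, instead of
    # reporting the load as successful. (Illustrative check only;
    # Spark resolves HDFS paths via the Hadoop FileSystem API.)
    if not os.path.exists(input_path):
        raise ValueError("LOAD DATA input path does not exist: " + input_path)
    return "loaded %s into %s" % (input_path, table)

try:
    load_data_into_table("tb2", "/no/such/path/data1.csv")
    outcome = "load succeeded"
except ValueError as err:
    outcome = str(err)
```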
[jira] [Comment Edited] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761086#comment-16761086 ] Sujith edited comment on SPARK-26821 at 2/5/19 6:27 PM:
----------------------------------------------------------

Yeah, with spaces it works fine. Shall we document this behavior? I will also try to check the behavior in a couple of other systems.

was (Author: s71955): Yeah with spaces it will work fine, will try to check the behavior in couple of other systems also.

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
>
> Create a table with a char type field. While inserting data into a char data type column, if the data string length is less than the specified datatype length, spark2x will not process the filter query properly, leading to an incorrect result.
> 0: jdbc:hive2://10.19.89.222:22550/default> create table jj(id int, name char(5));
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.894 seconds)
> 0: jdbc:hive2://10.19.89.222:22550/default> insert into table jj values(232,'ds');
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.815 seconds)
> 0: jdbc:hive2://10.19.89.222:22550/default> select * from jj where name='ds';
> +-----+-------+
> | id  | name  |
> +-----+-------+
> +-----+-------+
>
> The above query will not give any result.
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761086#comment-16761086 ] Sujith commented on SPARK-26821:

Yeah, with spaces it works fine; I will try to check the behavior in a couple of other systems as well.

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760063#comment-16760063 ] Sujith commented on SPARK-26821:

A bit tricky to handle this scenario, though.

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760061#comment-16760061 ] Sujith commented on SPARK-26821:

Yes Sean, but I tested the same in MySQL and it gives a result; I am not sure how they handle it internally.

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Comment Edited] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759382#comment-16759382 ] Sujith edited comment on SPARK-26821 at 2/4/19 5:38 PM:
----------------------------------------------------------

cc [~dongjoon] [~vinodkc] [~srowen]

was (Author: s71955): cc [~dongjoon] [~vinodkc]

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759381#comment-16759381 ] Sujith commented on SPARK-26821:

As per the initial analysis, this happens because the actual char data type length is 5 whereas we are inserting data of length 2; since it is a char data type, the system pads the remaining part of the array block with spaces. When we then apply a filter, the system compares the predicate value with the actual table data, which contains the padding, e.g. 'ds' == 'ds   ', leading to the wrong result. I am analyzing this issue further; please let me know of any suggestions or guidance. Thanks.

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
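The padding mismatch described in the comment can be reproduced with plain strings (a Python sketch of the semantics, not Spark internals): a CHAR(5) column stores 'ds' as 'ds   ', so equality against the unpadded literal fails unless one side is padded or trimmed.

```python
stored = "ds".ljust(5)   # what a CHAR(5) column actually holds: 'ds   '
predicate = "ds"         # the literal written in the WHERE clause

naive_match = (stored == predicate)             # the buggy comparison: no rows
padded_match = (stored == predicate.ljust(5))   # pad the literal to the column length
trimmed_match = (stored.rstrip() == predicate)  # or trim trailing spaces before comparing
```

Either padding the literal or trimming the stored value restores the expected match; which one is chosen determines the semantics the engine documents.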
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759382#comment-16759382 ] Sujith commented on SPARK-26821:

cc [~dongjoon]

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Comment Edited] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759382#comment-16759382 ] Sujith edited comment on SPARK-26821 at 2/3/19 11:44 AM:
-----------------------------------------------------------

cc [~dongjoon] [~vinodkc]

was (Author: s71955): cc [~dongjoon]

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Updated] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26821:
----------------------------
Description:

Create a table with a char type field. While inserting data into a char data type column, if the data string length is less than the specified datatype length, spark2x will not process the filter query properly, leading to an incorrect result.

0: jdbc:hive2://10.19.89.222:22550/default> create table jj(id int, name char(5));
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.894 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> insert into table jj values(232,'ds');
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.815 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> select * from jj where name='ds';
+-----+-------+
| id  | name  |
+-----+-------+
+-----+-------+

The above query will not give any result.

was:

Create a table with a char type field. While inserting data into a char data type column, if the data string length is less than the specified datatype length, spark2x will not process the filter query properly, leading to an incorrect result.

0: jdbc:hive2://10.19.89.222:22550/default> create table jj(id int, name char(5));
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.894 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> insert into table jj values(232,'ds');
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.815 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> select * from jj where name='ds';
+-----+-------+
| id  | name  |
+-----+-------+
+-----+-------+

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Created] (SPARK-26821) filters not working with char datatype when querying against hive table
Sujith created SPARK-26821:
-------------------------------

Summary: filters not working with char datatype when querying against hive table
Key: SPARK-26821
URL: https://issues.apache.org/jira/browse/SPARK-26821
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: Sujith

Create a table with a char type field. While inserting data into a char data type column, if the data string length is less than the specified datatype length, spark2x will not process the filter query properly, leading to an incorrect result.

0: jdbc:hive2://10.19.89.222:22550/default> create table jj(id int, name char(5));
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.894 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> insert into table jj values(232,'ds');
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.815 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> select * from jj where name='ds';
+-----+-------+
| id  | name  |
+-----+-------+
+-----+-------+
[jira] [Comment Edited] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752044#comment-16752044 ] Sujith edited comment on SPARK-22229 at 1/25/19 8:53 AM:
-----------------------------------------------------------

[~yuvaldeg] May I know where I can find the PR related to the new [SparkRDMA|https://github.com/Mellanox/SparkRDMA] implementation? I just want to evaluate it further; quite an interesting feature.

was (Author: s71955): [~yuvaldeg] May i know where i can find PR related to new [SparkRDMA|https://github.com/Mellanox/SparkRDMA] implementation. We want to evaluate it further, quite interesting feature

> SPIP: RDMA Accelerated Shuffle Engine
> -------------------------------------
>
> Key: SPARK-22229
> URL: https://issues.apache.org/jira/browse/SPARK-22229
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Yuval Degani
> Priority: Major
> Attachments: SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
>
> An RDMA-accelerated shuffle engine can provide enormous performance benefits to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin open-source project ([https://github.com/Mellanox/SparkRDMA]).
> Using RDMA for shuffle improves CPU utilization significantly and reduces I/O processing overhead by bypassing the kernel and networking stack as well as avoiding memory copies entirely. Those valuable CPU cycles are then consumed directly by the actual Spark workloads, and help reduce the job runtime significantly.
> This performance gain is demonstrated with both the industry-standard HiBench TeraSort (which shows a 1.5x speedup in sorting) as well as shuffle-intensive customer applications.
> SparkRDMA will be presented at Spark Summit 2017 in Dublin ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).
> Please see the attached proposal document for more information.
[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752044#comment-16752044 ] Sujith commented on SPARK-22229:

[~yuvaldeg] May I know where I can find the PR related to the new [SparkRDMA|https://github.com/Mellanox/SparkRDMA] implementation? We want to evaluate it further; quite an interesting feature.

> SPIP: RDMA Accelerated Shuffle Engine
> -------------------------------------
>
> Key: SPARK-22229
> URL: https://issues.apache.org/jira/browse/SPARK-22229
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Yuval Degani
> Priority: Major
> Attachments: SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
[jira] [Commented] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732299#comment-16732299 ] Sujith commented on SPARK-26432:

The test description has been updated; let me know if you have any suggestions or input. Thanks all.

> Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-26432
> URL: https://issues.apache.org/jira/browse/SPARK-26432
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.2, 2.4.0
> Reporter: Sujith
> Priority: Major
> Attachments: hbase-dep-obtaintok.png
>
> Getting NoSuchMethodException : org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the hbase 2.1 service from spark.
> This is mainly happening because spark uses a deprecated hbase api, public static Token obtainToken(Configuration conf), for obtaining the token, and the same has been removed in the hbase 2.1 version.
>
> Test steps:
> Steps to test the Spark-Hbase connection
>
> 1. Create 2 tables in the hbase shell
> Launch the hbase shell and enter commands to create the tables and load data:
> create 'table1','cf'
> put 'table1','row1','cf:cid','20'
> create 'table2','cf'
> put 'table2','row1','cf:cid','30'
> Show values:
> get 'table1','row1','cf:cid' will display the value 20
> get 'table2','row1','cf:cid' will display the value 30
>
> 2. Run the SparkHbasetoHbase class in testSpark.jar using spark-submit:
> spark-submit --master yarn-cluster --class com.mrs.example.spark.SparkHbasetoHbase --conf "spark.yarn.security.credentials.hbase.enabled"="true" --conf "spark.security.credentials.hbase.enabled"="true" --keytab /opt/client/user.keytab --principal sen testSpark.jar
> The SparkHbasetoHbase class will update the value of table2 with the sum of the values of table1 & table2:
> table2 = table1 + table2
>
> 3. Verify the result in the hbase shell
> Expected result: the value of table2 should be 50.
> get 'table2','row1','cf:cid' will display the value 50
> Actual result: the value is not updated, as an error is thrown when spark tries to connect to the hbase service.
> Attached the snapshot of the error logs below for more details.
[jira] [Updated] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Description: Getting NoSuchMethodException : org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying connect hbase 2.1 service from spark. This is mainly happening because in spark uses a deprecated hbase api public static Token obtainToken(Configuration conf) for obtaining the token and the same has been removed from hbase 2.1 version. Test steps: Steps to test Spark-Hbase connection 1. Create 2 tables in hbase shell >Launch hbase shell >Enter commands to create tables and load data create 'table1','cf' put 'table1','row1','cf:cid','20' create 'table2','cf' put 'table2','row1','cf:cid','30' >Show values command get 'table1','row1','cf:cid' will diplay value as 20 get 'table2','row1','cf:cid' will diplay value as 30 2.Run SparkHbasetoHbase class in testSpark.jar using spark-submit spark-submit --master yarn-cluster --class com.mrs.example.spark.SparkHbasetoHbase --conf "spark.yarn.security.credentials.hbase.enabled"="true" --conf "spark.security.credentials.hbase.enabled"="true" --keytab /opt/client/user.keytab --principal sen testSpark.jar The SparkHbasetoHbase class will update the value of table2 with sum of values of table1 & table2. table2 = table1+table2 3.Verify the result in hbase shell Expected Result: The value of table2 should be 50. get 'table1','row1','cf:cid' will diplay value as 50 Actual Result : Not updating the value as an error will be thrown when spark tries to connect with hbase service. Attached the snapshot of error logs below for more details was: Getting NoSuchMethodException : org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying connect hbase 2.1 service from spark. 
This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the token, and that API has been removed in HBase 2.1. The snapshot of the error logs is attached.

> Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
>
> Key: SPARK-26432
> URL: https://issues.apache.org/jira/browse/SPARK-26432
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.2, 2.4.0
> Reporter: Sujith
> Priority: Major
> Attachments: hbase-dep-obtaintok.png

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732281#comment-16732281 ] Sujith edited comment on SPARK-26432 at 1/2/19 6:27 PM: Sorry for the late response due to the holidays :). I have raised a PR; please let me know if you have any suggestions. Thanks. The PR is WIP, as I still need to attach the test report, which I will attach tomorrow.

was (Author: s71955): Sorry for the late response due to the holidays :). I have raised a PR; please let me know if you have any suggestions. Thanks.
[jira] [Commented] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732281#comment-16732281 ] Sujith commented on SPARK-26432: Sorry for the late response due to the holidays :). I have raised a PR; please let me know if you have any suggestions. Thanks.
[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgument Exception
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732002#comment-16732002 ] Sujith commented on SPARK-26454: I think [~hyukjin.kwon]'s idea is better and simpler: we can reduce the level to warn, because logging at error implies that the user should not expect the operation to have succeeded. Lowering the level avoids that confusion.

> While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgument Exception
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Affects Versions: 2.3.2
> Reporter: Udbhav Agrawal
> Priority: Trivial
> Attachments: create_exception.txt
>
> Test steps:
> 1. Launch spark-shell
> 2. set role admin;
> 3. Create a new function:
> CREATE FUNCTION Func AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 'hdfs:///tmp/super_udf/two_udfs.jar'
> 4. Select using the function:
> sql("select Func('2018-03-09')").show()
> 5. Create a new UDF with the same JAR:
> sql("CREATE FUNCTION newFunc AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Select using the new function:
> sql("select newFunc('2018-03-09')").show()
> Output: the function is created and the select returns a result, but an IllegalArgumentException is thrown in both cases.
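The distinction being argued above (a recoverable condition belongs at warn, not error) can be sketched in a small, self-contained example. This uses plain java.util.logging as a stand-in, not Spark's actual Log4j-based logger, and the logger name and message are illustrative only:

```scala
import java.util.logging.{Handler, Level, LogRecord, Logger}
import scala.collection.mutable.ArrayBuffer

// Collect log records in memory so the chosen level can be inspected.
val records = ArrayBuffer.empty[LogRecord]
val handler = new Handler {
  override def publish(r: LogRecord): Unit = records += r
  override def flush(): Unit = ()
  override def close(): Unit = ()
}

val log = Logger.getLogger("udf-registration-demo")
log.setUseParentHandlers(false) // keep demo output off the console
log.addHandler(handler)
log.setLevel(Level.ALL)

// The UDF itself was registered successfully, so the classloader hiccup is
// recoverable: report it at WARNING rather than SEVERE (error).
log.warning("could not re-add JAR to the current classloader; function is still registered and usable")

val loggedLevel = records.last.getLevel
```

With this choice, a user scanning the log sees that the operation succeeded with a caveat, instead of an error that suggests the CREATE FUNCTION failed.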
[jira] [Commented] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728463#comment-16728463 ] Sujith commented on SPARK-26432: Thanks for the suggestions. I will update the description. This issue was reported by a customer who was trying to connect Spark with HBase 2.1.

- Is HBase 2.1 the only broken HBase version? Could you link the Apache HBase issue which removes that API here?
_From HBase 2.0 onward this particular deprecated API, obtainToken(conf), has been removed: https://issues.apache.org/jira/browse/HBASE-14713_

- Is it enough to make `HBaseDelegationTokenProvider` support HBase 2.1?
_We already have a consistent API, obtainToken(Connection conn), available since older versions of HBase for obtaining the token. If we use this consistent API, we can avoid breaking on upgraded HBase versions._

I will raise a PR for handling this issue soon, where I can include more details.
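Why a removed library method surfaces as NoSuchMethodException at runtime rather than at compile time can be shown with a stand-in sketch. Spark's token provider resolves the HBase method reflectively; here java.lang.String stands in for TokenUtil, and the "obtainToken" lookup is deliberately for a signature String does not have:

```scala
// Stand-in demonstration, not the HBase API: a reflective lookup of a
// signature the loaded class does not ship fails only at runtime, which is
// exactly how the TokenUtil.obtainToken(Configuration) removal manifests.
val cls = classOf[String]

// A signature that exists resolves fine.
val present = cls.getMethod("length")

// A signature that was never (or is no longer) shipped fails the lookup
// with NoSuchMethodException.
val removedLookup: Option[NoSuchMethodException] =
  try {
    cls.getMethod("obtainToken", classOf[java.util.Properties])
    None
  } catch {
    case e: NoSuchMethodException => Some(e)
  }

println(s"present: ${present.getName}; removed signature found: ${removedLookup.isEmpty}")
```

This is why compiling Spark against an older HBase client gives no warning: the breakage only appears when the job runs against the 2.x jars.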
[jira] [Updated] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Description: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the token, and that API has been removed in HBase 2.1. The snapshot of the error logs is attached.

was: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the token, and that API has been removed in HBase 2.1.
[jira] [Comment Edited] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728014#comment-16728014 ] Sujith edited comment on SPARK-26432 at 12/23/18 5:49 PM: -- This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the Kerberos security token, and that API has been removed in HBase 2.1. As I analyzed, there is a more stable API, public static Token obtainToken(Connection conn), in the TokenUtil class; I think Spark should use this stable API for getting the delegation token. To invoke this API, a connection object first has to be retrieved from ConnectionFactory, and that connection can then be passed to obtainToken(Connection conn) to get the token. I can raise a PR soon for handling this issue; please let me know if you have any clarifications or suggestions.
[jira] [Comment Edited] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728014#comment-16728014 ] Sujith edited comment on SPARK-26432 at 12/23/18 5:48 PM: -- This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the Kerberos security token, and that API has been removed in HBase 2.1. As I analyzed, there is a more stable API, public static Token obtainToken(Connection conn), in the TokenUtil class; I think Spark should use this stable API for getting the delegation token. To invoke this API, a connection object first has to be retrieved from ConnectionFactory, and that connection can then be passed to obtainToken(Connection conn) to get the token. I can raise a PR soon for handling this issue; please let me know if you have any clarifications or suggestions.
[jira] [Updated] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Summary: Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service. (was: Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service)
[jira] [Commented] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728015#comment-16728015 ] Sujith commented on SPARK-26432: cc [~cloud_fan] [~vanzin]
[jira] [Commented] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728014#comment-16728014 ] Sujith commented on SPARK-26432: This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the Kerberos security token, and that API has been removed in HBase 2.1. As I analyzed, there is a more stable API, public static Token obtainToken(Connection conn), in the TokenUtil class; I think Spark should use this stable API for getting the delegation token. To invoke this API, a connection object first has to be retrieved from ConnectionFactory, and that connection can then be passed to obtainToken(Connection conn) to get the token. I can raise a PR soon for handling this issue; please let me know if you have any clarifications or suggestions.
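The Connection-based flow described in the comment can be sketched as below. This is a compile-only sketch, not runnable here: it assumes the HBase client libraries on the classpath and a reachable, Kerberos-secured cluster, and it shows only the token-acquisition step, not how Spark's HBaseDelegationTokenProvider would wire it in:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}
import org.apache.hadoop.hbase.security.token.TokenUtil

// Build a Connection first, then pass it to the obtainToken(Connection)
// overload, instead of calling the obtainToken(Configuration) variant that
// HBase 2.x no longer ships.
val conf: Configuration = HBaseConfiguration.create()
val connection: Connection = ConnectionFactory.createConnection(conf)
try {
  val token = TokenUtil.obtainToken(connection)
  // ... add the token to the current user's credentials ...
} finally {
  // The sketch owns the connection, so it must close it.
  connection.close()
}
```

The design point is that obtainToken(Connection) is available on both the old and new HBase lines, so the same code path works across the upgrade.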
[jira] [Updated] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Description: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the token, and that API has been removed in HBase 2.1.

was: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because Spark uses a deprecated HBase API to obtain the token, and the same has been removed in HBase 2.1.
[jira] [Updated] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Attachment: hbase-dep-obtaintok.png
[jira] [Updated] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Description: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because Spark uses a deprecated HBase API to obtain the token, and the same has been removed in HBase 2.1.

was: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because in Spark we were using a deprecated HBase API to obtain the token, and the same has been removed in HBase 2.1.
[jira] [Created] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
Sujith created SPARK-26432: -- Summary: Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service Key: SPARK-26432 URL: https://issues.apache.org/jira/browse/SPARK-26432 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.0, 2.3.2 Reporter: Sujith

Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because in Spark we were using a deprecated HBase API to obtain the token, and the same has been removed in HBase 2.1.
[jira] [Commented] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705935#comment-16705935 ] Sujith commented on SPARK-26165: cc [~marmbrus] [~yhuai] - please let me know your suggestions.

> Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression
>
> Key: SPARK-26165
> URL: https://issues.apache.org/jira/browse/SPARK-26165
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer
> Affects Versions: 2.3.2, 2.4.0
> Reporter: Sujith
> Priority: Major
> Attachments: image-2018-11-26-13-00-36-896.png, image-2018-11-26-13-01-28-299.png, timestamp_filter_perf.PNG
>
> The Date or Timestamp column is converted to string in a less-than/greater-than filter query even when the right-side string literal is a valid date/timestamp, such as '2018-03-18 12:39:40'; the string literal could instead be cast to the column's date/timestamp type.
> scala> spark.sql("""explain extended SELECT username FROM orders WHERE order_creation_date > '2017-02-26 13:45:12'""").show(false);
> == Parsed Logical Plan ==
> 'Project ['username]
> +- 'Filter ('order_creation_date > 2017-02-26 13:45:12)
>    +- 'UnresolvedRelation `orders`
> == Analyzed Logical Plan ==
> username: string
> Project [username#59]
> +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)
>    +- SubqueryAlias orders
>       +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
> == Optimized Logical Plan ==
> Project [username#59]
> +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
> == Physical Plan ==
> *(1) Project [username#59]
> +- *(1) Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
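The direction being proposed for the plan above can be sketched with plain java.time standing in for Spark's internal timestamp handling. The column name, row value, and pattern below are illustrative only, not Spark internals:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Stand-in for the filter above: 'order_creation_date' is a typed timestamp
// column and '2017-02-26 13:45:12' a string literal.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val literal = "2017-02-26 13:45:12"

// Proposed direction: cast the string literal once to the column's type and
// keep the column typed, so the filter stays timestamp-aware.
val literalTs = LocalDateTime.parse(literal, fmt)
val rowTs = LocalDateTime.parse("2018-03-18 12:39:40", fmt)
val typedCompare = rowTs.isAfter(literalTs)

// What the analyzed plan shows instead: cast(order_creation_date as string)
// is evaluated per row, and the comparison happens on strings.
val stringCompare = fmt.format(rowTs) > literal

println(s"typed comparison: $typedCompare, per-row string cast: $stringCompare")
```

Both comparisons agree here, which is the point of the report: the per-row cast does not change the answer, it only forces string evaluation on every row and hides the timestamp type from the optimizer.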
[jira] [Commented] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700459#comment-16700459 ] Sujith commented on SPARK-26165: [~srowen] I can also raise a PR for this issue so that the reviewers get complete insight into the problem and the solution, or I can wait for Yin's and Michael's confirmation.
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698469#comment-16698469 ] Sujith edited comment on SPARK-26165 at 11/26/18 7:31 AM:

This change was made as part of PR [https://github.com/apache/spark/pull/6888], where we introduced casting to string when the left/right expression type is Timestamp; for equality comparisons we were already implicitly casting the string-typed side to Timestamp. There are also some existing test cases with similar usage where the type is cast to string. !image-2018-11-26-13-01-28-299.png! I thought to simply improve the logic as per my description above. We met this issue in a customer environment where a filter query was reported to be slow; after an initial analysis I found that we were casting the Timestamp column expression to string.
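The distinction drawn above — equality comparisons coerce the string side to Timestamp, while less-than/greater-than comparisons fall back to casting the Timestamp side to string — can be sketched schematically. This is hedged pseudologic in Python, not Spark's actual TypeCoercion code; the function name and type labels are invented for illustration:

```python
def coerce(op, left_type, right_type):
    """Return (left_cast_to, right_cast_to) for a binary comparison
    between a timestamp expression and a string expression (sketch of
    the behaviour described in the comment, not Spark's real rule)."""
    if {left_type, right_type} == {"timestamp", "string"}:
        if op == "=":
            # Equality: promote the string side to timestamp.
            return tuple("timestamp" if t == "string" else t
                         for t in (left_type, right_type))
        # <, >, <=, >=: fall back to comparing both sides as strings,
        # which forces a per-row cast of the timestamp column.
        return ("string", "string")
    return (left_type, right_type)

assert coerce("=", "timestamp", "string") == ("timestamp", "timestamp")
assert coerce(">", "timestamp", "string") == ("string", "string")
```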
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698259#comment-16698259 ] Sujith edited comment on SPARK-26165 at 11/25/18 7:32 PM:

Sure Sean, so you mean the user should cast explicitly, and there is no need for the system to handle this implicitly. Actually, I thought it would be fairly easy and low-risk to handle this scenario, since the stringToTimestamp() method has long been available in org.apache.spark.sql.catalyst.util.DateTimeUtils. Fine, so you want me to close this JIRA?
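A rough Python stand-in for the validity check that DateTimeUtils.stringToTimestamp provides (the real Spark parser accepts many more formats, time zones, and fractional seconds; this sketch is only illustrative) would be:

```python
from datetime import datetime

def string_to_timestamp(s):
    """Return a datetime if the literal is a valid timestamp/date
    string, else None — mimicking the Option-returning contract of
    Spark's stringToTimestamp, in simplified form."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass  # try the next accepted format
    return None

assert string_to_timestamp("2017-02-26 13:45:12") is not None
assert string_to_timestamp("not a timestamp") is None
```

Returning an option-like value (a result or None) rather than raising makes it cheap for an analyzer rule to probe "would this literal parse?" before deciding how to coerce.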
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 7:17 PM:

I think we should avoid casting to string in the cases where the string-typed literal in the filter condition can produce a valid date/timestamp, like the filter condition mentioned in this JIRA; otherwise we can fall back to the current logic of casting to string type. This approach avoids the unnecessary overhead of casting the timestamp/date-typed left filter column expression to string, as mentioned in the JIRA. I will raise a PR to handle this issue; please let me know of any suggestions. cc [~srowen] [~cloud_fan] [~vinodkc]
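The proposal above — cast the literal when it parses as a valid timestamp, otherwise keep the existing string-cast fallback — can be sketched as follows. This is illustrative Python; the function names and returned labels are hypothetical, not Spark APIs:

```python
from datetime import datetime

def parse_ts(s):
    # Minimal validity check standing in for stringToTimestamp.
    try:
        return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

def plan_comparison(column_type, literal):
    """Decide which side of a '<'/'>' filter gets the cast
    (sketch of the proposed rule, not Spark code)."""
    if column_type == "timestamp" and parse_ts(literal) is not None:
        # Valid literal: cast it once to timestamp; the column keeps
        # its native type, so the filter compares timestamps per row.
        return "cast(literal as timestamp)"
    # Invalid literal: fall back to the current behaviour of casting
    # the column to string on every row.
    return "cast(column as string)"

assert plan_comparison("timestamp", "2017-02-26 13:45:12") == "cast(literal as timestamp)"
assert plan_comparison("timestamp", "oops") == "cast(column as string)"
```

The fallback branch keeps behaviour unchanged for malformed literals, which is what makes the change low-risk: only queries whose literals already parse cleanly get the faster plan.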
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 7:16 PM: -- I think we shall avoid casting to string in the cases where filter condition literals of string type value can generate a valid date/timestamp, like the filter condition mentioned in jira ,otherwise we can fallback to the current logic of casting to string type. This approach can also avoid the unnecessary overhead of casting the left filter column expression timestamp/date type values to string type as mentioned in JIRA. I wll raise a PR for handle this issue.. please let me know for any suggestions. cc [~srowen] [~cloud_fan] [~vinodkc] was (Author: s71955): I think we shall avoid casting to string in the cases where filter condition literals string type value can generate a valid date or timestamp, like the filter condition mentioned in jira ,otherwise we can fallback to the current logic of cast to string type. This approach can also avoid the unnecessary overhead of casting the left filter column expression timestamp/date type values to string I wll raise a PR for handle this issue.. please let me know for any suggestions. cc [~srowen] [~cloud_fan] [~vinodkc] > Date and Timestamp column expression is getting converted to string in less > than/greater than filter query even though valid date/timestamp string > literal is used in the right side filter expression > -- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. 
Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 7:15 PM: -- I think we shall avoid casting to string in the cases where filter condition literals string type value can generate a valid date or timestamp, like the filter condition mentioned in jira ,otherwise we can fallback to the current logic of cast to string type. This approach can also avoid the unnecessary overhead of casting the left filter column expression timestamp/date type values to string I wll raise a PR for handle this issue.. please let me know for any suggestions. cc [~srowen] [~cloud_fan] [~vinodkc] was (Author: s71955): I think we shall avoid casting to string in the cases like if Date/timestamp string can be converted to a valid date or timestamp like the condition mentioned in jira ,otherwise we can fallback to the current logic of cast to string type. This approach can also avoid the unnecessary overhead of casting the left filter column expression timestamp/date type values to string I wll raise a PR for handle this issue.. please let me know for any suggestions. cc [~srowen] [~cloud_fan] [~vinodkc] > Date and Timestamp column expression is getting converted to string in less > than/greater than filter query even though valid date/timestamp string > literal is used in the right side filter expression > -- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. 
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sujith updated SPARK-26165:
---------------------------
    Summary: Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression  (was: Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression)
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the rig
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 6:35 PM: -- I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal of right expression cannot be converted to data/timestamp. cc [~srowen] [~cloud_fan] [~vinodkc] I wll raise a PR for handle this issue.. please let me know for any suggestions. was (Author: s71955): I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal of right expression cannot be converted to data/timestamp. cc [~srowen] [~cloud_fan] [~vinodkc] I wll raise a PR for handle this issue.. > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. 
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the rig
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 6:34 PM: -- I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal of left expression cannot be converted to data/timestamp. cc [~srowen] [~cloud_fan] [~vinodkc] I wll raise a PR for handle this issue.. was (Author: s71955): I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal cannot be converted to data/timestamp. cc [~srowen] [~cloud_fan] [~vinodkc] I wll raise a PR for handle this issue.. > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. 
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the rig
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 6:33 PM: -- I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal cannot be converted to data/timestamp. cc [~srowen] [~cloud_fan] [~vinodkc] I wll raise a PR for handle this issue.. was (Author: s71955): I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal cannot be converted to data/timestamp. I wll raise a PR for handle this issue.. > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. 
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right si
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith commented on SPARK-26165: I think we should avoid the cast when the date/timestamp string can be converted to a valid date or timestamp; if the string literal in the filter expression is not a valid date/timestamp, we can fall back to converting the column to string type, as per the current logic. I will raise a PR to handle this issue. > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in a less than/greater > than filter query even though a valid date/timestamp string literal, like > '2018-03-18 12:39:40', is used in the right side filter expression. Because of > the cast, the comparison is performed on strings rather than on the column's > native date/timestamp type.
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
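The coercion rule proposed in the comments above can be sketched as follows. This is a minimal Python illustration of the idea, not Spark's actual TypeCoercion code; `coerce_comparison` and its format table are hypothetical helpers. The point is to cast the string literal once, in the column's native type, and fall back to the current column-to-string cast only when the literal does not parse:

```python
from datetime import datetime

def coerce_comparison(column_type: str, literal: str):
    """If the right-side string literal parses as the column's date/timestamp
    type, cast the literal once; otherwise fall back to the current behaviour
    of casting the column to string (which blocks typed comparison)."""
    formats = {
        "timestamp": "%Y-%m-%d %H:%M:%S",
        "date": "%Y-%m-%d",
    }
    fmt = formats.get(column_type)
    if fmt is not None:
        try:
            value = datetime.strptime(literal, fmt)
            # Literal is valid: compare in the column's native type.
            return ("cast_literal", value)
        except ValueError:
            pass
    # Literal is not a valid date/timestamp: keep the existing logic.
    return ("cast_column_to_string", literal)

print(coerce_comparison("timestamp", "2017-02-26 13:45:12")[0])  # cast_literal
print(coerce_comparison("timestamp", "not-a-timestamp")[0])      # cast_column_to_string
```

Casting only the literal preserves the column's type in the predicate, which is what allows a typed (and pushdown-friendly) comparison in plans like the one quoted above.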
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Description: Date and Timestamp column is getting converted to string in less than/greater than filter query even though date strings that contains a time, like '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp. scala> spark.sql("""explain extended SELECT username FROM orders WHERE order_creation_date > '2017-02-26 13:45:12'""").show(false); +--- |== Parsed Logical Plan == 'Project ['username] +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) +- 'UnresolvedRelation `orders` == Analyzed Logical Plan == username: string Project [username#59] +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) +- SubqueryAlias orders +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61] == Optimized Logical Plan == Project [username#59] +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61] == Physical Plan == *(1) Project [username#59] +- *(1) Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation + - was: Date and Timestamp column is getting converted to string in less than/greater than filter query even though date strings that contains a time, like '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp. 
> Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, >
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Attachment: timestamp_filter_perf.PNG > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > >
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Attachment: (was: timestamp_filter_perf.PNG) > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as 
string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Attachment: timestamp_filter_perf.PNG > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) 
> 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Attachment: (was: testreport.PNG) > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp.
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Description: Date and Timestamp column is getting converted to string in less than/greater than filter query even though date strings that contains a time, like '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp. was:Date and Timestamp column is getting converted to string in less than/greater than filter query even though date strings that contains a time, like '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp. > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > >
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Attachment: testreport.PNG > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: testreport.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp.
[jira] [Created] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
Sujith created SPARK-26165: -- Summary: Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression Key: SPARK-26165 URL: https://issues.apache.org/jira/browse/SPARK-26165 Project: Spark Issue Type: Improvement Components: Optimizer Affects Versions: 2.4.0, 2.3.2 Reporter: Sujith Date and Timestamp column is getting converted to string in less than/greater than filter query even though date strings that contains a time, like '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp.
[jira] [Commented] (SPARK-25332) Instead of broadcast hash join ,Sort merge join has selected when restart spark-shell/spark-JDBC for hive provider
[ https://issues.apache.org/jira/browse/SPARK-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652914#comment-16652914 ] Sujith commented on SPARK-25332: [~Bjangir] i think you are right, there is a bug while inserting data into table when we use stored by clause in create command, I am working on it. soon i will be raising a PR . [~maropu] *[srowen|https://github.com/srowen] [cloud-fan|https://github.com/cloud-fan]* i will raise a PR to handle this and keep you guys in loop. thanks > Instead of broadcast hash join ,Sort merge join has selected when restart > spark-shell/spark-JDBC for hive provider > --- > > Key: SPARK-25332 > URL: https://issues.apache.org/jira/browse/SPARK-25332 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Babulal >Priority: Major > > spark.sql("create table x1(name string,age int) stored as parquet ") > spark.sql("insert into x1 select 'a',29") > spark.sql("create table x2 (name string,age int) stored as parquet '") > spark.sql("insert into x2_ex select 'a',29") > scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain > == Physical Plan == > *{color:#14892c}(2) BroadcastHashJoin{color} [name#101], [name#103], Inner, > BuildRight > :- *(2) Project [name#101, age#102] > : +- *(2) Filter isnotnull(name#101) > : +- *(2) FileScan parquet default.x1_ex[name#101,age#102] Batched: true, > Format: Parquet, Location: > InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1, > PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: > struct > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true])) > +- *(1) Project [name#103, age#104] > +- *(1) Filter isnotnull(name#103) > +- *(1) FileScan parquet default.x2_ex[name#103,age#104] Batched: true, > Format: Parquet, Location: > InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2, > PartitionFilters: [], PushedFilters: [IsNotNull(name)], 
ReadSchema: > struct > > > Now Restart Spark-Shell or do spark-submit orrestart JDBCServer again and > run same select query again > > scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain > scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain > == Physical Plan == > *{color:#FF}(5) SortMergeJoin [{color}name#43], [name#45], Inner > :- *(2) Sort [name#43 ASC NULLS FIRST], false, 0 > : +- Exchange hashpartitioning(name#43, 200) > : +- *(1) Project [name#43, age#44] > : +- *(1) Filter isnotnull(name#43) > : +- *(1) FileScan parquet default.x1[name#43,age#44] Batched: true, Format: > Parquet, Location: > InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1], > PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: > struct > +- *(4) Sort [name#45 ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(name#45, 200) > +- *(3) Project [name#45, age#46] > +- *(3) Filter isnotnull(name#45) > +- *(3) FileScan parquet default.x2[name#45,age#46] Batched: true, Format: > Parquet, Location: > InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2], > PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: > struct > > > scala> spark.sql("desc formatted x1").show(200,false) > ++--+---+ > |col_name |data_type |comment| > ++--+---+ > |name |string |null | > |age |int |null | > | | | | > |# Detailed Table Information| | | > |Database |default | | > |Table |x1 | | > |Owner |Administrator | | > |Created Time |Sun Aug 19 12:36:58 IST 2018 | | > |Last Access |Thu Jan 01 05:30:00 IST 1970 | | > |Created By |Spark 2.3.0 | | > |Type |MANAGED | | > |Provider |hive | | > |Table Properties |[transient_lastDdlTime=1534662418] | | > |Location |file:/D:/spark_release/spark/bin/spark-warehouse/x1 | | > |Serde Library |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | > | > |InputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | > | > |OutputFormat > 
|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| | > |Storage Properties |[serialization.format=1] | | > |Partition Provider |Catalog | | > ++--+---+ > > With a datasource table it works fine (create table using parquet instead of > stored by)
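The broadcast-to-sort-merge flip described above follows from Spark's size-based join selection: when a Hive-serde table has no computed statistics, its estimated size falls back to a large default after a restart, so the table stops qualifying for broadcast. A minimal sketch of that decision, assuming a simplified planner that looks only at estimated sizes (the real planner also considers hints, join type, and keys):

```python
DEFAULT_SIZE_IN_BYTES = 2**62  # stand-in for the "unknown stats" fallback

def choose_join_strategy(left_size, right_size,
                         broadcast_threshold=10 * 1024 * 1024):
    """Pick a join strategy from estimated relation sizes.

    Sizes default to a huge value when statistics are missing, which is why
    a table that was broadcast in the session that wrote it can flip to a
    sort-merge join after a restart.
    """
    if min(left_size, right_size) <= broadcast_threshold:
        return "BroadcastHashJoin"
    return "SortMergeJoin"

# Fresh session with in-memory stats: tiny sizes, broadcast is chosen.
print(choose_join_strategy(800, 800))                              # BroadcastHashJoin
# After restart, no persisted stats: sizes fall back to the default.
print(choose_join_strategy(DEFAULT_SIZE_IN_BYTES, DEFAULT_SIZE_IN_BYTES))  # SortMergeJoin
```

Running `ANALYZE TABLE ... COMPUTE STATISTICS` (or using a datasource table, as the report notes) persists the size estimate and restores the broadcast choice.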
[jira] [Commented] (SPARK-25071) BuildSide is coming not as expected with join queries
[ https://issues.apache.org/jira/browse/SPARK-25071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651113#comment-16651113 ] Sujith commented on SPARK-25071: cc [~ZenWzh] Please suggest as i want to take up this JIRA > BuildSide is coming not as expected with join queries > - > > Key: SPARK-25071 > URL: https://issues.apache.org/jira/browse/SPARK-25071 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 > Environment: Spark 2.3.1 > Hadoop 2.7.3 >Reporter: Ayush Anubhava >Priority: Major > > *BuildSide is not coming as expected.* > Pre-requisites: > *CBO is set as true & spark.sql.cbo.joinReorder.enabled= true.* > *import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec* > *Steps:* > *Scenario 1:* > spark.sql("CREATE TABLE small3 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='600','totalSize'='800')") > spark.sql("CREATE TABLE big3 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='6000', 'totalSize'='800')") > val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > println(buildSide) > > *Result 1:* > scala> val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#0L|#0L], [c1#1L|#1L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#0L) > : +- HiveTableScan [c1#0L|#0L], HiveTableRelation `default`.`small3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#0L|#0L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#1L) > +- HiveTableScan [c1#1L|#1L], HiveTableRelation `default`.`big3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1L|#1L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide 
> buildSide: org.apache.spark.sql.execution.joins.BuildSide = BuildRight > scala> println(buildSide) > *BuildRight* > > *Scenario 2:* > spark.sql("CREATE TABLE small4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='600','totalSize'='80')") > spark.sql("CREATE TABLE big4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='6000', 'totalSize'='800')") > val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > println(buildSide) > *Result 2:* > scala> val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#4L|#4L], [c1#5L|#5L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#4L) > : +- HiveTableScan [c1#4L|#4L], HiveTableRelation `default`.`small4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#4L|#4L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#5L) > +- HiveTableScan [c1#5L|#5L], HiveTableRelation `default`.`big4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#5L|#5L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = *BuildRight* > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
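The expectation behind this report is that the planner builds the hash table from the smaller side. A toy sketch of that expected choice, using a hypothetical `choose_build_side` helper (the real planner also weighs broadcastability and join type); the report above shows `BuildRight` even when the right ("big") table has the larger raw data size:

```python
def choose_build_side(left_size: int, right_size: int) -> str:
    """Expected behaviour: build the hash table from the smaller relation,
    since the build side is materialized in memory."""
    return "BuildLeft" if left_size < right_size else "BuildRight"

# Scenario 2 from the report: small4 (totalSize=80) joined with big4 (totalSize=800).
print(choose_build_side(80, 800))  # BuildLeft expected; the report observes BuildRight
```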
[jira] [Commented] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651097#comment-16651097 ] Sujith commented on SPARK-22601: cc *[gatorsmile|https://github.com/gatorsmile] please assign this JIRA to me, as the PR has already been merged. Thanks* > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > Fix For: 2.2.1 > > > Data load is reported as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed: > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a non-existing local file path is given, the error message > "LOAD DATA input path does not exist" is displayed. Attached are snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
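The fix direction discussed here amounts to validating the input path up front for HDFS, just as is already done for local paths. A hedged sketch with a hypothetical `validate_load_path` helper (not Spark's actual code); the existence check is injectable so a distributed file system can supply its own:

```python
import os

def validate_load_path(path: str, exists=os.path.exists) -> str:
    """Fail fast when a LOAD DATA input path does not exist.

    `exists` defaults to a local-filesystem check; an HDFS client would pass
    its own predicate so both path kinds get the same error behaviour.
    """
    if not exists(path):
        raise FileNotFoundError("LOAD DATA input path does not exist: " + path)
    return path

# Simulated HDFS check that reports the path as missing.
try:
    validate_load_path("hdfs://hacluster/data1.csv", exists=lambda p: False)
except FileNotFoundError as e:
    print(e)  # LOAD DATA input path does not exist: hdfs://hacluster/data1.csv
```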
[jira] [Resolved] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith resolved SPARK-22601. Resolution: Fixed Fix Version/s: 2.2.1 PR already merged. > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > Fix For: 2.2.1 > > > Data load is reported as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed: > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a non-existing local file path is given, the error message > "LOAD DATA input path does not exist" is displayed. Attached are snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Comment Edited] (SPARK-25521) Job id showing null when Job is finished.
[ https://issues.apache.org/jira/browse/SPARK-25521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626849#comment-16626849 ] Sujith edited comment on SPARK-25521 at 9/25/18 6:39 AM: - [~Bjangir] I could see the jobcontext doesn't have jobID when the flow hits FileFormatWriter.scala in the insert flow. Moreover this issue is happening in insert flow. I will check into this issue more and raise a PR for handling the same. Thanks for reporting. was (Author: s71955): [~Bjangir] I could see the jobcontext doesn't have jobID when the flow hits FileFormatWriter.scala in the insert flow. I will check into this issue more and raise a PR for handling the same. Thanks for reporting. > Job id showing null when Job is finished. > - > > Key: SPARK-25521 > URL: https://issues.apache.org/jira/browse/SPARK-25521 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.1 >Reporter: Babulal >Priority: Minor > Attachments: image-2018-09-25-12-01-31-871.png > > > scala> spark.sql("create table x1(name string,age int) stored as parquet") > scala> spark.sql("insert into x1 select 'a',29") > check logs > 2018-08-19 12:45:36 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 > (TID 0) in 874 ms on localhost (executor > driver) (1/1) > 2018-08-19 12:45:36 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose > tasks have all completed, from pool > 2018-08-19 12:45:36 INFO DAGScheduler:54 - ResultStage 0 (sql at > :24) finished in 1.131 s > 2018-08-19 12:45:36 INFO DAGScheduler:54 - Job 0 finished: sql at > :24, took 1.233329 s > 2018-08-19 12:45:36 INFO FileFormatWriter:54 - Job > {color:#d04437}null{color} committed. > 2018-08-19 12:45:36 INFO FileFormatWriter:54 - Finished processing stats for > job null. > res4: org.apache.spark.sql.DataFrame = [] > > !image-2018-09-25-12-01-31-871.png! 
> >
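The "Job {color:#d04437}null{color} committed." log above suggests the committer is handed a context whose job ID was never populated in the insert path. A minimal, self-contained sketch of the fallback pattern a fix might use, so the log never prints the literal "null" (illustrative Python, not Spark's actual Scala code; the function names here are hypothetical):

```python
import uuid

def effective_job_id(job_id):
    """Return the context's job id when present; otherwise derive a
    placeholder so log lines never print the literal 'null'."""
    if job_id:
        return job_id
    return "job-" + uuid.uuid4().hex[:8]

def commit_message(job_id):
    # Mirrors the shape of the FileFormatWriter log line quoted above.
    return "Job %s committed." % effective_job_id(job_id)
```

With this pattern, `commit_message(None)` yields something like "Job job-3f2a9c1d committed." instead of "Job null committed.", while a real job ID passes through unchanged.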
[jira] [Updated] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly
[ https://issues.apache.org/jira/browse/SPARK-23425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-23425: --- Docs Text: Release notes: Wildcard symbols {{*}} and {{?}} can now be used in SQL paths when loading data, e.g.: LOAD DATA INPATH 'hdfs://hacluster/user/ext*' LOAD DATA INPATH 'hdfs://hacluster/user/???/data' Where these characters are used literally in paths, they must be escaped with a backslash. Wildcards can now also be used at the folder level of a local file system in the LOAD command, e.g. LOAD DATA LOCAL INPATH 'tmp/folder*/'. Normal spaces can now be used in folder/file names (e.g. file Name.csv); in older versions a space in a folder/file name had to be represented as '%20' (e.g. myFile%20Name). was: Release notes: Wildcard symbols {{*}} and {{?}} can now be used in SQL paths when loading data, e.g.: LOAD DATA INPATH 'hdfs://hacluster/user/ext*' LOAD DATA INPATH 'hdfs://hacluster/user/???/data' Where these characters are used literally in paths, they must be escaped with a backslash. Wildcards can be used in the folder level of a local File system in Load command from now. e.g. LOAD DATA LOCAL INPATH 'tmp/folder*/ > load data for hdfs file path with wild card usage is not working properly > - > > Key: SPARK-23425 > URL: https://issues.apache.org/jira/browse/SPARK-23425 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.0 >Reporter: Sujith >Assignee: Sujith >Priority: Major > Labels: release-notes > Fix For: 2.4.0 > > Attachments: wildcard_issue.PNG > > > The load data command for loading data from non-local file paths using wild > card strings like * is not working > eg: > "load data inpath 'hdfs://hacluster/user/ext*' into table t1" > Getting an Analysis exception while executing this query > !image-2018-02-14-23-41-39-923.png! 
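The release note above describes the wildcard semantics: `*` matches any run of characters, `?` matches exactly one, and a backslash escapes either character when it appears literally in a path. The matching itself is done by Hadoop's glob support in Spark; the sketch below is a standalone illustration of those same rules, not Spark's or Hadoop's actual implementation:

```python
import re

def glob_to_regex(pattern):
    """Translate a LOAD DATA-style wildcard pattern into a regex:
    '*' matches any run of characters, '?' exactly one, and a
    backslash escapes a literal '*' or '?'."""
    out, i = ["^"], 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            # Escaped character: match it literally and skip the backslash.
            out.append(re.escape(pattern[i + 1]))
            i += 1
        elif c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        else:
            out.append(re.escape(c))
        i += 1
    out.append("$")
    return "".join(out)

def glob_match(pattern, path):
    return re.match(glob_to_regex(pattern), path) is not None
```

For example, `glob_match("/user/ext*", "/user/ext123")` holds, `/user/???/data` matches exactly three characters in that segment, and `\*` matches only a literal `*`.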
[jira] [Commented] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page
[ https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610541#comment-16610541 ] Sujith commented on SPARK-25392: Not sure whether it makes sense to show these details in the History Server, but of course it should not throw an error. [~srowen] [~LI,Xiao] [~hyukjin.kwon] please let us know if you have any suggestions. Thanks all > [Spark Job History]Inconsistent behaviour for pool details in spark web UI > and history server page > --- > > Key: SPARK-25392 > URL: https://issues.apache.org/jira/browse/SPARK-25392 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: OS: SUSE 11 > Spark Version: 2.3 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > Steps: > 1. Enable spark.scheduler.mode = FAIR > 2. Submitted beeline jobs > create database JH; > use JH; > create table one12( id int ); > insert into one12 values(12); > insert into one12 values(13); > Select * from one12; > 3. Click on JDBC Incompleted Application ID in Job History Page > 4. Go to Job Tab in staged Web UI page > 5. Click on run at AccessController.java:0 under Description column > 6. Click default under Pool Name column of Completed Stages table > URL:http://blr123109:23020/history/application_1536399199015_0006/stages/pool/?poolname=default > 7. It throws the below error > HTTP ERROR 400 > Problem accessing /history/application_1536399199015_0006/stages/pool/. > Reason: > Unknown pool: default > Powered by Jetty:// x.y.z > But under the > Yarn resource page it displays the summary under Fair Scheduler Pool: default > URL:https://blr123110:64323/proxy/application_1536399199015_0006/stages/pool?poolname=default > Summary > Pool Name Minimum Share Pool Weight Active Stages Running Tasks > SchedulingMode > default 0 1 0 0 FIFO -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page
[ https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610535#comment-16610535 ] Sujith commented on SPARK-25392: cc [~srowen] [~LI,Xiao] > [Spark Job History]Inconsistent behaviour for pool details in spark web UI > and history server page > --- > > Key: SPARK-25392 > URL: https://issues.apache.org/jira/browse/SPARK-25392 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: OS: SUSE 11 > Spark Version: 2.3 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > Steps: > 1.Enable spark.scheduler.mode = FAIR > 2.Submitted beeline jobs > create database JH; > use JH; > create table one12( id int ); > insert into one12 values(12); > insert into one12 values(13); > Select * from one12; > 3.Click on JDBC Incompleted Application ID in Job History Page > 4. Go to Job Tab in staged Web UI page > 5. Click on run at AccessController.java:0 under Desription column > 6 . Click default under Pool Name column of Completed Stages table > URL:http://blr123109:23020/history/application_1536399199015_0006/stages/pool/?poolname=default > 7. It throws below error > HTTP ERROR 400 > Problem accessing /history/application_1536399199015_0006/stages/pool/. > Reason: > Unknown pool: default > Powered by Jetty:// x.y.z > But under > Yarn resource page it display the summary under Fair Scheduler Pool: default > URL:https://blr123110:64323/proxy/application_1536399199015_0006/stages/pool?poolname=default > Summary > Pool Name Minimum Share Pool Weight Active Stages Running Tasks > SchedulingMode > default 0 1 0 0 FIFO -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
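The suggestion in the comments is that the history server page "shall not throw an error" for a pool it does not track. A minimal sketch of that idea, looking the pool up and falling back to an explanatory message instead of failing the request with HTTP 400 (illustrative Python with a hypothetical `render_pool` helper, not Spark's UI code):

```python
def render_pool(pools, name):
    """Look the pool up; if the history server does not track it,
    return an explanatory message rather than raising an error."""
    info = pools.get(name)
    if info is None:
        return ("Pool %s is not available in the history server "
                "for completed applications." % name)
    # Mirrors the columns shown in the YARN Fair Scheduler Pool summary.
    return "Pool %s: min_share=%d, weight=%d, active_stages=%d" % (
        name, info["min_share"], info["weight"], info["active_stages"])
```

With a live scheduler the lookup succeeds; on a replayed (completed) application the fallback message is shown, matching the behaviour the comment asks for.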
[jira] [Commented] (SPARK-25271) Creating parquet table with all the column null throws exception
[ https://issues.apache.org/jira/browse/SPARK-25271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603126#comment-16603126 ] Sujith commented on SPARK-25271: [~cloud_fan] [~sowen] Will this cause a compatibility problem compared to older versions? If a user has a null record, they now get an exception with the current version, whereas the older version of Spark (2.2.1) does not throw any exception. I think the output writers were updated in the below PR [https://github.com/apache/spark/pull/20521] > Creating parquet table with all the column null throws exception > > > Key: SPARK-25271 > URL: https://issues.apache.org/jira/browse/SPARK-25271 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: shivusondur >Priority: Major > > {code:java} > 1)cat /data/parquet.dat > 1$abc2$pqr:3$xyz > null{code} > > {code:java} > 2)spark.sql("create table vp_reader_temp (projects map) ROW > FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':' > MAP KEYS TERMINATED BY '$'") > {code} > {code:java} > 3)spark.sql(" > LOAD DATA LOCAL INPATH '/data/parquet.dat' INTO TABLE vp_reader_temp") > {code} > {code:java} > 4)spark.sql("create table vp_reader STORED AS PARQUET as select * from > vp_reader_temp") > {code} > *Result :* Throwing exception (Working fine with spark 2.2.1) > {code:java} > java.lang.RuntimeException: Parquet record is malformed: empty fields are > illegal, the field should be ommited completely instead > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) > at > org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123) > at > 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:180) > at > org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:46) > at > org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:112) > at > org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:125) > at > org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:406) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:283) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:281) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1438) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:286) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:211) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:210) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:349) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.parquet.io.ParquetEncodingException: empty fields are > 
illegal, the field should be ommited completely instead > at > org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:320) > at > org.apache.parquet.io.RecordConsumerLoggingWrapper.endField(RecordConsumerLoggingWrapper.java:165) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeMap(DataWritableWriter.java:241) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:116) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89) > at >
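The stack trace shows the Parquet writer rejecting an *empty* group ("empty fields are illegal, the field should be ommited completely instead"): for a null row, the map field arrives as empty rather than absent. One workaround direction is to normalize empty collections to null before they reach the writer, so the field is omitted. A tiny sketch of that normalization (illustrative Python; Spark's fix would live in the Scala/Java write path):

```python
def sanitize_map(value):
    """Parquet rejects empty groups but accepts absent (null) ones:
    normalize an empty map to None so the writer omits the field
    instead of emitting an empty group."""
    if value:  # truthy only when non-None and non-empty
        return value
    return None
```

Applied to each map-typed column of a row before writing, `{}` becomes `None` (field omitted) while populated maps pass through untouched.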
[jira] [Commented] (SPARK-25271) Creating parquet table with all the column null throws exception
[ https://issues.apache.org/jira/browse/SPARK-25271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603128#comment-16603128 ] Sujith commented on SPARK-25271: cc [~hyukjin.kwon] > Creating parquet table with all the column null throws exception > > > Key: SPARK-25271 > URL: https://issues.apache.org/jira/browse/SPARK-25271 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: shivusondur >Priority: Major > > {code:java} > 1)cat /data/parquet.dat > 1$abc2$pqr:3$xyz > null{code} > > {code:java} > 2)spark.sql("create table vp_reader_temp (projects map) ROW > FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':' > MAP KEYS TERMINATED BY '$'") > {code} > {code:java} > 3)spark.sql(" > LOAD DATA LOCAL INPATH '/data/parquet.dat' INTO TABLE vp_reader_temp") > {code} > {code:java} > 4)spark.sql("create table vp_reader STORED AS PARQUET as select * from > vp_reader_temp") > {code} > *Result :* Throwing exception (Working fine with spark 2.2.1) > {code:java} > java.lang.RuntimeException: Parquet record is malformed: empty fields are > illegal, the field should be ommited completely instead > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) > at > org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123) > at > org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:180) > at > org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:46) > at > org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:112) > at > 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:125) > at > org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:406) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:283) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:281) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1438) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:286) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:211) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:210) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:349) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.parquet.io.ParquetEncodingException: empty fields are > illegal, the field should be ommited completely instead > at > org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:320) > at > org.apache.parquet.io.RecordConsumerLoggingWrapper.endField(RecordConsumerLoggingWrapper.java:165) > at > 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeMap(DataWritableWriter.java:241) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:116) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:60) > ... 21 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail:
[jira] [Comment Edited] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an erro
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith edited comment on SPARK-25073 at 8/23/18 3:48 PM: - Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately." ) } i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well,even though this issue sounds to be more kind of negative scenario Please correct me if i am missing something. was (Author: s71955): Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately." 
) } i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well, > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL: https://issues.apache.org/jira/browse/SPARK-25073 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0, 2.3.1 >Reporter: vivek kumar >Priority: Minor > > When the yarn.nodemanager.resource.memory-mb and/or > yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports > an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting > the error request to be more around yarn.scheduler.maximum-allocation-mb' > and/or 'yarn.nodemanager.resource.memory-mb'. > > Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! 
*Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error
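The comment's proposal is to make the AM-memory error mirror the executor-memory check quoted above, naming both YARN limits instead of only 'yarn.scheduler.maximum-allocation-mb'. A runnable sketch of that validation and message (illustrative Python; the real check is in `org.apache.spark.deploy.yarn.Client`):

```python
def validate_am_memory(am_mem_mb, am_overhead_mb, max_mem_mb):
    """Reject an AM allocation above the cluster threshold, naming BOTH
    YARN limits in the error, as the executor-memory check already does."""
    if am_mem_mb + am_overhead_mb > max_mem_mb:
        raise ValueError(
            "Required AM memory (%d+%d MB) is above the max threshold (%d MB) "
            "of this cluster! Please check the values of "
            "'yarn.scheduler.maximum-allocation-mb' and/or "
            "'yarn.nodemanager.resource.memory-mb' and increase the memory "
            "appropriately." % (am_mem_mb, am_overhead_mb, max_mem_mb))
```

For scenario 2b above (`spark.yarn.am.memory=17g` against an effective 8096 MB threshold), this raises with both configuration names, so the user is pointed at `yarn.nodemanager.resource.memory-mb` as well.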
[jira] [Comment Edited] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an erro
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith edited comment on SPARK-25073 at 8/23/18 3:43 PM: - Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately." ) } i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well, was (Author: s71955): Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately." 
) } i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well, > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL: https://issues.apache.org/jira/browse/SPARK-25073 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0, 2.3.1 >Reporter: vivek kumar >Priority: Minor > > When the yarn.nodemanager.resource.memory-mb and/or > yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports > an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting > the error request to be more around yarn.scheduler.maximum-allocation-mb' > and/or 'yarn.nodemanager.resource.memory-mb'. > > Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! 
*Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or >
[jira] [Comment Edited] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an erro
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith edited comment on SPARK-25073 at 8/23/18 3:42 PM: - Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately." ) } i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well, was (Author: s71955): Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned ``` if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! 
" + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately.") } ``` i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well, > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL: https://issues.apache.org/jira/browse/SPARK-25073 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0, 2.3.1 >Reporter: vivek kumar >Priority: Minor > > When the yarn.nodemanager.resource.memory-mb and/or > yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports > an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting > the error request to be more around yarn.scheduler.maximum-allocation-mb' > and/or 'yarn.nodemanager.resource.memory-mb'. > > Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. 
Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or >
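The change suggested in the comment above could be sketched as follows. This is an illustration only, not the actual Spark patch: the method name `verifyAmMemory` and its parameters are hypothetical, and the message simply mirrors the executor-side check quoted in the comment.

```scala
// Hypothetical sketch of an AM-side check that, like the executor-side check,
// points the user at both YARN limits instead of only the scheduler maximum.
def verifyAmMemory(amMemory: Long, amMemoryOverhead: Long, maxMem: Long): Unit = {
  val amMem = amMemory + amMemoryOverhead
  if (amMem > maxMem) {
    throw new IllegalArgumentException(
      s"Required AM memory ($amMemory+$amMemoryOverhead MB) is above the " +
      s"max threshold ($maxMem MB) of this cluster! Please check the values of " +
      "'yarn.scheduler.maximum-allocation-mb' and/or " +
      "'yarn.nodemanager.resource.memory-mb' and increase the memory appropriately.")
  }
}
```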
[jira] [Commented] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error req
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith commented on SPARK-25073:

Yes, in the executor memory validation check we display a proper message that mentions both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb, in the org.apache.spark.deploy.yarn.Client class as below, whereas the AM container memory validation mentions only yarn.scheduler.maximum-allocation-mb:

```
if (executorMem > maxMem) {
  throw new IllegalArgumentException(s"Required executor memory ($executorMemory" +
    s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " +
    "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " +
    "'yarn.nodemanager.resource.memory-mb' and increase the memory appropriately.")
}
```

I think we can mention both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in the AM memory validation as well.

> Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb
> and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always
> reports an error request to adjust yarn.scheduler.maximum-allocation-mb
> --
>
> Key: SPARK-25073
> URL: https://issues.apache.org/jira/browse/SPARK-25073
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.3.0, 2.3.1
> Reporter: vivek kumar
> Priority: Minor
>
> When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports an error request to adjust yarn.scheduler.maximum-allocation-mb. Expecting the error request to be more around 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
>
> Scenario 1.
yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or > 'yarn.nodemanager.resource.memory-mb'. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an erro
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590089#comment-16590089 ] Sujith edited comment on SPARK-25073 at 8/23/18 11:28 AM: -- cc [~hyukjin.kwon] [gatorsmile|https://github.com/gatorsmile] [~srowen] was (Author: s71955): @ [~hyukjin.kwon] [@gatorsmile|https://github.com/gatorsmile] > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL: https://issues.apache.org/jira/browse/SPARK-25073 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0, 2.3.1 >Reporter: vivek kumar >Priority: Minor > > When the yarn.nodemanager.resource.memory-mb and/or > yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports > an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting > the error request to be more around yarn.scheduler.maximum-allocation-mb' > and/or 'yarn.nodemanager.resource.memory-mb'. > > Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. 
Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or > 'yarn.nodemanager.resource.memory-mb'. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error req
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590089#comment-16590089 ] Sujith commented on SPARK-25073: @ [~hyukjin.kwon] [@gatorsmile|https://github.com/gatorsmile] > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL: https://issues.apache.org/jira/browse/SPARK-25073 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0, 2.3.1 >Reporter: vivek kumar >Priority: Minor > > When the yarn.nodemanager.resource.memory-mb and/or > yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports > an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting > the error request to be more around yarn.scheduler.maximum-allocation-mb' > and/or 'yarn.nodemanager.resource.memory-mb'. > > Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. 
Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or > 'yarn.nodemanager.resource.memory-mb'. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25071) BuildSide is coming not as expected with join queries
[ https://issues.apache.org/jira/browse/SPARK-25071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582419#comment-16582419 ] Sujith edited comment on SPARK-25071 at 8/16/18 12:11 PM:

[~hyukjin.kwon] [~ZenWzh] [~srowen] I have a doubt based on the above conversation: if we consider purely the row count and not rawDataSize/totalSize, the row size can be very large for a particular table even though its number of rows is the same or only slightly higher, and we could end up broadcasting the table with the larger data size?

was (Author: s71955): [~hyukjin.kwon] [~ZenWzh]a [~srowen] I have a doubt based on above conversation, if we consider purely row count not rawDatasize/totalsize then there is a possibility where row size can be very large for a particular table even though the number of rows will be same or slightly high, and we are ending up broadcasting table with larger data size?

> BuildSide is coming not as expected with join queries
> --
>
> Key: SPARK-25071
> URL: https://issues.apache.org/jira/browse/SPARK-25071
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Environment: Spark 2.3.1
> Hadoop 2.7.3
> Reporter: Ayush Anubhava
> Priority: Major
>
> *BuildSide is not coming as expected.*
> Pre-requisites:
> *CBO is set as true & spark.sql.cbo.joinReorder.enabled= true.*
> *import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec*
> *Steps:*
> *Scenario 1:*
> spark.sql("CREATE TABLE small3 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='600','totalSize'='800')")
> spark.sql("CREATE TABLE big3 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='6000', 'totalSize'='800')")
> val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan
> val buildSide = plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide
> println(buildSide)
>
> *Result 1:*
> scala> val plan = spark.sql("select * from small3 t1 join big3 t2 on
(t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#0L|#0L], [c1#1L|#1L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#0L) > : +- HiveTableScan [c1#0L|#0L], HiveTableRelation `default`.`small3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#0L|#0L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#1L) > +- HiveTableScan [c1#1L|#1L], HiveTableRelation `default`.`big3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1L|#1L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = BuildRight > scala> println(buildSide) > *BuildRight* > > *Scenario 2:* > spark.sql("CREATE TABLE small4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='600','totalSize'='80')") > spark.sql("CREATE TABLE big4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='6000', 'totalSize'='800')") > val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > println(buildSide) > *Result 2:* > scala> val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#4L|#4L], [c1#5L|#5L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#4L) > : +- HiveTableScan [c1#4L|#4L], HiveTableRelation `default`.`small4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#4L|#4L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#5L) > +- HiveTableScan [c1#5L|#5L], HiveTableRelation `default`.`big4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#5L|#5L] > scala> val buildSide = > 
plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = *BuildRight* > >
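To make the doubt raised in the comment above concrete: in Scenario 1 both tables report numRows=2, but rawDataSize is 600 vs 6000. A comparison keyed only on row count cannot separate the two, while one keyed on size can. A small illustrative sketch in plain Scala — the `Stats` case class and min-by rule are stand-ins, not Spark's actual CBO statistics code:

```scala
// Illustrative stand-in for per-table statistics (values from Scenario 1 above).
case class Stats(numRows: Long, rawDataSize: Long)

val stats = Map(
  "small3" -> Stats(numRows = 2, rawDataSize = 600),
  "big3"   -> Stats(numRows = 2, rawDataSize = 6000)
)

// Keyed on row count the tables tie, so either side might be picked for broadcast:
val byRows = stats.minBy(_._2.numRows)._1
// Keyed on data size, small3 is clearly the cheaper relation to broadcast:
val bySize = stats.minBy(_._2.rawDataSize)._1
```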
[jira] [Commented] (SPARK-25071) BuildSide is coming not as expected with join queries
[ https://issues.apache.org/jira/browse/SPARK-25071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582419#comment-16582419 ] Sujith commented on SPARK-25071:

[~hyukjin.kwon] [~ZenWzh] [~srowen] I have a doubt based on the above conversation: if we consider purely the row count and not rawDataSize/totalSize, the row size can be very large for a particular table even though its number of rows is the same or only slightly higher, and we could end up broadcasting the table with the larger data size?

> BuildSide is coming not as expected with join queries
> --
>
> Key: SPARK-25071
> URL: https://issues.apache.org/jira/browse/SPARK-25071
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Environment: Spark 2.3.1
> Hadoop 2.7.3
> Reporter: Ayush Anubhava
> Priority: Major
>
> *BuildSide is not coming as expected.*
> Pre-requisites:
> *CBO is set as true & spark.sql.cbo.joinReorder.enabled= true.*
> *import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec*
> *Steps:*
> *Scenario 1:*
> spark.sql("CREATE TABLE small3 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='600','totalSize'='800')")
> spark.sql("CREATE TABLE big3 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='6000', 'totalSize'='800')")
> val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan
> val buildSide = plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide
> println(buildSide)
>
> *Result 1:*
> scala> val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan
> plan: org.apache.spark.sql.execution.SparkPlan =
> *(2) BroadcastHashJoin [c1#0L|#0L], [c1#1L|#1L], Inner, BuildRight
> :- *(2) Filter isnotnull(c1#0L)
> : +- HiveTableScan [c1#0L|#0L], HiveTableRelation `default`.`small3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#0L|#0L]
> +- BroadcastExchange
HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#1L) > +- HiveTableScan [c1#1L|#1L], HiveTableRelation `default`.`big3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1L|#1L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = BuildRight > scala> println(buildSide) > *BuildRight* > > *Scenario 2:* > spark.sql("CREATE TABLE small4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='600','totalSize'='80')") > spark.sql("CREATE TABLE big4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='6000', 'totalSize'='800')") > val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > println(buildSide) > *Result 2:* > scala> val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#4L|#4L], [c1#5L|#5L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#4L) > : +- HiveTableScan [c1#4L|#4L], HiveTableRelation `default`.`small4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#4L|#4L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#5L) > +- HiveTableScan [c1#5L|#5L], HiveTableRelation `default`.`big4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#5L|#5L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = *BuildRight* > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error req
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574864#comment-16574864 ] Sujith commented on SPARK-25073:

It seems you are right; the message is a bit misleading to the user. As per my understanding there is also a dependency on the yarn.nodemanager.resource.memory-mb parameter.

*_yarn.nodemanager.resource.memory-mb:_* Amount of physical memory, in MB, that can be allocated for containers. It is the amount of memory YARN can utilize on this node, and therefore this property should be lower than the total memory of the machine.

*_yarn.scheduler.maximum-allocation-mb:_* Defines the maximum memory allocation available for a container, in MB. The RM can only allocate memory to containers in increments of {{"yarn.scheduler.minimum-allocation-mb"}}, not exceeding {{"yarn.scheduler.maximum-allocation-mb"}}, and it should not be more than the total allocated memory of the node.

I will try to analyze this further and will raise a PR if it requires a fix. Thanks.

> Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb
> and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always
> reports an error request to adjust yarn.scheduler.maximum-allocation-mb
> --
>
> Key: SPARK-25073
> URL: https://issues.apache.org/jira/browse/SPARK-25073
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.3.0, 2.3.1
> Reporter: vivek kumar
> Priority: Minor
>
> When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports an error request to adjust yarn.scheduler.maximum-allocation-mb. Expecting the error request to be more around 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
>
> Scenario 1.
yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or > 'yarn.nodemanager.resource.memory-mb'. 
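The dependency described in the comment above suggests the effective per-container ceiling is bounded by both settings. A hedged sketch — the min-of-both rule is a reading of that comment, not verified YARN scheduler semantics, and the function name is illustrative:

```scala
// Sketch: the tightest per-container memory ceiling implied by the two settings.
// Assumption (from the comment above): a container can never be granted more than
// the NodeManager's total, even if yarn.scheduler.maximum-allocation-mb is larger.
def effectiveMaxContainerMb(schedulerMaxMb: Long, nmResourceMb: Long): Long =
  math.min(schedulerMaxMb, nmResourceMb)

// Scenario 2 above: scheduler max 15g, NM resource 8g -> the real ceiling is 8g,
// so the error message should point at yarn.nodemanager.resource.memory-mb too.
```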
[jira] [Updated] (SPARK-24812) Last Access Time in the table description is not valid
[ https://issues.apache.org/jira/browse/SPARK-24812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-24812:
---
Description:
Last Access Time in the table description is not valid.
Test steps:
Step 1 - create a table
Step 2 - Run command "DESC FORMATTED table"
Last Access Time is always displayed as the wrong date, Wed Dec 31 15:59:59 PST 1969, which is wrong.
!image-2018-07-16-15-37-28-896.png!
In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date. Please find the snapshot tested in Hive for the same command:
!image-2018-07-16-15-38-26-717.png!
Seems to be a limitation as of now; better we can follow the Hive behavior in this scenario.

was:
Last Access Time in the table description is not valid,
Test steps:
Step 1 - create a table
Step 2 - Run command "DESC FORMATTED table"
Last Access Time will always displayed wrong date Wed Dec 31 15:59:59 PST 1969 - which is wrong.
In hive its displayed as "UNKNOWN" which makes more sense than displaying wrong date.
Seems to be a limitation as of now, better we can follow the hive behavior in this scenario.

> Last Access Time in the table description is not valid
> --
>
> Key: SPARK-24812
> URL: https://issues.apache.org/jira/browse/SPARK-24812
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.1
> Reporter: Sujith
> Priority: Minor
> Attachments: image-2018-07-16-15-37-28-896.png, image-2018-07-16-15-38-26-717.png
>
> Last Access Time in the table description is not valid.
> Test steps:
> Step 1 - create a table
> Step 2 - Run command "DESC FORMATTED table"
> Last Access Time is always displayed as the wrong date, Wed Dec 31 15:59:59 PST 1969, which is wrong.
> !image-2018-07-16-15-37-28-896.png!
> In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date.
> Please find the snapshot tested in Hive for the same command:
> !image-2018-07-16-15-38-26-717.png!
> Seems to be a limitation as of now; better we can follow the Hive behavior in this scenario.
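Following the Hive behavior proposed above, the display logic could look like this sketch. The -1/0 sentinel check and the "UNKNOWN" fallback mirror Hive's output as described in the issue; the function name is illustrative, not the actual Spark fix:

```scala
import java.util.Date

// Sketch: render lastAccessTime the way Hive does, instead of formatting the
// unset sentinel (which shows up as Wed Dec 31 15:59:59 PST 1969) as a real date.
def formatLastAccess(lastAccessMillis: Long): String =
  if (lastAccessMillis <= 0) "UNKNOWN" else new Date(lastAccessMillis).toString
```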
[jira] [Updated] (SPARK-24812) Last Access Time in the table description is not valid
[ https://issues.apache.org/jira/browse/SPARK-24812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-24812:
---
Attachment: image-2018-07-16-15-38-26-717.png

> Last Access Time in the table description is not valid
> --
>
> Key: SPARK-24812
> URL: https://issues.apache.org/jira/browse/SPARK-24812
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.1
> Reporter: Sujith
> Priority: Minor
> Attachments: image-2018-07-16-15-37-28-896.png, image-2018-07-16-15-38-26-717.png
>
> Last Access Time in the table description is not valid.
> Test steps:
> Step 1 - create a table
> Step 2 - Run command "DESC FORMATTED table"
> Last Access Time is always displayed as the wrong date, Wed Dec 31 15:59:59 PST 1969, which is wrong.
> In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date.
> Seems to be a limitation as of now; better we can follow the Hive behavior in this scenario.
[jira] [Updated] (SPARK-24812) Last Access Time in the table description is not valid
[ https://issues.apache.org/jira/browse/SPARK-24812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-24812:
---
Attachment: image-2018-07-16-15-37-28-896.png

> Last Access Time in the table description is not valid
> --
>
> Key: SPARK-24812
> URL: https://issues.apache.org/jira/browse/SPARK-24812
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.1
> Reporter: Sujith
> Priority: Minor
> Attachments: image-2018-07-16-15-37-28-896.png
>
> Last Access Time in the table description is not valid.
> Test steps:
> Step 1 - create a table
> Step 2 - Run command "DESC FORMATTED table"
> Last Access Time is always displayed as the wrong date, Wed Dec 31 15:59:59 PST 1969, which is wrong.
> In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date.
> Seems to be a limitation as of now; better we can follow the Hive behavior in this scenario.
[jira] [Created] (SPARK-24812) Last Access Time in the table description is not valid
Sujith created SPARK-24812:
--
Summary: Last Access Time in the table description is not valid
Key: SPARK-24812
URL: https://issues.apache.org/jira/browse/SPARK-24812
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.1, 2.2.1
Reporter: Sujith

Last Access Time in the table description is not valid.
Test steps:
Step 1 - create a table
Step 2 - Run command "DESC FORMATTED table"
Last Access Time is always displayed as the wrong date, Wed Dec 31 15:59:59 PST 1969, which is wrong.
In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date.
Seems to be a limitation as of now; better we can follow the Hive behavior in this scenario.
[jira] [Commented] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly
[ https://issues.apache.org/jira/browse/SPARK-23425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364538#comment-16364538 ] Sujith commented on SPARK-23425:

I am working towards resolving this bug; please let me know of any suggestions or feedback.

> load data for hdfs file path with wild card usage is not working properly
> --
>
> Key: SPARK-23425
> URL: https://issues.apache.org/jira/browse/SPARK-23425
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.0
> Reporter: Sujith
> Priority: Major
> Attachments: wildcard_issue.PNG
>
> The load data command for loading data from non-local file paths by using wildcard strings like * is not working.
> eg:
> "load data inpath 'hdfs://hacluster/user/ext*' into table t1"
> Getting an AnalysisException while executing this query:
> !image-2018-02-14-23-41-39-923.png!
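A plausible direction for the fix discussed above is to expand the wildcard against the file system before validating the path. `FileSystem.globStatus` is a real Hadoop API; the surrounding helper and its name are illustrative, not the actual Spark change:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch: expand a wildcard load path such as hdfs://hacluster/user/ext*
// before checking existence, instead of treating it as a literal path.
def resolveLoadPaths(pathPattern: String, hadoopConf: Configuration): Seq[Path] = {
  val path = new Path(pathPattern)
  val fs = path.getFileSystem(hadoopConf)
  // globStatus may return null when nothing matches, so guard with Option
  Option(fs.globStatus(path)).map(_.toSeq.map(_.getPath)).getOrElse(Seq.empty)
}
```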
[jira] [Updated] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly
[ https://issues.apache.org/jira/browse/SPARK-23425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-23425:
---
Attachment: wildcard_issue.PNG

> load data for hdfs file path with wild card usage is not working properly
> --
>
> Key: SPARK-23425
> URL: https://issues.apache.org/jira/browse/SPARK-23425
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.0
> Reporter: Sujith
> Priority: Major
> Attachments: wildcard_issue.PNG
>
> The load data command for loading data from non-local file paths by using wildcard strings like * is not working.
> eg:
> "load data inpath 'hdfs://hacluster/user/ext*' into table t1"
> Getting an AnalysisException while executing this query:
> !image-2018-02-14-23-41-39-923.png!
[jira] [Created] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly
Sujith created SPARK-23425: -- Summary: load data for hdfs file path with wild card usage is not working properly Key: SPARK-23425 URL: https://issues.apache.org/jira/browse/SPARK-23425 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.2.1, 2.3.0 Reporter: Sujith The load data command for loading data from non-local file paths using wild card strings like * is not working eg: "load data inpath 'hdfs://hacluster/user/ext*' into table t1" An AnalysisException is thrown while executing this query !image-2018-02-14-23-41-39-923.png!
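The fix for SPARK-23425 must expand the wildcard against the filesystem before validating the path. A minimal, illustrative sketch of that expansion-then-validate logic, using Python's standard `glob` module as a stand-in for Hadoop's `FileSystem.globStatus` (the function name `resolve_load_paths` is hypothetical, not Spark's actual API):

```python
import glob
import os
import tempfile

def resolve_load_paths(pattern):
    """Expand a wildcard pattern and fail early when nothing matches,
    mirroring the behaviour LOAD DATA should have for 'ext*'-style paths."""
    matches = glob.glob(pattern)
    if not matches:
        # Same user-facing message LOAD DATA already uses for local paths.
        raise FileNotFoundError("LOAD DATA input path does not exist: " + pattern)
    return sorted(matches)

# Demo against a throwaway directory: two of the three files match 'ext*'.
with tempfile.TemporaryDirectory() as d:
    for name in ("ext1.csv", "ext2.csv", "other.csv"):
        open(os.path.join(d, name), "w").close()
    hits = resolve_load_paths(os.path.join(d, "ext*"))
    print(len(hits))  # -> 2
```

The key point is that the pattern is resolved into concrete paths first, so an unmatched wildcard produces the same clear error a missing literal path would.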
[jira] [Commented] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265748#comment-16265748 ] Sujith commented on SPARK-22601: Sure, Sean. > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > > Data load is displayed as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a local non-existing file path is given, the below error message is > displayed: > "LOAD DATA input path does not exist". Attached snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Updated] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-22601: --- Issue Type: Bug (was: Improvement) > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > > Data load is displayed as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a local non-existing file path is given, the below error message is > displayed: > "LOAD DATA input path does not exist". Attached snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Comment Edited] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265264#comment-16265264 ] Sujith edited comment on SPARK-22601 at 11/24/17 1:32 PM: -- I think in this scenario we should validate the HDFS path to check whether it exists, as I noticed we are already validating and throwing an exception for a local non-existing file path. If we don't validate, this can mislead the user and also creates an inconsistency in the load command behaviour between local and HDFS paths. I am working on this issue and will raise a PR ASAP. was (Author: s71955): I think in this scenario we should validate the HDFS path to check whether it exists, as I noticed we are already validating and throwing an exception for a local non-existing file path. I am working on this issue and will raise a PR ASAP. > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > > Data load is displayed as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a local non-existing file path is given, the below error message is > displayed: > "LOAD DATA input path does not exist". Attached snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Comment Edited] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265264#comment-16265264 ] Sujith edited comment on SPARK-22601 at 11/24/17 1:14 PM: -- I think in this scenario we should validate the HDFS path to check whether it exists, as I noticed we are already validating and throwing an exception for a local non-existing file path. I am working on this issue and will raise a PR ASAP. was (Author: s71955): I am working on this issue. > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > > Data load is displayed as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a local non-existing file path is given, the below error message is > displayed: > "LOAD DATA input path does not exist". Attached snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Commented] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265264#comment-16265264 ] Sujith commented on SPARK-22601: I am working on this issue. > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > > Data load is displayed as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a local non-existing file path is given, the below error message is > displayed: > "LOAD DATA input path does not exist". Attached snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Created] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
Sujith created SPARK-22601: -- Summary: Data load is getting displayed successful on providing non existing hdfs file path Key: SPARK-22601 URL: https://issues.apache.org/jira/browse/SPARK-22601 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: Sujith Priority: Minor Data load is displayed as successful when a non-existing HDFS file path is provided, whereas for a local path a proper error message is displayed create table tb2 (a string, b int); load data inpath 'hdfs://hacluster/data1.csv' into table tb2 Note: data1.csv does not exist in HDFS When a local non-existing file path is given, the below error message is displayed: "LOAD DATA input path does not exist". Attached snapshots of the behaviour in spark 2.1 and spark 2.2 versions
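The inconsistency reported in SPARK-22601 comes down to a missing existence check for non-local paths. A minimal sketch of the uniform up-front check the fix needs, using `os.path.exists` as a stand-in for the `FileSystem.exists` call the real HDFS implementation would make (`validate_load_path` is a hypothetical helper, not Spark's API):

```python
import os
import tempfile

def validate_load_path(path, exists_fn=os.path.exists):
    """Fail LOAD DATA up front when the input path is missing.
    exists_fn stands in for FileSystem.exists on HDFS; the same check
    should run regardless of whether the path is local or remote."""
    if not exists_fn(path):
        raise ValueError("LOAD DATA input path does not exist: " + path)

with tempfile.TemporaryDirectory() as d:
    real = os.path.join(d, "data1.csv")
    open(real, "w").close()
    validate_load_path(real)  # existing path: passes silently
    try:
        validate_load_path(os.path.join(d, "missing.csv"))
    except ValueError as e:
        print("rejected:", e)  # missing path: rejected with a clear message
```

Running one shared validation for both local and remote paths removes the local-vs-HDFS behavioural split the issue describes.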
[jira] [Updated] (SPARK-20380) describe table not showing updated table comment after alter operation
[ https://issues.apache.org/jira/browse/SPARK-20380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-20380: --- Description: When a user alters the table properties and adds/updates the table comment, the table comment, which is now directly part of the CatalogTable instance, is not updated and the old table comment is shown. Proposal for solution: To handle this issue, while updating the table properties map with the newly added/modified properties in the CatalogTable instance, also update the comment parameter in CatalogTable with the newly added/modified comment. A PR has already been raised for this issue: https://github.com/apache/spark/pull/17649 was: When a user alters the table properties and adds/updates the table comment, the table comment, which is now directly part of the CatalogTable instance, is not updated and the old table comment is shown. Proposal for solution: To handle this issue, while updating the table properties map with the newly added/modified properties in the CatalogTable instance, also update the comment parameter in CatalogTable with the newly added/modified comment. > describe table not showing updated table comment after alter operation > -- > > Key: SPARK-20380 > URL: https://issues.apache.org/jira/browse/SPARK-20380 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Sujith > > When a user alters the table properties and adds/updates the table comment, the table > comment, which is now directly part of the CatalogTable instance, is not > updated and the old table comment is shown > Proposal for solution: > To handle this issue, while updating the table properties map with the newly > added/modified properties in the CatalogTable > instance, also update the comment parameter in CatalogTable with the newly > added/modified comment. 
A PR has already been raised for this issue: > https://github.com/apache/spark/pull/17649
[jira] [Created] (SPARK-20380) describe table not showing updated table comment after alter operation
Sujith created SPARK-20380: -- Summary: describe table not showing updated table comment after alter operation Key: SPARK-20380 URL: https://issues.apache.org/jira/browse/SPARK-20380 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Sujith When a user alters the table properties and adds/updates the table comment, the table comment, which is now directly part of the CatalogTable instance, is not updated and the old table comment is shown. Proposal for solution: To handle this issue, while updating the table properties map with the newly added/modified properties in the CatalogTable instance, also update the comment parameter in CatalogTable with the newly added/modified comment.
[jira] [Commented] (SPARK-20023) Can not see table comment when describe formatted table
[ https://issues.apache.org/jira/browse/SPARK-20023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970365#comment-15970365 ] Sujith commented on SPARK-20023: @chenerlu, your point is right: after executing the alter command with a newly added/modified table comment, the change is not reflected when we execute a desc formatted table query. The table comment, which is now directly part of the CatalogTable instance, is not updated and the old table comment is shown. To handle this issue, while updating the table properties map with the newly added/modified properties in the CatalogTable instance, we also update the comment parameter in CatalogTable with the newly added/modified comment. I raised a PR fixing this issue: https://github.com/apache/spark/pull/17649 Please let me know if you have any suggestions. > Can not see table comment when describe formatted table > --- > > Key: SPARK-20023 > URL: https://issues.apache.org/jira/browse/SPARK-20023 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: chenerlu >Assignee: Xiao Li > Fix For: 2.2.0 > > > Spark 2.x implements create table by itself. > https://github.com/apache/spark/commit/7d2ed8cc030f3d84fea47fded072c320c3d87ca7 > But in the implementation mentioned above, it removes the table comment from the > properties, so the user cannot see the table comment by running "describe formatted > table". Similarly, when the user alters the table comment, he still cannot see the > change of the table comment by running "describe formatted table". > I wonder why we removed table comments; is this a bug?
[jira] [Updated] (SPARK-19222) Limit Query Performance issue
[ https://issues.apache.org/jira/browse/SPARK-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-19222: --- Description: When a limit is added in the middle of the physical plan, there is a possibility of a memory bottleneck if the limit value is too large, because the system will try to aggregate all the per-partition limit results in a single partition. Description: Eg: create table src_temp as select * from src limit n;(n=1000) == Physical Plan == ExecutedCommand +- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, InsertIntoHiveTable] +- GlobalLimit 1000 +- LocalLimit 1000 +- Project [imei#101, age#102, task#103L, num#104, level#105, productdate#106, name#107, point#108] +- SubqueryAlias hive +- Relation[imei#101,age#102,task#103L,num#104,level#105,productdate#106,name#107,point#108] csv | As shown in the above plan, when the limit comes in the middle, there can be two types of performance bottlenecks. Scenario 1: when the partition count is very high and the limit value is small. Scenario 2: when the limit value is very large. Eg, in the current scenario with a limit count of 1000 and a partition count of 5: Local Limit > |partition 1| |partition 2| |partition 3| |partition 4| |partition 5| --> take n take n take n take n take n | Shuffle Exchange (into a single partition) | Global Limit > take n (all the partition data will be grouped in a single partition). When this scenario occurs, the system will shuffle and try to group the limit data from all partitions into a single partition, which will induce a performance bottleneck. was: When a limit is added in the middle of the physical plan, there is a possibility of a memory bottleneck if the limit value is too large, because the system will try to aggregate all the per-partition limit results in a single partition. 
Description: Eg: create table src_temp as select * from src limit n;(n=1000) == Physical Plan == ExecutedCommand +- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, InsertIntoHiveTable] +- GlobalLimit 1000 +- LocalLimit 1000 +- Project [imei#101, age#102, task#103L, num#104, level#105, productdate#106, name#107, point#108] +- SubqueryAlias hive +- Relation[imei#101,age#102,task#103L,num#104,level#105,productdate#106,name#107,point#108] csv | As shown in the above plan, when the limit comes in the middle, there can be two types of performance bottlenecks. Scenario 1: when the partition count is very high and the limit value is small. Scenario 2: when the limit value is very large. Eg, in the current scenario with a limit count of 1000 and a partition count of 5: Local Limit > |partition 1| |partition 2| |partition 3| |partition 4| |partition 5| take n take n take n take n take n Shuffle Exchange (single partition) Global Limit > take n (all the partition data will be grouped in a single partition). When this scenario occurs, the system will shuffle and try to group the limit data from all partitions into a single partition, which will induce a performance bottleneck. > Limit Query Performance issue > - > > Key: SPARK-19222 > URL: https://issues.apache.org/jira/browse/SPARK-19222 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: Linux/Windows >Reporter: Sujith >Priority: Minor > > When a limit is added in the middle of the physical plan, there is a > possibility of a memory bottleneck > if the limit value is too large, because the system will try to aggregate all the > per-partition limit results in a single partition. 
> Description: > Eg: > create table src_temp as select * from src limit n;(n=1000) > == Physical Plan == > ExecutedCommand >+- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, > InsertIntoHiveTable] > +- GlobalLimit 1000 > +- LocalLimit 1000 >+- Project [imei#101, age#102, task#103L, num#104, level#105, > productdate#106, name#107, point#108] > +- SubqueryAlias hive >
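The bottleneck described above is easy to model: each partition takes n rows locally, then everything taken is shuffled into one partition where the global limit takes n again. A small illustrative simulation, with plain Python lists standing in for RDD partitions (this is a model of the plan's behaviour, not Spark code):

```python
def limit_query(partitions, n):
    """Model LocalLimit -> single-partition shuffle -> GlobalLimit.
    With p partitions, the shuffle moves up to p * n rows to one task,
    which is the memory/performance bottleneck for large n."""
    local = [part[:n] for part in partitions]           # LocalLimit: take n per partition
    shuffled = [row for part in local for row in part]  # Shuffle Exchange into ONE partition
    return shuffled[:n], len(shuffled)                  # GlobalLimit: take n again

# 5 partitions of 1000 rows each, limit n = 1000
parts = [list(range(i * 1000, (i + 1) * 1000)) for i in range(5)]
result, shuffled_rows = limit_query(parts, 1000)
print(len(result), shuffled_rows)  # -> 1000 5000
```

Only 1000 rows are kept, yet 5000 rows land in the single shuffle partition; the gap grows linearly with the partition count, which is exactly the scenario the issue flags.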
[jira] [Updated] (SPARK-19222) Limit Query Performance issue
[ https://issues.apache.org/jira/browse/SPARK-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-19222: --- Description: When a limit is added in the middle of the physical plan, there is a possibility of a memory bottleneck if the limit value is too large, because the system will try to aggregate all the per-partition limit results in a single partition. Description: Eg: create table src_temp as select * from src limit n;(n=1000) == Physical Plan == ExecutedCommand +- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, InsertIntoHiveTable] +- GlobalLimit 1000 +- LocalLimit 1000 +- Project [imei#101, age#102, task#103L, num#104, level#105, productdate#106, name#107, point#108] +- SubqueryAlias hive +- Relation[imei#101,age#102,task#103L,num#104,level#105,productdate#106,name#107,point#108] csv | As shown in the above plan, when the limit comes in the middle, there can be two types of performance bottlenecks. Scenario 1: when the partition count is very high and the limit value is small. Scenario 2: when the limit value is very large. Eg, in the current scenario with a limit count of 1000 and a partition count of 5: Local Limit > |partition 1| |partition 2| |partition 3| |partition 4| |partition 5| take n take n take n take n take n Shuffle Exchange (single partition) Global Limit > take n (all the partition data will be grouped in a single partition). When this scenario occurs, the system will shuffle and try to group the limit data from all partitions into a single partition, which will induce a performance bottleneck. was: When a limit is added in the middle of the physical plan, there is a possibility of a memory bottleneck if the limit value is too large, because the system will try to aggregate all the per-partition limit results in a single partition. 
Description: Eg: create table src_temp as select * from src limit n;(n=1000) == Physical Plan == ExecutedCommand +- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, InsertIntoHiveTable] +- GlobalLimit 1000 +- LocalLimit 1000 +- Project [imei#101, age#102, task#103L, num#104, level#105, productdate#106, name#107, point#108] +- SubqueryAlias hive +- Relation[imei#101,age#102,task#103L,num#104,level#105,productdate#106,name#107,point#108] csv | As shown in the above plan, when the limit comes in the middle, there can be two types of performance bottlenecks. Scenario 1: when the partition count is very high and the limit value is small. Scenario 2: when the limit value is very large. Eg, in the current scenario with a limit count of 1000 and a partition count of 5: Local Limit > |partition 1| |partition 2| |partition 3| |partition 4| |partition 5| take n take n take n take n take n Shuffle Exchange (single partition) Global Limit > take n (all the partition data will be grouped in a single partition). When this scenario occurs, the system will shuffle and try to group the limit data from all partitions into a single partition, which will induce a performance bottleneck. > Limit Query Performance issue > - > > Key: SPARK-19222 > URL: https://issues.apache.org/jira/browse/SPARK-19222 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: Linux/Windows >Reporter: Sujith >Priority: Minor > > When a limit is added in the middle of the physical plan, there is a > possibility of a memory bottleneck > if the limit value is too large, because the system will try to aggregate all the > per-partition limit results in a single partition. 
> Description: > Eg: > create table src_temp as select * from src limit n;(n=1000) > == Physical Plan == > ExecutedCommand >+- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, > InsertIntoHiveTable] > +- GlobalLimit 1000 > +- LocalLimit 1000 >+- Project [imei#101, age#102, task#103L, num#104, level#105, > productdate#106, name#107, point#108] > +- SubqueryAlias hive > +-