[jira] [Commented] (SPARK-27036) Even Broadcast thread is timed out, BroadCast Job is not aborted.
[ https://issues.apache.org/jira/browse/SPARK-27036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782840#comment-16782840 ] Sujith commented on SPARK-27036:

The problem area seems to be BroadcastExchangeExec in the driver, where a job is fired inside a Future and the collected data is then broadcast. The system submits the job and its stages/tasks through the DAGScheduler, whose scheduler thread schedules the respective events. When the Future in BroadcastExchangeExec times out, the corresponding exception is thrown, but the jobs/tasks already scheduled by the DAGScheduler for the action called inside the Future are not cancelled. I think we should cancel the respective job so it does not keep running in the background after the Future times out; this would terminate the job promptly when the TimeoutException happens and save the additional resources that would otherwise be consumed after the timeout is thrown from the driver. I want to attempt a fix for this issue; any comments or suggestions are welcome. cc [~sro...@scient.com] [~b...@cloudera.com] [~hvanhovell]

> Even Broadcast thread is timed out, BroadCast Job is not aborted.
> -----------------------------------------------------------------
>
> Key: SPARK-27036
> URL: https://issues.apache.org/jira/browse/SPARK-27036
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.2
> Reporter: Babulal
> Priority: Minor
> Attachments: image-2019-03-04-00-38-52-401.png, image-2019-03-04-00-39-12-210.png, image-2019-03-04-00-39-38-779.png
>
> During broadcast table job execution, if a broadcast timeout (spark.sql.broadcastTimeout) happens, the broadcast job still continues till completion, whereas it should abort on the broadcast timeout.
> The exception is thrown in the console but the Spark job still continues.
>
> !image-2019-03-04-00-39-38-779.png!
> !image-2019-03-04-00-39-12-210.png!
>
> wait for some time
> !image-2019-03-04-00-38-52-401.png!
> !image-2019-03-04-00-34-47-884.png!
> How to Reproduce the Issue
>
> Option 1, using SQL:
>
> Create table csv_2 (big table, 1M records):
>
> val rdd1 = spark.sparkContext.parallelize(1 to 1000000, 100).map(x => ("name_"+x, x%3, x))
> val df = rdd1.toDF.selectExpr("_1 as name","_2 as age","_3 as sal","_1 as c1","_1 as c2","_1 as c3","_1 as c4","_1 as c5","_1 as c6","_1 as c7","_1 as c8","_1 as c9","_1 as c10","_1 as c11","_1 as c12","_1 as c13","_1 as c14","_1 as c15","_1 as c16","_1 as c17","_1 as c18","_1 as c19","_1 as c20","_1 as c21","_1 as c22","_1 as c23","_1 as c24","_1 as c25","_1 as c26","_1 as c27","_1 as c28","_1 as c29","_1 as c30")
> df.write.csv("D:/data/par1/t4");
> spark.sql("create table csv_2 using csv options('path'='D:/data/par1/t4')");
>
> Create table csv_1 (small table, 100K records):
>
> val rdd1 = spark.sparkContext.parallelize(1 to 100000, 100).map(x => ("name_"+x, x%3, x))
> val df = rdd1.toDF.selectExpr("_1 as name","_2 as age","_3 as sal","_1 as c1","_1 as c2","_1 as c3","_1 as c4","_1 as c5","_1 as c6","_1 as c7","_1 as c8","_1 as c9","_1 as c10","_1 as c11","_1 as c12","_1 as c13","_1 as c14","_1 as c15","_1 as c16","_1 as c17","_1 as c18","_1 as c19","_1 as c20","_1 as c21","_1 as c22","_1 as c23","_1 as c24","_1 as c25","_1 as c26","_1 as c27","_1 as c28","_1 as c29","_1 as c30")
> df.write.csv("D:/data/par1/t5");
> spark.sql("create table csv_1 using csv options('path'='D:/data/par1/t5')");
>
> spark.sql("set spark.sql.autoBroadcastJoinThreshold=73400320").show(false)
> spark.sql("set spark.sql.broadcastTimeout=2").show(false)
>
> Run the below query:
>
> spark.sql("create table s using parquet as select t1.* from csv_2 as t1, csv_1 as t2 where t1._c3=t2._c3")
>
> Option 2: use an external DataSource, add a delay in #buildScan, and use the datasource for the query.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
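The behavior described above can be illustrated outside Spark with a plain thread-pool future (a hedged Python sketch, not Spark code): a timed-out future does not stop the underlying work by itself, so the job must be cancelled explicitly, analogous to cancelling the broadcast job when spark.sql.broadcastTimeout fires.

```python
import concurrent.futures
import threading
import time

cancel_flag = threading.Event()
progress = []

def long_running_job():
    # Stands in for the broadcast collect job: it keeps working
    # unless it is explicitly told to stop.
    for step in range(100):
        if cancel_flag.is_set():
            return "cancelled"
        progress.append(step)
        time.sleep(0.01)
    return "done"

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(long_running_job)

try:
    # Analogous to awaiting the broadcast future for spark.sql.broadcastTimeout.
    future.result(timeout=0.05)
except concurrent.futures.TimeoutError:
    # Without this explicit cancellation signal, the job would keep
    # running to completion in the background after the timeout.
    cancel_flag.set()

result = future.result()
executor.shutdown()
```

Here the timeout fires while the job is still early in its loop and the explicit flag stops it; dropping the `cancel_flag.set()` line would let the loop run all 100 steps despite the timeout, which is exactly the resource leak the comment describes.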
[jira] [Commented] (SPARK-26969) [Spark] Using ODBC not able to see the data in table when datatype is decimal
[ https://issues.apache.org/jira/browse/SPARK-26969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774994#comment-16774994 ] Sujith commented on SPARK-26969:

I will analyze the issue further and raise a PR if required. Thanks.

> [Spark] Using ODBC not able to see the data in table when datatype is decimal
> -----------------------------------------------------------------------------
>
> Key: SPARK-26969
> URL: https://issues.apache.org/jira/browse/SPARK-26969
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Affects Versions: 2.4.0
> Reporter: ABHISHEK KUMAR GUPTA
> Priority: Major
>
> # Using the odbc rpm file, install odbc
> # Connect to odbc using isql -v spark2xsingle
> # SQL> create table t1_t(id decimal(15,2));
> # SQL> insert into t1_t values(15);
> # SQL> select * from t1_t;
> +-----+
> | id  |
> +-----+
> +-----+
> Actual output is empty.
> Note: when creating a table of int data type, select gives the result as below:
> SQL> create table test_t1(id int);
> SQL> insert into test_t1 values(10);
> SQL> select * from test_t1;
> +-----+
> | id  |
> +-----+
> | 10  |
> +-----+
> The decimal case needs to be handled.
[jira] [Commented] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16772921#comment-16772921 ] Sujith commented on SPARK-22601:

*[gatorsmile|https://github.com/gatorsmile] [~srowen] please assign this JIRA to me, as the corresponding PR has already been merged. Thanks*

> Data load is getting displayed successful on providing non existing hdfs file path
> ----------------------------------------------------------------------------------
>
> Key: SPARK-22601
> URL: https://issues.apache.org/jira/browse/SPARK-22601
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: Sujith
> Priority: Minor
> Fix For: 2.2.1
>
> Data load is reported as successful when a non-existing HDFS file path is provided, whereas for a local path a proper error message is displayed.
> create table tb2 (a string, b int);
> load data inpath 'hdfs://hacluster/data1.csv' into table tb2
> Note: data1.csv does not exist in HDFS
> When a local non-existing file path is given, the below error message is displayed:
> "LOAD DATA input path does not exist"
> Attached snapshots of the behaviour in the spark 2.1 and spark 2.2 versions.
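A minimal sketch (hypothetical, not Spark's actual code path) of the behavior the issue asks for: validate the input path up front and fail the load, matching the error already shown for local paths.

```python
import os

def load_data_into_table(table, input_path):
    # Fail fast when the input path does not exist, instead of
    # reporting the load as successful. (Illustrative check only;
    # Spark resolves HDFS paths via the Hadoop FileSystem API.)
    if not os.path.exists(input_path):
        raise ValueError("LOAD DATA input path does not exist: " + input_path)
    return "loaded %s into %s" % (input_path, table)

try:
    load_data_into_table("tb2", "/no/such/path/data1.csv")
    outcome = "load succeeded"
except ValueError as err:
    outcome = str(err)
```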
[jira] [Comment Edited] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761086#comment-16761086 ] Sujith edited comment on SPARK-26821 at 2/5/19 6:27 PM:
----------------------------------------------------------

Yeah, with spaces it works fine. Shall we document this behavior? I will also try to check the behavior in a couple of other systems.

was (Author: s71955): Yeah with spaces it will work fine, will try to check the behavior in couple of other systems also.

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
>
> Create a table with a char type field. While inserting data into a char data type column, if the data string length is less than the specified datatype length, spark2x will not process the filter query properly, leading to an incorrect result.
> 0: jdbc:hive2://10.19.89.222:22550/default> create table jj(id int, name char(5));
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.894 seconds)
> 0: jdbc:hive2://10.19.89.222:22550/default> insert into table jj values(232,'ds');
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.815 seconds)
> 0: jdbc:hive2://10.19.89.222:22550/default> select * from jj where name='ds';
> +-----+-------+
> | id  | name  |
> +-----+-------+
> +-----+-------+
>
> The above query will not give any result.
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761086#comment-16761086 ] Sujith commented on SPARK-26821:

Yeah, with spaces it works fine; I will try to check the behavior in a couple of other systems as well.

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760063#comment-16760063 ] Sujith commented on SPARK-26821:

A bit tricky to handle this scenario, though.

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16760061#comment-16760061 ] Sujith commented on SPARK-26821:

Yes Sean, but I tested the same in MySQL and it gives a result; I am not sure how they handle it internally.

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Comment Edited] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759382#comment-16759382 ] Sujith edited comment on SPARK-26821 at 2/4/19 5:38 PM:
----------------------------------------------------------

cc [~dongjoon] [~vinodkc] [~srowen]

was (Author: s71955): cc [~dongjoon] [~vinodkc]

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759381#comment-16759381 ] Sujith commented on SPARK-26821:

As per the initial analysis, this happens because the actual char data type length is 5 whereas we are inserting data of length 2; since it is a char data type, the system pads the remaining part of the array block with spaces. When we then apply a filter, the system compares the predicate value with the actual table data, which contains the padding, e.g. 'ds' == 'ds   ', leading to the wrong result. I am analyzing this issue further; please let me know of any suggestions or guidance. Thanks.

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
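The padding mismatch described in the comment can be reproduced with plain strings (a Python sketch of the semantics, not Spark internals): a CHAR(5) column stores 'ds' as 'ds   ', so equality against the unpadded literal fails unless one side is padded or trimmed.

```python
stored = "ds".ljust(5)   # what a CHAR(5) column actually holds: 'ds   '
predicate = "ds"         # the literal written in the WHERE clause

naive_match = (stored == predicate)             # the buggy comparison: no rows
padded_match = (stored == predicate.ljust(5))   # pad the literal to the column length
trimmed_match = (stored.rstrip() == predicate)  # or trim trailing spaces before comparing
```

Either padding the literal or trimming the stored value restores the expected match; which one is chosen determines the semantics the engine documents.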
[jira] [Commented] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759382#comment-16759382 ] Sujith commented on SPARK-26821:

cc [~dongjoon]

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Comment Edited] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759382#comment-16759382 ] Sujith edited comment on SPARK-26821 at 2/3/19 11:44 AM:
-----------------------------------------------------------

cc [~dongjoon] [~vinodkc]

was (Author: s71955): cc [~dongjoon]

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Updated] (SPARK-26821) filters not working with char datatype when querying against hive table
[ https://issues.apache.org/jira/browse/SPARK-26821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26821:
----------------------------
Description:

Create a table with a char type field. While inserting data into a char data type column, if the data string length is less than the specified datatype length, spark2x will not process the filter query properly, leading to an incorrect result.

0: jdbc:hive2://10.19.89.222:22550/default> create table jj(id int, name char(5));
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.894 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> insert into table jj values(232,'ds');
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.815 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> select * from jj where name='ds';
+-----+-------+
| id  | name  |
+-----+-------+
+-----+-------+

The above query will not give any result.

was:

Create a table with a char type field. While inserting data into a char data type column, if the data string length is less than the specified datatype length, spark2x will not process the filter query properly, leading to an incorrect result.

0: jdbc:hive2://10.19.89.222:22550/default> create table jj(id int, name char(5));
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.894 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> insert into table jj values(232,'ds');
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.815 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> select * from jj where name='ds';
+-----+-------+
| id  | name  |
+-----+-------+
+-----+-------+

> filters not working with char datatype when querying against hive table
> ------------------------------------------------------------------------
>
> Key: SPARK-26821
> URL: https://issues.apache.org/jira/browse/SPARK-26821
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: Sujith
> Priority: Major
[jira] [Created] (SPARK-26821) filters not working with char datatype when querying against hive table
Sujith created SPARK-26821:
-------------------------------

Summary: filters not working with char datatype when querying against hive table
Key: SPARK-26821
URL: https://issues.apache.org/jira/browse/SPARK-26821
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: Sujith

Create a table with a char type field. While inserting data into a char data type column, if the data string length is less than the specified datatype length, spark2x will not process the filter query properly, leading to an incorrect result.

0: jdbc:hive2://10.19.89.222:22550/default> create table jj(id int, name char(5));
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.894 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> insert into table jj values(232,'ds');
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.815 seconds)
0: jdbc:hive2://10.19.89.222:22550/default> select * from jj where name='ds';
+-----+-------+
| id  | name  |
+-----+-------+
+-----+-------+
[jira] [Comment Edited] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752044#comment-16752044 ] Sujith edited comment on SPARK-22229 at 1/25/19 8:53 AM:
-----------------------------------------------------------

[~yuvaldeg] May I know where I can find the PR related to the new [SparkRDMA|https://github.com/Mellanox/SparkRDMA] implementation? I just want to evaluate it further; quite an interesting feature.

was (Author: s71955): [~yuvaldeg] May i know where i can find PR related to new [SparkRDMA|https://github.com/Mellanox/SparkRDMA] implementation. We want to evaluate it further, quite interesting feature

> SPIP: RDMA Accelerated Shuffle Engine
> -------------------------------------
>
> Key: SPARK-22229
> URL: https://issues.apache.org/jira/browse/SPARK-22229
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Yuval Degani
> Priority: Major
> Attachments: SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
>
> An RDMA-accelerated shuffle engine can provide enormous performance benefits to shuffle-intensive Spark jobs, as demonstrated in the “SparkRDMA” plugin open-source project ([https://github.com/Mellanox/SparkRDMA]).
> Using RDMA for shuffle improves CPU utilization significantly and reduces I/O processing overhead by bypassing the kernel and networking stack as well as avoiding memory copies entirely. Those valuable CPU cycles are then consumed directly by the actual Spark workloads, and help reduce the job runtime significantly.
> This performance gain is demonstrated with both the industry-standard HiBench TeraSort (which shows a 1.5x speedup in sorting) as well as shuffle-intensive customer applications.
> SparkRDMA will be presented at Spark Summit 2017 in Dublin ([https://spark-summit.org/eu-2017/events/accelerating-shuffle-a-tailor-made-rdma-solution-for-apache-spark/]).
> Please see the attached proposal document for more information.
[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752044#comment-16752044 ] Sujith commented on SPARK-22229:

[~yuvaldeg] May I know where I can find the PR related to the new [SparkRDMA|https://github.com/Mellanox/SparkRDMA] implementation? We want to evaluate it further; quite an interesting feature.

> SPIP: RDMA Accelerated Shuffle Engine
> -------------------------------------
>
> Key: SPARK-22229
> URL: https://issues.apache.org/jira/browse/SPARK-22229
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Yuval Degani
> Priority: Major
> Attachments: SPARK-22229_SPIP_RDMA_Accelerated_Shuffle_Engine_Rev_1.0.pdf
[jira] [Commented] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732299#comment-16732299 ] Sujith commented on SPARK-26432:

The test description has been updated; let me know if you have any suggestions or input. Thanks all.

> Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-26432
> URL: https://issues.apache.org/jira/browse/SPARK-26432
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.2, 2.4.0
> Reporter: Sujith
> Priority: Major
> Attachments: hbase-dep-obtaintok.png
>
> Getting NoSuchMethodException : org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the hbase 2.1 service from spark.
> This is mainly happening because spark uses a deprecated hbase api, public static Token obtainToken(Configuration conf), for obtaining the token, and the same has been removed in the hbase 2.1 version.
>
> Test steps:
> Steps to test the Spark-Hbase connection
>
> 1. Create 2 tables in the hbase shell
> Launch the hbase shell and enter commands to create the tables and load data:
> create 'table1','cf'
> put 'table1','row1','cf:cid','20'
> create 'table2','cf'
> put 'table2','row1','cf:cid','30'
> Show values:
> get 'table1','row1','cf:cid' will display the value 20
> get 'table2','row1','cf:cid' will display the value 30
>
> 2. Run the SparkHbasetoHbase class in testSpark.jar using spark-submit:
> spark-submit --master yarn-cluster --class com.mrs.example.spark.SparkHbasetoHbase --conf "spark.yarn.security.credentials.hbase.enabled"="true" --conf "spark.security.credentials.hbase.enabled"="true" --keytab /opt/client/user.keytab --principal sen testSpark.jar
> The SparkHbasetoHbase class will update the value of table2 with the sum of the values of table1 & table2:
> table2 = table1 + table2
>
> 3. Verify the result in the hbase shell
> Expected result: the value of table2 should be 50.
> get 'table2','row1','cf:cid' will display the value 50
> Actual result: the value is not updated, as an error is thrown when spark tries to connect to the hbase service.
> Attached the snapshot of the error logs below for more details.
[jira] [Updated] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Description: Getting NoSuchMethodException : org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying connect hbase 2.1 service from spark. This is mainly happening because in spark uses a deprecated hbase api public static Token obtainToken(Configuration conf) for obtaining the token and the same has been removed from hbase 2.1 version. Test steps: Steps to test Spark-Hbase connection 1. Create 2 tables in hbase shell >Launch hbase shell >Enter commands to create tables and load data create 'table1','cf' put 'table1','row1','cf:cid','20' create 'table2','cf' put 'table2','row1','cf:cid','30' >Show values command get 'table1','row1','cf:cid' will diplay value as 20 get 'table2','row1','cf:cid' will diplay value as 30 2.Run SparkHbasetoHbase class in testSpark.jar using spark-submit spark-submit --master yarn-cluster --class com.mrs.example.spark.SparkHbasetoHbase --conf "spark.yarn.security.credentials.hbase.enabled"="true" --conf "spark.security.credentials.hbase.enabled"="true" --keytab /opt/client/user.keytab --principal sen testSpark.jar The SparkHbasetoHbase class will update the value of table2 with sum of values of table1 & table2. table2 = table1+table2 3.Verify the result in hbase shell Expected Result: The value of table2 should be 50. get 'table1','row1','cf:cid' will diplay value as 50 Actual Result : Not updating the value as an error will be thrown when spark tries to connect with hbase service. Attached the snapshot of error logs below for more details was: Getting NoSuchMethodException : org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying connect hbase 2.1 service from spark. 
This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the token, and that API has been removed in HBase 2.1. The snapshot of the error logs is attached.

> Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
>
> Key: SPARK-26432
> URL: https://issues.apache.org/jira/browse/SPARK-26432
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.3.2, 2.4.0
> Reporter: Sujith
> Priority: Major
> Attachments: hbase-dep-obtaintok.png

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732281#comment-16732281 ] Sujith edited comment on SPARK-26432 at 1/2/19 6:27 PM: Sorry for the late response due to the holidays :). I have raised a PR; please let me know if you have any suggestions. Thanks. The PR is WIP, as I still need to attach the test report, which I will attach tomorrow.

was (Author: s71955): Sorry for the late response due to the holidays :). I have raised a PR; please let me know if you have any suggestions. Thanks.
[jira] [Commented] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732281#comment-16732281 ] Sujith commented on SPARK-26432: Sorry for the late response due to the holidays :). I have raised a PR; please let me know if you have any suggestions. Thanks.
[jira] [Commented] (SPARK-26454) While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgument Exception
[ https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732002#comment-16732002 ] Sujith commented on SPARK-26454: I think [~hyukjin.kwon]'s idea is better and simpler: we can reduce the level to warn, because logging at error implies that the user should not expect the operation to have succeeded. Lowering the level avoids that confusion.

> While creating new UDF with JAR though UDF is created successfully, it throws IllegalArgument Exception
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell
> Affects Versions: 2.3.2
> Reporter: Udbhav Agrawal
> Priority: Trivial
> Attachments: create_exception.txt
>
> Test steps:
> 1. Launch spark-shell
> 2. set role admin;
> 3. Create a new function:
> CREATE FUNCTION Func AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 'hdfs:///tmp/super_udf/two_udfs.jar'
> 4. Select using the function:
> sql("select Func('2018-03-09')").show()
> 5. Create a new UDF with the same JAR:
> sql("CREATE FUNCTION newFunc AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Select using the new function:
> sql("select newFunc('2018-03-09')").show()
> Output: the function is created and the select returns a result, but an IllegalArgumentException is thrown in both cases.
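The distinction being argued above (a recoverable condition belongs at warn, not error) can be sketched in a small, self-contained example. This uses plain java.util.logging as a stand-in, not Spark's actual Log4j-based logger, and the logger name and message are illustrative only:

```scala
import java.util.logging.{Handler, Level, LogRecord, Logger}
import scala.collection.mutable.ArrayBuffer

// Collect log records in memory so the chosen level can be inspected.
val records = ArrayBuffer.empty[LogRecord]
val handler = new Handler {
  override def publish(r: LogRecord): Unit = records += r
  override def flush(): Unit = ()
  override def close(): Unit = ()
}

val log = Logger.getLogger("udf-registration-demo")
log.setUseParentHandlers(false) // keep demo output off the console
log.addHandler(handler)
log.setLevel(Level.ALL)

// The UDF itself was registered successfully, so the classloader hiccup is
// recoverable: report it at WARNING rather than SEVERE (error).
log.warning("could not re-add JAR to the current classloader; function is still registered and usable")

val loggedLevel = records.last.getLevel
```

With this choice, a user scanning the log sees that the operation succeeded with a caveat, instead of an error that suggests the CREATE FUNCTION failed.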
[jira] [Commented] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728463#comment-16728463 ] Sujith commented on SPARK-26432: Thanks for the suggestions. I will update the description. This issue was reported by a customer who was trying to connect Spark with HBase 2.1.

- Is HBase 2.1 the only broken HBase version? Could you link the Apache HBase issue which removes that API here?
_From HBase 2.0 onward this particular deprecated API, obtainToken(conf), has been removed: https://issues.apache.org/jira/browse/HBASE-14713_

- Is it enough to make `HBaseDelegationTokenProvider` support HBase 2.1?
_We already have a consistent API, obtainToken(Connection conn), available since older versions of HBase for obtaining the token. If we use this consistent API, we can avoid breaking on upgraded HBase versions._

I will raise a PR for handling this issue soon, where I can include more details.
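Why a removed library method surfaces as NoSuchMethodException at runtime rather than at compile time can be shown with a stand-in sketch. Spark's token provider resolves the HBase method reflectively; here java.lang.String stands in for TokenUtil, and the "obtainToken" lookup is deliberately for a signature String does not have:

```scala
// Stand-in demonstration, not the HBase API: a reflective lookup of a
// signature the loaded class does not ship fails only at runtime, which is
// exactly how the TokenUtil.obtainToken(Configuration) removal manifests.
val cls = classOf[String]

// A signature that exists resolves fine.
val present = cls.getMethod("length")

// A signature that was never (or is no longer) shipped fails the lookup
// with NoSuchMethodException.
val removedLookup: Option[NoSuchMethodException] =
  try {
    cls.getMethod("obtainToken", classOf[java.util.Properties])
    None
  } catch {
    case e: NoSuchMethodException => Some(e)
  }

println(s"present: ${present.getName}; removed signature found: ${removedLookup.isEmpty}")
```

This is why compiling Spark against an older HBase client gives no warning: the breakage only appears when the job runs against the 2.x jars.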
[jira] [Updated] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Description: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the token, and that API has been removed in HBase 2.1. The snapshot of the error logs is attached.

was: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the token, and that API has been removed in HBase 2.1.
[jira] [Comment Edited] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728014#comment-16728014 ] Sujith edited comment on SPARK-26432 at 12/23/18 5:49 PM: -- This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the Kerberos security token, and that API has been removed in HBase 2.1. As I analyzed, there is a more stable API, public static Token obtainToken(Connection conn), in the TokenUtil class; I think Spark should use this stable API for getting the delegation token. To invoke this API, a connection object first has to be retrieved from ConnectionFactory, and that connection can then be passed to obtainToken(Connection conn) to get the token. I can raise a PR soon for handling this issue; please let me know if you have any clarifications or suggestions.
[jira] [Comment Edited] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728014#comment-16728014 ] Sujith edited comment on SPARK-26432 at 12/23/18 5:48 PM: -- This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the Kerberos security token, and that API has been removed in HBase 2.1. As I analyzed, there is a more stable API, public static Token obtainToken(Connection conn), in the TokenUtil class; I think Spark should use this stable API for getting the delegation token. To invoke this API, a connection object first has to be retrieved from ConnectionFactory, and that connection can then be passed to obtainToken(Connection conn) to get the token. I can raise a PR soon for handling this issue; please let me know if you have any clarifications or suggestions.
[jira] [Updated] (SPARK-26432) Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service.
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Summary: Not able to connect Hbase 2.1 service Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service. (was: Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service)
[jira] [Commented] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728015#comment-16728015 ] Sujith commented on SPARK-26432: cc [~cloud_fan] [~vanzin]
[jira] [Commented] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16728014#comment-16728014 ] Sujith commented on SPARK-26432: This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the Kerberos security token, and that API has been removed in HBase 2.1. As I analyzed, there is a more stable API, public static Token obtainToken(Connection conn), in the TokenUtil class; I think Spark should use this stable API for getting the delegation token. To invoke this API, a connection object first has to be retrieved from ConnectionFactory, and that connection can then be passed to obtainToken(Connection conn) to get the token. I can raise a PR soon for handling this issue; please let me know if you have any clarifications or suggestions.
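The Connection-based flow described in the comment can be sketched as below. This is a compile-only sketch, not runnable here: it assumes the HBase client libraries on the classpath and a reachable, Kerberos-secured cluster, and it shows only the token-acquisition step, not how Spark's HBaseDelegationTokenProvider would wire it in:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory}
import org.apache.hadoop.hbase.security.token.TokenUtil

// Build a Connection first, then pass it to the obtainToken(Connection)
// overload, instead of calling the obtainToken(Configuration) variant that
// HBase 2.x no longer ships.
val conf: Configuration = HBaseConfiguration.create()
val connection: Connection = ConnectionFactory.createConnection(conf)
try {
  val token = TokenUtil.obtainToken(connection)
  // ... add the token to the current user's credentials ...
} finally {
  // The sketch owns the connection, so it must close it.
  connection.close()
}
```

The design point is that obtainToken(Connection) is available on both the old and new HBase lines, so the same code path works across the upgrade.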
[jira] [Updated] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Description: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because Spark uses a deprecated HBase API, public static Token obtainToken(Configuration conf), to obtain the token, and that API has been removed in HBase 2.1.

was: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because Spark uses a deprecated HBase API to obtain the token, and the same has been removed in HBase 2.1.
[jira] [Updated] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Attachment: hbase-dep-obtaintok.png
[jira] [Updated] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
[ https://issues.apache.org/jira/browse/SPARK-26432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26432: --- Description: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because Spark uses a deprecated HBase API to obtain the token, and the same has been removed in HBase 2.1.

was: Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because in Spark we were using a deprecated HBase API to obtain the token, and the same has been removed in HBase 2.1.
[jira] [Created] (SPARK-26432) Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service
Sujith created SPARK-26432: -- Summary: Getting NoSuchMethodException while trying to obtain token from Hbase 2.1 service Key: SPARK-26432 URL: https://issues.apache.org/jira/browse/SPARK-26432 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.0, 2.3.2 Reporter: Sujith

Getting NoSuchMethodException: org.apache.hadoop.hbase.security.token.TokenUtil(org.apache.hadoop.conf.Configuration) while trying to connect to the HBase 2.1 service from Spark. This is mainly happening because in Spark we were using a deprecated HBase API to obtain the token, and the same has been removed in HBase 2.1.
[jira] [Commented] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705935#comment-16705935 ] Sujith commented on SPARK-26165: cc [~marmbrus] [~yhuai] - please let me know your suggestions.

> Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression
>
> Key: SPARK-26165
> URL: https://issues.apache.org/jira/browse/SPARK-26165
> Project: Spark
> Issue Type: Improvement
> Components: Optimizer
> Affects Versions: 2.3.2, 2.4.0
> Reporter: Sujith
> Priority: Major
> Attachments: image-2018-11-26-13-00-36-896.png, image-2018-11-26-13-01-28-299.png, timestamp_filter_perf.PNG
>
> The Date or Timestamp column is converted to string in a less-than/greater-than filter query even when the right-side string literal is a valid date/timestamp, such as '2018-03-18 12:39:40'; the string literal could instead be cast to the column's date/timestamp type.
> scala> spark.sql("""explain extended SELECT username FROM orders WHERE order_creation_date > '2017-02-26 13:45:12'""").show(false);
> == Parsed Logical Plan ==
> 'Project ['username]
> +- 'Filter ('order_creation_date > 2017-02-26 13:45:12)
>    +- 'UnresolvedRelation `orders`
> == Analyzed Logical Plan ==
> username: string
> Project [username#59]
> +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)
>    +- SubqueryAlias orders
>       +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
> == Optimized Logical Plan ==
> Project [username#59]
> +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
> == Physical Plan ==
> *(1) Project [username#59]
> +- *(1) Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12))
>    +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61]
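The direction being proposed for the plan above can be sketched with plain java.time standing in for Spark's internal timestamp handling. The column name, row value, and pattern below are illustrative only, not Spark internals:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Stand-in for the filter above: 'order_creation_date' is a typed timestamp
// column and '2017-02-26 13:45:12' a string literal.
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")
val literal = "2017-02-26 13:45:12"

// Proposed direction: cast the string literal once to the column's type and
// keep the column typed, so the filter stays timestamp-aware.
val literalTs = LocalDateTime.parse(literal, fmt)
val rowTs = LocalDateTime.parse("2018-03-18 12:39:40", fmt)
val typedCompare = rowTs.isAfter(literalTs)

// What the analyzed plan shows instead: cast(order_creation_date as string)
// is evaluated per row, and the comparison happens on strings.
val stringCompare = fmt.format(rowTs) > literal

println(s"typed comparison: $typedCompare, per-row string cast: $stringCompare")
```

Both comparisons agree here, which is the point of the report: the per-row cast does not change the answer, it only forces string evaluation on every row and hides the timestamp type from the optimizer.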
[jira] [Commented] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700459#comment-16700459 ] Sujith commented on SPARK-26165: [~srowen] I can also raise a PR for this issue so that the reviewers get complete insight into the problem and the solution, or I can wait for Yin's and Michael's confirmation.
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698469#comment-16698469 ] Sujith edited comment on SPARK-26165 at 11/26/18 7:31 AM:

This change was made as part of PR [https://github.com/apache/spark/pull/6888], where we introduced casting to string when the left/right expression type is Timestamp; for equality comparisons we were already implicitly casting the string-typed side to Timestamp. There are also some existing test cases with similar usage where the type is cast to string. !image-2018-11-26-13-01-28-299.png! I thought to simply improve the logic as per my description above. We met this issue in a customer environment where a filter query was reported to be slow; after an initial analysis I found that we were casting the Timestamp column expression to string.
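The distinction drawn above — equality comparisons coerce the string side to Timestamp, while less-than/greater-than comparisons fall back to casting the Timestamp side to string — can be sketched schematically. This is hedged pseudologic in Python, not Spark's actual TypeCoercion code; the function name and type labels are invented for illustration:

```python
def coerce(op, left_type, right_type):
    """Return (left_cast_to, right_cast_to) for a binary comparison
    between a timestamp expression and a string expression (sketch of
    the behaviour described in the comment, not Spark's real rule)."""
    if {left_type, right_type} == {"timestamp", "string"}:
        if op == "=":
            # Equality: promote the string side to timestamp.
            return tuple("timestamp" if t == "string" else t
                         for t in (left_type, right_type))
        # <, >, <=, >=: fall back to comparing both sides as strings,
        # which forces a per-row cast of the timestamp column.
        return ("string", "string")
    return (left_type, right_type)

assert coerce("=", "timestamp", "string") == ("timestamp", "timestamp")
assert coerce(">", "timestamp", "string") == ("string", "string")
```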
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698259#comment-16698259 ] Sujith edited comment on SPARK-26165 at 11/25/18 7:32 PM:

Sure Sean, so you mean the user should cast explicitly, and there is no need for the system to handle this implicitly. Actually, I thought it would be fairly easy and low-risk to handle this scenario, since the stringToTimestamp() method has long been available in org.apache.spark.sql.catalyst.util.DateTimeUtils. Fine, so you want me to close this JIRA?
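A rough Python stand-in for the validity check that DateTimeUtils.stringToTimestamp provides (the real Spark parser accepts many more formats, time zones, and fractional seconds; this sketch is only illustrative) would be:

```python
from datetime import datetime

def string_to_timestamp(s):
    """Return a datetime if the literal is a valid timestamp/date
    string, else None — mimicking the Option-returning contract of
    Spark's stringToTimestamp, in simplified form."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(s, fmt)
        except ValueError:
            pass  # try the next accepted format
    return None

assert string_to_timestamp("2017-02-26 13:45:12") is not None
assert string_to_timestamp("not a timestamp") is None
```

Returning an option-like value (a result or None) rather than raising makes it cheap for an analyzer rule to probe "would this literal parse?" before deciding how to coerce.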
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 7:17 PM:

I think we should avoid casting to string in the cases where the string-typed literal in the filter condition can produce a valid date/timestamp, like the filter condition mentioned in this JIRA; otherwise we can fall back to the current logic of casting to string type. This approach avoids the unnecessary overhead of casting the timestamp/date-typed left filter column expression to string, as mentioned in the JIRA. I will raise a PR to handle this issue; please let me know of any suggestions. cc [~srowen] [~cloud_fan] [~vinodkc]
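The proposal above — cast the literal when it parses as a valid timestamp, otherwise keep the existing string-cast fallback — can be sketched as follows. This is illustrative Python; the function names and returned labels are hypothetical, not Spark APIs:

```python
from datetime import datetime

def parse_ts(s):
    # Minimal validity check standing in for stringToTimestamp.
    try:
        return datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None

def plan_comparison(column_type, literal):
    """Decide which side of a '<'/'>' filter gets the cast
    (sketch of the proposed rule, not Spark code)."""
    if column_type == "timestamp" and parse_ts(literal) is not None:
        # Valid literal: cast it once to timestamp; the column keeps
        # its native type, so the filter compares timestamps per row.
        return "cast(literal as timestamp)"
    # Invalid literal: fall back to the current behaviour of casting
    # the column to string on every row.
    return "cast(column as string)"

assert plan_comparison("timestamp", "2017-02-26 13:45:12") == "cast(literal as timestamp)"
assert plan_comparison("timestamp", "oops") == "cast(column as string)"
```

The fallback branch keeps behaviour unchanged for malformed literals, which is what makes the change low-risk: only queries whose literals already parse cleanly get the faster plan.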
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 7:16 PM: -- I think we shall avoid casting to string in the cases where filter condition literals of string type value can generate a valid date/timestamp, like the filter condition mentioned in jira ,otherwise we can fallback to the current logic of casting to string type. This approach can also avoid the unnecessary overhead of casting the left filter column expression timestamp/date type values to string type as mentioned in JIRA. I wll raise a PR for handle this issue.. please let me know for any suggestions. cc [~srowen] [~cloud_fan] [~vinodkc] was (Author: s71955): I think we shall avoid casting to string in the cases where filter condition literals string type value can generate a valid date or timestamp, like the filter condition mentioned in jira ,otherwise we can fallback to the current logic of cast to string type. This approach can also avoid the unnecessary overhead of casting the left filter column expression timestamp/date type values to string I wll raise a PR for handle this issue.. please let me know for any suggestions. cc [~srowen] [~cloud_fan] [~vinodkc] > Date and Timestamp column expression is getting converted to string in less > than/greater than filter query even though valid date/timestamp string > literal is used in the right side filter expression > -- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. 
Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 7:15 PM: -- I think we shall avoid casting to string in the cases where filter condition literals string type value can generate a valid date or timestamp, like the filter condition mentioned in jira ,otherwise we can fallback to the current logic of cast to string type. This approach can also avoid the unnecessary overhead of casting the left filter column expression timestamp/date type values to string I wll raise a PR for handle this issue.. please let me know for any suggestions. cc [~srowen] [~cloud_fan] [~vinodkc] was (Author: s71955): I think we shall avoid casting to string in the cases like if Date/timestamp string can be converted to a valid date or timestamp like the condition mentioned in jira ,otherwise we can fallback to the current logic of cast to string type. This approach can also avoid the unnecessary overhead of casting the left filter column expression timestamp/date type values to string I wll raise a PR for handle this issue.. please let me know for any suggestions. cc [~srowen] [~cloud_fan] [~vinodkc] > Date and Timestamp column expression is getting converted to string in less > than/greater than filter query even though valid date/timestamp string > literal is used in the right side filter expression > -- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. 
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26165) Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sujith updated SPARK-26165:
---------------------------
    Summary: Date and Timestamp column expression is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression  (was: Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression)
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the rig
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 6:35 PM: -- I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal of right expression cannot be converted to data/timestamp. cc [~srowen] [~cloud_fan] [~vinodkc] I wll raise a PR for handle this issue.. please let me know for any suggestions. was (Author: s71955): I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal of right expression cannot be converted to data/timestamp. cc [~srowen] [~cloud_fan] [~vinodkc] I wll raise a PR for handle this issue.. > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. 
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the rig
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 6:34 PM: -- I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal of left expression cannot be converted to data/timestamp. cc [~srowen] [~cloud_fan] [~vinodkc] I wll raise a PR for handle this issue.. was (Author: s71955): I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal cannot be converted to data/timestamp. cc [~srowen] [~cloud_fan] [~vinodkc] I wll raise a PR for handle this issue.. > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. 
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the rig
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith edited comment on SPARK-26165 at 11/25/18 6:33 PM: -- I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal cannot be converted to data/timestamp. cc [~srowen] [~cloud_fan] [~vinodkc] I wll raise a PR for handle this issue.. was (Author: s71955): I think we shall avoid casting if Date/timestamp string which can be converted to a valid date or timestamp ,we can convert the filter right expression column to sting type only if filter expression with string literal cannot be converted to data/timestamp. I wll raise a PR for handle this issue.. > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. 
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right si
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698227#comment-16698227 ] Sujith commented on SPARK-26165: I think we should avoid the cast when the date/timestamp string can be converted to a valid date or timestamp; if the string literal in the filter expression is not a valid date/timestamp, we can fall back to converting the column to string type, as per the current logic. I will raise a PR to handle this issue. > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in a less than/greater > than filter query even though a valid date/timestamp string literal, like > '2018-03-18 12:39:40', is used in the right side filter expression. Because of > the cast, the comparison is performed on strings rather than on the column's > native date/timestamp type.
> > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
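The coercion rule proposed in the comments above can be sketched as follows. This is a minimal Python illustration of the idea, not Spark's actual TypeCoercion code; `coerce_comparison` and its format table are hypothetical helpers. The point is to cast the string literal once, in the column's native type, and fall back to the current column-to-string cast only when the literal does not parse:

```python
from datetime import datetime

def coerce_comparison(column_type: str, literal: str):
    """If the right-side string literal parses as the column's date/timestamp
    type, cast the literal once; otherwise fall back to the current behaviour
    of casting the column to string (which blocks typed comparison)."""
    formats = {
        "timestamp": "%Y-%m-%d %H:%M:%S",
        "date": "%Y-%m-%d",
    }
    fmt = formats.get(column_type)
    if fmt is not None:
        try:
            value = datetime.strptime(literal, fmt)
            # Literal is valid: compare in the column's native type.
            return ("cast_literal", value)
        except ValueError:
            pass
    # Literal is not a valid date/timestamp: keep the existing logic.
    return ("cast_column_to_string", literal)

print(coerce_comparison("timestamp", "2017-02-26 13:45:12")[0])  # cast_literal
print(coerce_comparison("timestamp", "not-a-timestamp")[0])      # cast_column_to_string
```

Casting only the literal preserves the column's type in the predicate, which is what allows a typed (and pushdown-friendly) comparison in plans like the one quoted above.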
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Description: Date and Timestamp column is getting converted to string in less than/greater than filter query even though date strings that contains a time, like '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp. scala> spark.sql("""explain extended SELECT username FROM orders WHERE order_creation_date > '2017-02-26 13:45:12'""").show(false); +--- |== Parsed Logical Plan == 'Project ['username] +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) +- 'UnresolvedRelation `orders` == Analyzed Logical Plan == username: string Project [username#59] +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) +- SubqueryAlias orders +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61] == Optimized Logical Plan == Project [username#59] +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) +- HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation_date#60, amount#61] == Physical Plan == *(1) Project [username#59] +- *(1) Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, order_creation + - was: Date and Timestamp column is getting converted to string in less than/greater than filter query even though date strings that contains a time, like '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp. 
> Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, >
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Attachment: timestamp_filter_perf.PNG > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > >
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Attachment: (was: timestamp_filter_perf.PNG) > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as 
string) > 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Attachment: timestamp_filter_perf.PNG > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: timestamp_filter_perf.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > > scala> spark.sql("""explain extended SELECT username FROM orders WHERE > order_creation_date > '2017-02-26 13:45:12'""").show(false); > +--- > |== Parsed Logical Plan == > 'Project ['username] > +- 'Filter ('order_creation_date > 2017-02-26 13:45:12) > +- 'UnresolvedRelation `orders` > == Analyzed Logical Plan == > username: string > Project [username#59] > +- Filter (cast(order_creation_date#60 as string) > 2017-02-26 13:45:12) > +- SubqueryAlias orders > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Optimized Logical Plan == > Project [username#59] > +- Filter (isnotnull(order_creation_date#60) && (cast(order_creation_date#60 > as string) > 2017-02-26 13:45:12)) > +- HiveTableRelation `default`.`orders`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [username#59, > order_creation_date#60, amount#61] > == Physical Plan == > *(1) Project [username#59] > +- *(1) Filter (isnotnull(order_creation_date#60) && > (cast(order_creation_date#60 as string) 
> 2017-02-26 13:45:12)) > +- HiveTableScan [order_creation_date#60, username#59], HiveTableRelation > `default`.`orders`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, > [username#59, order_creation > + > - -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Attachment: (was: testreport.PNG) > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp.
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Description: Date and Timestamp column is getting converted to string in less than/greater than filter query even though date strings that contains a time, like '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp. was:Date and Timestamp column is getting converted to string in less than/greater than filter query even though date strings that contains a time, like '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp. > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp. > >
[jira] [Updated] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
[ https://issues.apache.org/jira/browse/SPARK-26165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-26165: --- Attachment: testreport.PNG > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though valid date/timestamp string literal is used in > the right side filter expression > --- > > Key: SPARK-26165 > URL: https://issues.apache.org/jira/browse/SPARK-26165 > Project: Spark > Issue Type: Improvement > Components: Optimizer >Affects Versions: 2.3.2, 2.4.0 >Reporter: Sujith >Priority: Major > Attachments: testreport.PNG > > > Date and Timestamp column is getting converted to string in less than/greater > than filter query even though date strings that contains a time, like > '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string > like '2018-03-18 12:39:40' to a timestamp.
[jira] [Created] (SPARK-26165) Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side
Sujith created SPARK-26165: -- Summary: Date and Timestamp column is getting converted to string in less than/greater than filter query even though valid date/timestamp string literal is used in the right side filter expression Key: SPARK-26165 URL: https://issues.apache.org/jira/browse/SPARK-26165 Project: Spark Issue Type: Improvement Components: Optimizer Affects Versions: 2.4.0, 2.3.2 Reporter: Sujith Date and Timestamp column is getting converted to string in less than/greater than filter query even though date strings that contains a time, like '2018-03-18" 12:39:40' to date. Besides it's not possible to cast a string like '2018-03-18 12:39:40' to a timestamp.
[jira] [Commented] (SPARK-25332) Instead of broadcast hash join ,Sort merge join has selected when restart spark-shell/spark-JDBC for hive provider
[ https://issues.apache.org/jira/browse/SPARK-25332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16652914#comment-16652914 ] Sujith commented on SPARK-25332: [~Bjangir] i think you are right, there is a bug while inserting data into table when we use stored by clause in create command, I am working on it. soon i will be raising a PR . [~maropu] *[srowen|https://github.com/srowen] [cloud-fan|https://github.com/cloud-fan]* i will raise a PR to handle this and keep you guys in loop. thanks > Instead of broadcast hash join ,Sort merge join has selected when restart > spark-shell/spark-JDBC for hive provider > --- > > Key: SPARK-25332 > URL: https://issues.apache.org/jira/browse/SPARK-25332 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.0 >Reporter: Babulal >Priority: Major > > spark.sql("create table x1(name string,age int) stored as parquet ") > spark.sql("insert into x1 select 'a',29") > spark.sql("create table x2 (name string,age int) stored as parquet '") > spark.sql("insert into x2_ex select 'a',29") > scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain > == Physical Plan == > *{color:#14892c}(2) BroadcastHashJoin{color} [name#101], [name#103], Inner, > BuildRight > :- *(2) Project [name#101, age#102] > : +- *(2) Filter isnotnull(name#101) > : +- *(2) FileScan parquet default.x1_ex[name#101,age#102] Batched: true, > Format: Parquet, Location: > InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1, > PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: > struct > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true])) > +- *(1) Project [name#103, age#104] > +- *(1) Filter isnotnull(name#103) > +- *(1) FileScan parquet default.x2_ex[name#103,age#104] Batched: true, > Format: Parquet, Location: > InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2, > PartitionFilters: [], PushedFilters: [IsNotNull(name)], 
ReadSchema: > struct > > > Now Restart Spark-Shell or do spark-submit orrestart JDBCServer again and > run same select query again > > scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain > scala> spark.sql("select * from x1 t1 ,x2 t2 where t1.name=t2.name").explain > == Physical Plan == > *{color:#FF}(5) SortMergeJoin [{color}name#43], [name#45], Inner > :- *(2) Sort [name#43 ASC NULLS FIRST], false, 0 > : +- Exchange hashpartitioning(name#43, 200) > : +- *(1) Project [name#43, age#44] > : +- *(1) Filter isnotnull(name#43) > : +- *(1) FileScan parquet default.x1[name#43,age#44] Batched: true, Format: > Parquet, Location: > InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x1], > PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: > struct > +- *(4) Sort [name#45 ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(name#45, 200) > +- *(3) Project [name#45, age#46] > +- *(3) Filter isnotnull(name#45) > +- *(3) FileScan parquet default.x2[name#45,age#46] Batched: true, Format: > Parquet, Location: > InMemoryFileIndex[file:/D:/spark_release/spark/bin/spark-warehouse/x2], > PartitionFilters: [], PushedFilters: [IsNotNull(name)], ReadSchema: > struct > > > scala> spark.sql("desc formatted x1").show(200,false) > ++--+---+ > |col_name |data_type |comment| > ++--+---+ > |name |string |null | > |age |int |null | > | | | | > |# Detailed Table Information| | | > |Database |default | | > |Table |x1 | | > |Owner |Administrator | | > |Created Time |Sun Aug 19 12:36:58 IST 2018 | | > |Last Access |Thu Jan 01 05:30:00 IST 1970 | | > |Created By |Spark 2.3.0 | | > |Type |MANAGED | | > |Provider |hive | | > |Table Properties |[transient_lastDdlTime=1534662418] | | > |Location |file:/D:/spark_release/spark/bin/spark-warehouse/x1 | | > |Serde Library |org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe | > | > |InputFormat |org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat | > | > |OutputFormat > 
|org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat| | > |Storage Properties |[serialization.format=1] | | > |Partition Provider |Catalog | | > ++--+---+ > > With a datasource table it works fine (create table using parquet instead of > stored by)
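The broadcast-to-sort-merge flip described above follows from Spark's size-based join selection: when a Hive-serde table has no computed statistics, its estimated size falls back to a large default after a restart, so the table stops qualifying for broadcast. A minimal sketch of that decision, assuming a simplified planner that looks only at estimated sizes (the real planner also considers hints, join type, and keys):

```python
DEFAULT_SIZE_IN_BYTES = 2**62  # stand-in for the "unknown stats" fallback

def choose_join_strategy(left_size, right_size,
                         broadcast_threshold=10 * 1024 * 1024):
    """Pick a join strategy from estimated relation sizes.

    Sizes default to a huge value when statistics are missing, which is why
    a table that was broadcast in the session that wrote it can flip to a
    sort-merge join after a restart.
    """
    if min(left_size, right_size) <= broadcast_threshold:
        return "BroadcastHashJoin"
    return "SortMergeJoin"

# Fresh session with in-memory stats: tiny sizes, broadcast is chosen.
print(choose_join_strategy(800, 800))                              # BroadcastHashJoin
# After restart, no persisted stats: sizes fall back to the default.
print(choose_join_strategy(DEFAULT_SIZE_IN_BYTES, DEFAULT_SIZE_IN_BYTES))  # SortMergeJoin
```

Running `ANALYZE TABLE ... COMPUTE STATISTICS` (or using a datasource table, as the report notes) persists the size estimate and restores the broadcast choice.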
[jira] [Commented] (SPARK-25071) BuildSide is coming not as expected with join queries
[ https://issues.apache.org/jira/browse/SPARK-25071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651113#comment-16651113 ] Sujith commented on SPARK-25071: cc [~ZenWzh] Please suggest as i want to take up this JIRA > BuildSide is coming not as expected with join queries > - > > Key: SPARK-25071 > URL: https://issues.apache.org/jira/browse/SPARK-25071 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 > Environment: Spark 2.3.1 > Hadoop 2.7.3 >Reporter: Ayush Anubhava >Priority: Major > > *BuildSide is not coming as expected.* > Pre-requisites: > *CBO is set as true & spark.sql.cbo.joinReorder.enabled= true.* > *import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec* > *Steps:* > *Scenario 1:* > spark.sql("CREATE TABLE small3 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='600','totalSize'='800')") > spark.sql("CREATE TABLE big3 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='6000', 'totalSize'='800')") > val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > println(buildSide) > > *Result 1:* > scala> val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#0L|#0L], [c1#1L|#1L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#0L) > : +- HiveTableScan [c1#0L|#0L], HiveTableRelation `default`.`small3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#0L|#0L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#1L) > +- HiveTableScan [c1#1L|#1L], HiveTableRelation `default`.`big3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1L|#1L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide 
> buildSide: org.apache.spark.sql.execution.joins.BuildSide = BuildRight > scala> println(buildSide) > *BuildRight* > > *Scenario 2:* > spark.sql("CREATE TABLE small4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='600','totalSize'='80')") > spark.sql("CREATE TABLE big4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='6000', 'totalSize'='800')") > val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > println(buildSide) > *Result 2:* > scala> val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#4L|#4L], [c1#5L|#5L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#4L) > : +- HiveTableScan [c1#4L|#4L], HiveTableRelation `default`.`small4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#4L|#4L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#5L) > +- HiveTableScan [c1#5L|#5L], HiveTableRelation `default`.`big4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#5L|#5L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = *BuildRight* > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
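The expectation behind this report is that the planner builds the hash table from the smaller side. A toy sketch of that expected choice, using a hypothetical `choose_build_side` helper (the real planner also weighs broadcastability and join type); the report above shows `BuildRight` even when the right ("big") table has the larger raw data size:

```python
def choose_build_side(left_size: int, right_size: int) -> str:
    """Expected behaviour: build the hash table from the smaller relation,
    since the build side is materialized in memory."""
    return "BuildLeft" if left_size < right_size else "BuildRight"

# Scenario 2 from the report: small4 (totalSize=80) joined with big4 (totalSize=800).
print(choose_build_side(80, 800))  # BuildLeft expected; the report observes BuildRight
```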
[jira] [Commented] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16651097#comment-16651097 ] Sujith commented on SPARK-22601: cc *[gatorsmile|https://github.com/gatorsmile] please assign this JIRA to me, as the PR has already been merged. Thanks* > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > Fix For: 2.2.1 > > > Data load is reported as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed: > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a non-existing local file path is given, the error message > "LOAD DATA input path does not exist" is displayed. Attached are snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
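The fix direction discussed here amounts to validating the input path up front for HDFS, just as is already done for local paths. A hedged sketch with a hypothetical `validate_load_path` helper (not Spark's actual code); the existence check is injectable so a distributed file system can supply its own:

```python
import os

def validate_load_path(path: str, exists=os.path.exists) -> str:
    """Fail fast when a LOAD DATA input path does not exist.

    `exists` defaults to a local-filesystem check; an HDFS client would pass
    its own predicate so both path kinds get the same error behaviour.
    """
    if not exists(path):
        raise FileNotFoundError("LOAD DATA input path does not exist: " + path)
    return path

# Simulated HDFS check that reports the path as missing.
try:
    validate_load_path("hdfs://hacluster/data1.csv", exists=lambda p: False)
except FileNotFoundError as e:
    print(e)  # LOAD DATA input path does not exist: hdfs://hacluster/data1.csv
```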
[jira] [Resolved] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith resolved SPARK-22601. Resolution: Fixed Fix Version/s: 2.2.1 PR already merged. > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > Fix For: 2.2.1 > > > Data load is reported as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed: > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a non-existing local file path is given, the error message > "LOAD DATA input path does not exist" is displayed. Attached are snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Comment Edited] (SPARK-25521) Job id showing null when Job is finished.
[ https://issues.apache.org/jira/browse/SPARK-25521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626849#comment-16626849 ] Sujith edited comment on SPARK-25521 at 9/25/18 6:39 AM: - [~Bjangir] I could see the jobcontext doesn't have jobID when the flow hits FileFormatWriter.scala in the insert flow. Moreover this issue is happening in insert flow. I will check into this issue more and raise a PR for handling the same. Thanks for reporting. was (Author: s71955): [~Bjangir] I could see the jobcontext doesn't have jobID when the flow hits FileFormatWriter.scala in the insert flow. I will check into this issue more and raise a PR for handling the same. Thanks for reporting. > Job id showing null when Job is finished. > - > > Key: SPARK-25521 > URL: https://issues.apache.org/jira/browse/SPARK-25521 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.1 >Reporter: Babulal >Priority: Minor > Attachments: image-2018-09-25-12-01-31-871.png > > > scala> spark.sql("create table x1(name string,age int) stored as parquet") > scala> spark.sql("insert into x1 select 'a',29") > check logs > 2018-08-19 12:45:36 INFO TaskSetManager:54 - Finished task 0.0 in stage 0.0 > (TID 0) in 874 ms on localhost (executor > driver) (1/1) > 2018-08-19 12:45:36 INFO TaskSchedulerImpl:54 - Removed TaskSet 0.0, whose > tasks have all completed, from pool > 2018-08-19 12:45:36 INFO DAGScheduler:54 - ResultStage 0 (sql at > :24) finished in 1.131 s > 2018-08-19 12:45:36 INFO DAGScheduler:54 - Job 0 finished: sql at > :24, took 1.233329 s > 2018-08-19 12:45:36 INFO FileFormatWriter:54 - Job > {color:#d04437}null{color} committed. > 2018-08-19 12:45:36 INFO FileFormatWriter:54 - Finished processing stats for > job null. > res4: org.apache.spark.sql.DataFrame = [] > > !image-2018-09-25-12-01-31-871.png! 
> >
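The "Job {color:#d04437}null{color} committed." log above suggests the committer is handed a context whose job ID was never populated in the insert path. A minimal, self-contained sketch of the fallback pattern a fix might use, so the log never prints the literal "null" (illustrative Python, not Spark's actual Scala code; the function names here are hypothetical):

```python
import uuid

def effective_job_id(job_id):
    """Return the context's job id when present; otherwise derive a
    placeholder so log lines never print the literal 'null'."""
    if job_id:
        return job_id
    return "job-" + uuid.uuid4().hex[:8]

def commit_message(job_id):
    # Mirrors the shape of the FileFormatWriter log line quoted above.
    return "Job %s committed." % effective_job_id(job_id)
```

With this pattern, `commit_message(None)` yields something like "Job job-3f2a9c1d committed." instead of "Job null committed.", while a real job ID passes through unchanged.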
[jira] [Updated] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly
[ https://issues.apache.org/jira/browse/SPARK-23425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-23425: --- Docs Text: Release notes: Wildcard symbols {{*}} and {{?}} can now be used in SQL paths when loading data, e.g.: LOAD DATA INPATH 'hdfs://hacluster/user/ext*' LOAD DATA INPATH 'hdfs://hacluster/user/???/data' Where these characters are used literally in paths, they must be escaped with a backslash. Wildcards can now also be used at the folder level of a local file system in the LOAD command, e.g. LOAD DATA LOCAL INPATH 'tmp/folder*/'. Normal spaces can now be used in folder/file names (e.g. file Name.csv); in older versions a space in a folder/file name had to be represented as '%20' (e.g. myFile%20Name). was: Release notes: Wildcard symbols {{*}} and {{?}} can now be used in SQL paths when loading data, e.g.: LOAD DATA INPATH 'hdfs://hacluster/user/ext*' LOAD DATA INPATH 'hdfs://hacluster/user/???/data' Where these characters are used literally in paths, they must be escaped with a backslash. Wildcards can be used in the folder level of a local File system in Load command from now. e.g. LOAD DATA LOCAL INPATH 'tmp/folder*/ > load data for hdfs file path with wild card usage is not working properly > - > > Key: SPARK-23425 > URL: https://issues.apache.org/jira/browse/SPARK-23425 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.1, 2.3.0 >Reporter: Sujith >Assignee: Sujith >Priority: Major > Labels: release-notes > Fix For: 2.4.0 > > Attachments: wildcard_issue.PNG > > > The load data command for loading data from non-local file paths using wild > card strings like * is not working > eg: > "load data inpath 'hdfs://hacluster/user/ext*' into table t1" > Getting an Analysis exception while executing this query > !image-2018-02-14-23-41-39-923.png! 
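The release note above describes the wildcard semantics: `*` matches any run of characters, `?` matches exactly one, and a backslash escapes either character when it appears literally in a path. The matching itself is done by Hadoop's glob support in Spark; the sketch below is a standalone illustration of those same rules, not Spark's or Hadoop's actual implementation:

```python
import re

def glob_to_regex(pattern):
    """Translate a LOAD DATA-style wildcard pattern into a regex:
    '*' matches any run of characters, '?' exactly one, and a
    backslash escapes a literal '*' or '?'."""
    out, i = ["^"], 0
    while i < len(pattern):
        c = pattern[i]
        if c == "\\" and i + 1 < len(pattern):
            # Escaped character: match it literally and skip the backslash.
            out.append(re.escape(pattern[i + 1]))
            i += 1
        elif c == "*":
            out.append(".*")
        elif c == "?":
            out.append(".")
        else:
            out.append(re.escape(c))
        i += 1
    out.append("$")
    return "".join(out)

def glob_match(pattern, path):
    return re.match(glob_to_regex(pattern), path) is not None
```

For example, `glob_match("/user/ext*", "/user/ext123")` holds, `/user/???/data` matches exactly three characters in that segment, and `\*` matches only a literal `*`.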
[jira] [Commented] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page
[ https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610541#comment-16610541 ] Sujith commented on SPARK-25392: Not sure whether it makes sense to show these details in the History Server, but of course it should not throw an error. [~srowen] [~LI,Xiao] [~hyukjin.kwon] please let us know if you have any suggestions. Thanks all > [Spark Job History]Inconsistent behaviour for pool details in spark web UI > and history server page > --- > > Key: SPARK-25392 > URL: https://issues.apache.org/jira/browse/SPARK-25392 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: OS: SUSE 11 > Spark Version: 2.3 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > Steps: > 1. Enable spark.scheduler.mode = FAIR > 2. Submitted beeline jobs > create database JH; > use JH; > create table one12( id int ); > insert into one12 values(12); > insert into one12 values(13); > Select * from one12; > 3. Click on JDBC Incompleted Application ID in Job History Page > 4. Go to Job Tab in staged Web UI page > 5. Click on run at AccessController.java:0 under Description column > 6. Click default under Pool Name column of Completed Stages table > URL:http://blr123109:23020/history/application_1536399199015_0006/stages/pool/?poolname=default > 7. It throws the below error > HTTP ERROR 400 > Problem accessing /history/application_1536399199015_0006/stages/pool/. > Reason: > Unknown pool: default > Powered by Jetty:// x.y.z > But under the > Yarn resource page it displays the summary under Fair Scheduler Pool: default > URL:https://blr123110:64323/proxy/application_1536399199015_0006/stages/pool?poolname=default > Summary > Pool Name Minimum Share Pool Weight Active Stages Running Tasks > SchedulingMode > default 0 1 0 0 FIFO -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25392) [Spark Job History]Inconsistent behaviour for pool details in spark web UI and history server page
[ https://issues.apache.org/jira/browse/SPARK-25392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610535#comment-16610535 ] Sujith commented on SPARK-25392: cc [~srowen] [~LI,Xiao] > [Spark Job History]Inconsistent behaviour for pool details in spark web UI > and history server page > --- > > Key: SPARK-25392 > URL: https://issues.apache.org/jira/browse/SPARK-25392 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 > Environment: OS: SUSE 11 > Spark Version: 2.3 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > Steps: > 1.Enable spark.scheduler.mode = FAIR > 2.Submitted beeline jobs > create database JH; > use JH; > create table one12( id int ); > insert into one12 values(12); > insert into one12 values(13); > Select * from one12; > 3.Click on JDBC Incompleted Application ID in Job History Page > 4. Go to Job Tab in staged Web UI page > 5. Click on run at AccessController.java:0 under Desription column > 6 . Click default under Pool Name column of Completed Stages table > URL:http://blr123109:23020/history/application_1536399199015_0006/stages/pool/?poolname=default > 7. It throws below error > HTTP ERROR 400 > Problem accessing /history/application_1536399199015_0006/stages/pool/. > Reason: > Unknown pool: default > Powered by Jetty:// x.y.z > But under > Yarn resource page it display the summary under Fair Scheduler Pool: default > URL:https://blr123110:64323/proxy/application_1536399199015_0006/stages/pool?poolname=default > Summary > Pool Name Minimum Share Pool Weight Active Stages Running Tasks > SchedulingMode > default 0 1 0 0 FIFO -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
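The suggestion in the comments is that the history server page "shall not throw an error" for a pool it does not track. A minimal sketch of that idea, looking the pool up and falling back to an explanatory message instead of failing the request with HTTP 400 (illustrative Python with a hypothetical `render_pool` helper, not Spark's UI code):

```python
def render_pool(pools, name):
    """Look the pool up; if the history server does not track it,
    return an explanatory message rather than raising an error."""
    info = pools.get(name)
    if info is None:
        return ("Pool %s is not available in the history server "
                "for completed applications." % name)
    # Mirrors the columns shown in the YARN Fair Scheduler Pool summary.
    return "Pool %s: min_share=%d, weight=%d, active_stages=%d" % (
        name, info["min_share"], info["weight"], info["active_stages"])
```

With a live scheduler the lookup succeeds; on a replayed (completed) application the fallback message is shown, matching the behaviour the comment asks for.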
[jira] [Commented] (SPARK-25271) Creating parquet table with all the column null throws exception
[ https://issues.apache.org/jira/browse/SPARK-25271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603126#comment-16603126 ] Sujith commented on SPARK-25271: [~cloud_fan] [~sowen] Will this cause a compatibility problem compared to older versions? If a user has a null record, they now get an exception with the current version, whereas the older version of Spark (2.2.1) does not throw any exception. I think the output writers were updated in the below PR [https://github.com/apache/spark/pull/20521] > Creating parquet table with all the column null throws exception > > > Key: SPARK-25271 > URL: https://issues.apache.org/jira/browse/SPARK-25271 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: shivusondur >Priority: Major > > {code:java} > 1)cat /data/parquet.dat > 1$abc2$pqr:3$xyz > null{code} > > {code:java} > 2)spark.sql("create table vp_reader_temp (projects map) ROW > FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':' > MAP KEYS TERMINATED BY '$'") > {code} > {code:java} > 3)spark.sql(" > LOAD DATA LOCAL INPATH '/data/parquet.dat' INTO TABLE vp_reader_temp") > {code} > {code:java} > 4)spark.sql("create table vp_reader STORED AS PARQUET as select * from > vp_reader_temp") > {code} > *Result :* Throwing exception (Working fine with spark 2.2.1) > {code:java} > java.lang.RuntimeException: Parquet record is malformed: empty fields are > illegal, the field should be ommited completely instead > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) > at > org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123) > at > 
org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:180) > at > org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:46) > at > org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:112) > at > org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:125) > at > org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:406) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:283) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:281) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1438) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:286) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:211) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:210) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:349) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.parquet.io.ParquetEncodingException: empty fields are > 
illegal, the field should be ommited completely instead > at > org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:320) > at > org.apache.parquet.io.RecordConsumerLoggingWrapper.endField(RecordConsumerLoggingWrapper.java:165) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeMap(DataWritableWriter.java:241) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:116) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89) > at >
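The stack trace shows the Parquet writer rejecting an *empty* group ("empty fields are illegal, the field should be ommited completely instead"): for a null row, the map field arrives as empty rather than absent. One workaround direction is to normalize empty collections to null before they reach the writer, so the field is omitted. A tiny sketch of that normalization (illustrative Python; Spark's fix would live in the Scala/Java write path):

```python
def sanitize_map(value):
    """Parquet rejects empty groups but accepts absent (null) ones:
    normalize an empty map to None so the writer omits the field
    instead of emitting an empty group."""
    if value:  # truthy only when non-None and non-empty
        return value
    return None
```

Applied to each map-typed column of a row before writing, `{}` becomes `None` (field omitted) while populated maps pass through untouched.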
[jira] [Commented] (SPARK-25271) Creating parquet table with all the column null throws exception
[ https://issues.apache.org/jira/browse/SPARK-25271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603128#comment-16603128 ] Sujith commented on SPARK-25271: cc [~hyukjin.kwon] > Creating parquet table with all the column null throws exception > > > Key: SPARK-25271 > URL: https://issues.apache.org/jira/browse/SPARK-25271 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.1 >Reporter: shivusondur >Priority: Major > > {code:java} > 1)cat /data/parquet.dat > 1$abc2$pqr:3$xyz > null{code} > > {code:java} > 2)spark.sql("create table vp_reader_temp (projects map) ROW > FORMAT DELIMITED FIELDS TERMINATED BY ',' COLLECTION ITEMS TERMINATED BY ':' > MAP KEYS TERMINATED BY '$'") > {code} > {code:java} > 3)spark.sql(" > LOAD DATA LOCAL INPATH '/data/parquet.dat' INTO TABLE vp_reader_temp") > {code} > {code:java} > 4)spark.sql("create table vp_reader STORED AS PARQUET as select * from > vp_reader_temp") > {code} > *Result :* Throwing exception (Working fine with spark 2.2.1) > {code:java} > java.lang.RuntimeException: Parquet record is malformed: empty fields are > illegal, the field should be ommited completely instead > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31) > at > org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123) > at > org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:180) > at > org.apache.parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:46) > at > org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:112) > at > 
org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:125) > at > org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:149) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:406) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:283) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:281) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1438) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:286) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:211) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:210) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) > at org.apache.spark.scheduler.Task.run(Task.scala:109) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:349) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.parquet.io.ParquetEncodingException: empty fields are > illegal, the field should be ommited completely instead > at > org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:320) > at > org.apache.parquet.io.RecordConsumerLoggingWrapper.endField(RecordConsumerLoggingWrapper.java:165) > at > 
org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeMap(DataWritableWriter.java:241) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:116) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89) > at > org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:60) > ... 21 more > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail:
[jira] [Comment Edited] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an erro
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith edited comment on SPARK-25073 at 8/23/18 3:48 PM: - Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately." ) } i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well,even though this issue sounds to be more kind of negative scenario Please correct me if i am missing something. was (Author: s71955): Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately." 
) } i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well, > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL: https://issues.apache.org/jira/browse/SPARK-25073 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0, 2.3.1 >Reporter: vivek kumar >Priority: Minor > > When the yarn.nodemanager.resource.memory-mb and/or > yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports > an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting > the error request to be more around yarn.scheduler.maximum-allocation-mb' > and/or 'yarn.nodemanager.resource.memory-mb'. > > Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! 
*Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error
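The comment's proposal is to make the AM-memory error mirror the executor-memory check quoted above, naming both YARN limits instead of only 'yarn.scheduler.maximum-allocation-mb'. A runnable sketch of that validation and message (illustrative Python; the real check is in `org.apache.spark.deploy.yarn.Client`):

```python
def validate_am_memory(am_mem_mb, am_overhead_mb, max_mem_mb):
    """Reject an AM allocation above the cluster threshold, naming BOTH
    YARN limits in the error, as the executor-memory check already does."""
    if am_mem_mb + am_overhead_mb > max_mem_mb:
        raise ValueError(
            "Required AM memory (%d+%d MB) is above the max threshold (%d MB) "
            "of this cluster! Please check the values of "
            "'yarn.scheduler.maximum-allocation-mb' and/or "
            "'yarn.nodemanager.resource.memory-mb' and increase the memory "
            "appropriately." % (am_mem_mb, am_overhead_mb, max_mem_mb))
```

For scenario 2b above (`spark.yarn.am.memory=17g` against an effective 8096 MB threshold), this raises with both configuration names, so the user is pointed at `yarn.nodemanager.resource.memory-mb` as well.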
[jira] [Comment Edited] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an erro
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith edited comment on SPARK-25073 at 8/23/18 3:43 PM: - Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately." ) } i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well, was (Author: s71955): Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately." 
) } i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well, > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL: https://issues.apache.org/jira/browse/SPARK-25073 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0, 2.3.1 >Reporter: vivek kumar >Priority: Minor > > When the yarn.nodemanager.resource.memory-mb and/or > yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports > an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting > the error request to be more around yarn.scheduler.maximum-allocation-mb' > and/or 'yarn.nodemanager.resource.memory-mb'. > > Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! 
*Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or >
[jira] [Comment Edited] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an erro
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith edited comment on SPARK-25073 at 8/23/18 3:42 PM: - Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately." ) } i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well, was (Author: s71955): Yes, in the executor memory validation check we are displaying the proper message considering both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in org.apache.spark.deploy.yarn.Client class as below, where as for AM container memory allocation validation only yarn.scheduler.maximum-allocation-mb is mentioned ``` if (executorMem > maxMem) { throw new IllegalArgumentException(s"Required executor memory ($executorMemory" + s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! 
" + "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " + "'yarn.nodemanager.resource.memory-mb and increase the memory appropriately.") } ``` i think is we can mention about both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb parameters for am memory validation as well, > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL: https://issues.apache.org/jira/browse/SPARK-25073 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0, 2.3.1 >Reporter: vivek kumar >Priority: Minor > > When the yarn.nodemanager.resource.memory-mb and/or > yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports > an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting > the error request to be more around yarn.scheduler.maximum-allocation-mb' > and/or 'yarn.nodemanager.resource.memory-mb'. > > Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. 
Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or >
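The change suggested in the comment above could be sketched as follows. This is an illustration only, not the actual Spark patch: the method name `verifyAmMemory` and its parameters are hypothetical, and the message simply mirrors the executor-side check quoted in the comment.

```scala
// Hypothetical sketch of an AM-side check that, like the executor-side check,
// points the user at both YARN limits instead of only the scheduler maximum.
def verifyAmMemory(amMemory: Long, amMemoryOverhead: Long, maxMem: Long): Unit = {
  val amMem = amMemory + amMemoryOverhead
  if (amMem > maxMem) {
    throw new IllegalArgumentException(
      s"Required AM memory ($amMemory+$amMemoryOverhead MB) is above the " +
      s"max threshold ($maxMem MB) of this cluster! Please check the values of " +
      "'yarn.scheduler.maximum-allocation-mb' and/or " +
      "'yarn.nodemanager.resource.memory-mb' and increase the memory appropriately.")
  }
}
```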
[jira] [Commented] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error req
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590396#comment-16590396 ] Sujith commented on SPARK-25073:

Yes, in the executor memory validation check we display a proper message that mentions both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb, in the org.apache.spark.deploy.yarn.Client class as below, whereas the AM container memory validation mentions only yarn.scheduler.maximum-allocation-mb:

```
if (executorMem > maxMem) {
  throw new IllegalArgumentException(s"Required executor memory ($executorMemory" +
    s"+$executorMemoryOverhead MB) is above the max threshold ($maxMem MB) of this cluster! " +
    "Please check the values of 'yarn.scheduler.maximum-allocation-mb' and/or " +
    "'yarn.nodemanager.resource.memory-mb' and increase the memory appropriately.")
}
```

I think we can mention both yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb in the AM memory validation as well.

> Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb
> and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always
> reports an error request to adjust yarn.scheduler.maximum-allocation-mb
> --
>
> Key: SPARK-25073
> URL: https://issues.apache.org/jira/browse/SPARK-25073
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.3.0, 2.3.1
> Reporter: vivek kumar
> Priority: Minor
>
> When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports an error request to adjust yarn.scheduler.maximum-allocation-mb. Expecting the error request to be more around 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
>
> Scenario 1.
yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or > 'yarn.nodemanager.resource.memory-mb'. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an erro
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590089#comment-16590089 ] Sujith edited comment on SPARK-25073 at 8/23/18 11:28 AM: -- cc [~hyukjin.kwon] [gatorsmile|https://github.com/gatorsmile] [~srowen] was (Author: s71955): @ [~hyukjin.kwon] [@gatorsmile|https://github.com/gatorsmile] > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL: https://issues.apache.org/jira/browse/SPARK-25073 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0, 2.3.1 >Reporter: vivek kumar >Priority: Minor > > When the yarn.nodemanager.resource.memory-mb and/or > yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports > an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting > the error request to be more around yarn.scheduler.maximum-allocation-mb' > and/or 'yarn.nodemanager.resource.memory-mb'. > > Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. 
Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or > 'yarn.nodemanager.resource.memory-mb'. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error req
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590089#comment-16590089 ] Sujith commented on SPARK-25073: @ [~hyukjin.kwon] [@gatorsmile|https://github.com/gatorsmile] > Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb > and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always > reports an error request to adjust yarn.scheduler.maximum-allocation-mb > -- > > Key: SPARK-25073 > URL: https://issues.apache.org/jira/browse/SPARK-25073 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.0, 2.3.1 >Reporter: vivek kumar >Priority: Minor > > When the yarn.nodemanager.resource.memory-mb and/or > yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports > an error request to adjust Yarn.scheduler.maximum-allocation-mb. Expecting > the error request to be more around yarn.scheduler.maximum-allocation-mb' > and/or 'yarn.nodemanager.resource.memory-mb'. > > Scenario 1. yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. 
Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or > 'yarn.nodemanager.resource.memory-mb'. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-25071) BuildSide is coming not as expected with join queries
[ https://issues.apache.org/jira/browse/SPARK-25071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582419#comment-16582419 ] Sujith edited comment on SPARK-25071 at 8/16/18 12:11 PM:

[~hyukjin.kwon] [~ZenWzh] [~srowen] I have a doubt based on the above conversation: if we consider purely the row count and not rawDataSize/totalSize, the row size can be very large for a particular table even though its number of rows is the same or only slightly higher, and we could end up broadcasting the table with the larger data size?

was (Author: s71955): [~hyukjin.kwon] [~ZenWzh]a [~srowen] I have a doubt based on above conversation, if we consider purely row count not rawDatasize/totalsize then there is a possibility where row size can be very large for a particular table even though the number of rows will be same or slightly high, and we are ending up broadcasting table with larger data size?

> BuildSide is coming not as expected with join queries
> --
>
> Key: SPARK-25071
> URL: https://issues.apache.org/jira/browse/SPARK-25071
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Environment: Spark 2.3.1
> Hadoop 2.7.3
> Reporter: Ayush Anubhava
> Priority: Major
>
> *BuildSide is not coming as expected.*
> Pre-requisites:
> *CBO is set as true & spark.sql.cbo.joinReorder.enabled= true.*
> *import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec*
> *Steps:*
> *Scenario 1:*
> spark.sql("CREATE TABLE small3 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='600','totalSize'='800')")
> spark.sql("CREATE TABLE big3 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='6000', 'totalSize'='800')")
> val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan
> val buildSide = plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide
> println(buildSide)
>
> *Result 1:*
> scala> val plan = spark.sql("select * from small3 t1 join big3 t2 on
(t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#0L|#0L], [c1#1L|#1L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#0L) > : +- HiveTableScan [c1#0L|#0L], HiveTableRelation `default`.`small3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#0L|#0L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#1L) > +- HiveTableScan [c1#1L|#1L], HiveTableRelation `default`.`big3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1L|#1L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = BuildRight > scala> println(buildSide) > *BuildRight* > > *Scenario 2:* > spark.sql("CREATE TABLE small4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='600','totalSize'='80')") > spark.sql("CREATE TABLE big4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='6000', 'totalSize'='800')") > val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > println(buildSide) > *Result 2:* > scala> val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#4L|#4L], [c1#5L|#5L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#4L) > : +- HiveTableScan [c1#4L|#4L], HiveTableRelation `default`.`small4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#4L|#4L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#5L) > +- HiveTableScan [c1#5L|#5L], HiveTableRelation `default`.`big4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#5L|#5L] > scala> val buildSide = > 
plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = *BuildRight* > >
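To make the doubt raised in the comment above concrete: in Scenario 1 both tables report numRows=2, but rawDataSize is 600 vs 6000. A comparison keyed only on row count cannot separate the two, while one keyed on size can. A small illustrative sketch in plain Scala — the `Stats` case class and min-by rule are stand-ins, not Spark's actual CBO statistics code:

```scala
// Illustrative stand-in for per-table statistics (values from Scenario 1 above).
case class Stats(numRows: Long, rawDataSize: Long)

val stats = Map(
  "small3" -> Stats(numRows = 2, rawDataSize = 600),
  "big3"   -> Stats(numRows = 2, rawDataSize = 6000)
)

// Keyed on row count the tables tie, so either side might be picked for broadcast:
val byRows = stats.minBy(_._2.numRows)._1
// Keyed on data size, small3 is clearly the cheaper relation to broadcast:
val bySize = stats.minBy(_._2.rawDataSize)._1
```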
[jira] [Commented] (SPARK-25071) BuildSide is coming not as expected with join queries
[ https://issues.apache.org/jira/browse/SPARK-25071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16582419#comment-16582419 ] Sujith commented on SPARK-25071:

[~hyukjin.kwon] [~ZenWzh] [~srowen] I have a doubt based on the above conversation: if we consider purely the row count and not rawDataSize/totalSize, the row size can be very large for a particular table even though its number of rows is the same or only slightly higher, and we could end up broadcasting the table with the larger data size?

> BuildSide is coming not as expected with join queries
> --
>
> Key: SPARK-25071
> URL: https://issues.apache.org/jira/browse/SPARK-25071
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1
> Environment: Spark 2.3.1
> Hadoop 2.7.3
> Reporter: Ayush Anubhava
> Priority: Major
>
> *BuildSide is not coming as expected.*
> Pre-requisites:
> *CBO is set as true & spark.sql.cbo.joinReorder.enabled= true.*
> *import org.apache.spark.sql.execution.joins.BroadcastHashJoinExec*
> *Steps:*
> *Scenario 1:*
> spark.sql("CREATE TABLE small3 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='600','totalSize'='800')")
> spark.sql("CREATE TABLE big3 (c1 bigint) TBLPROPERTIES ('numRows'='2', 'rawDataSize'='6000', 'totalSize'='800')")
> val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan
> val buildSide = plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide
> println(buildSide)
>
> *Result 1:*
> scala> val plan = spark.sql("select * from small3 t1 join big3 t2 on (t1.c1 = t2.c1)").queryExecution.executedPlan
> plan: org.apache.spark.sql.execution.SparkPlan =
> *(2) BroadcastHashJoin [c1#0L|#0L], [c1#1L|#1L], Inner, BuildRight
> :- *(2) Filter isnotnull(c1#0L)
> : +- HiveTableScan [c1#0L|#0L], HiveTableRelation `default`.`small3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#0L|#0L]
> +- BroadcastExchange
HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#1L) > +- HiveTableScan [c1#1L|#1L], HiveTableRelation `default`.`big3`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#1L|#1L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = BuildRight > scala> println(buildSide) > *BuildRight* > > *Scenario 2:* > spark.sql("CREATE TABLE small4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='600','totalSize'='80')") > spark.sql("CREATE TABLE big4 (c1 bigint) TBLPROPERTIES ('numRows'='2', > 'rawDataSize'='6000', 'totalSize'='800')") > val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > println(buildSide) > *Result 2:* > scala> val plan = spark.sql("select * from small4 t1 join big4 t2 on (t1.c1 = > t2.c1)").queryExecution.executedPlan > plan: org.apache.spark.sql.execution.SparkPlan = > *(2) BroadcastHashJoin [c1#4L|#4L], [c1#5L|#5L], Inner, BuildRight > :- *(2) Filter isnotnull(c1#4L) > : +- HiveTableScan [c1#4L|#4L], HiveTableRelation `default`.`small4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#4L|#4L] > +- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, > false])) > +- *(1) Filter isnotnull(c1#5L) > +- HiveTableScan [c1#5L|#5L], HiveTableRelation `default`.`big4`, > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#5L|#5L] > scala> val buildSide = > plan.children.head.asInstanceOf[BroadcastHashJoinExec].buildSide > buildSide: org.apache.spark.sql.execution.joins.BuildSide = *BuildRight* > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25073) Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always reports an error req
[ https://issues.apache.org/jira/browse/SPARK-25073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16574864#comment-16574864 ] Sujith commented on SPARK-25073:

It seems you are right; the message is a bit misleading to the user. As per my understanding there is also a dependency on the yarn.nodemanager.resource.memory-mb parameter.

*_yarn.nodemanager.resource.memory-mb:_* Amount of physical memory, in MB, that can be allocated for containers. It is the amount of memory YARN can utilize on this node, and therefore this property should be lower than the total memory of the machine.

*_yarn.scheduler.maximum-allocation-mb:_* Defines the maximum memory allocation available for a container, in MB. The RM can only allocate memory to containers in increments of {{"yarn.scheduler.minimum-allocation-mb"}}, not exceeding {{"yarn.scheduler.maximum-allocation-mb"}}, and it should not be more than the total allocated memory of the node.

I will try to analyze this further and will raise a PR if it requires a fix. Thanks.

> Spark-submit on Yarn Task : When the yarn.nodemanager.resource.memory-mb
> and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark always
> reports an error request to adjust yarn.scheduler.maximum-allocation-mb
> --
>
> Key: SPARK-25073
> URL: https://issues.apache.org/jira/browse/SPARK-25073
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.3.0, 2.3.1
> Reporter: vivek kumar
> Priority: Minor
>
> When the yarn.nodemanager.resource.memory-mb and/or yarn.scheduler.maximum-allocation-mb is insufficient, Spark *always* reports an error request to adjust yarn.scheduler.maximum-allocation-mb. Expecting the error request to be more around 'yarn.scheduler.maximum-allocation-mb' and/or 'yarn.nodemanager.resource.memory-mb'.
>
> Scenario 1.
yarn.scheduler.maximum-allocation-mb =4g and > yarn.nodemanager.resource.memory-mb =8G > # Launch shell on Yarn with am.memory less than nodemanager.resource memory > but greater than yarn.scheduler.maximum-allocation-mb > eg; spark-shell --master yarn --conf spark.yarn.am.memory 5g > Error: java.lang.IllegalArgumentException: Required AM memory (5120+512 MB) > is above the max threshold (4096 MB) of this cluster! Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'. > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Scenario 2*. yarn.scheduler.maximum-allocation-mb =15g and > yarn.nodemanager.resource.memory-mb =8g > a. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory but less than yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=10g* > Error : > java.lang.IllegalArgumentException: Required AM memory (10240+1024 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > b. Launch shell on Yarn with am.memory greater than nodemanager.resource > memory and yarn.scheduler.maximum-allocation-mb > eg; *spark-shell --master yarn --conf spark.yarn.am.memory=17g* > Error: > java.lang.IllegalArgumentException: Required AM memory (17408+1740 MB) is > above the max threshold (*8096 MB*) of this cluster! *Please increase the > value of 'yarn.scheduler.maximum-allocation-mb'.* > at > org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:325) > > *Expected* : Error request for scenario2 should be more around > yarn.scheduler.maximum-allocation-mb' and/or > 'yarn.nodemanager.resource.memory-mb'. 
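The dependency described in the comment above suggests the effective per-container ceiling is bounded by both settings. A hedged sketch — the min-of-both rule is a reading of that comment, not verified YARN scheduler semantics, and the function name is illustrative:

```scala
// Sketch: the tightest per-container memory ceiling implied by the two settings.
// Assumption (from the comment above): a container can never be granted more than
// the NodeManager's total, even if yarn.scheduler.maximum-allocation-mb is larger.
def effectiveMaxContainerMb(schedulerMaxMb: Long, nmResourceMb: Long): Long =
  math.min(schedulerMaxMb, nmResourceMb)

// Scenario 2 above: scheduler max 15g, NM resource 8g -> the real ceiling is 8g,
// so the error message should point at yarn.nodemanager.resource.memory-mb too.
```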
[jira] [Updated] (SPARK-24812) Last Access Time in the table description is not valid
[ https://issues.apache.org/jira/browse/SPARK-24812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-24812:
---
Description:
Last Access Time in the table description is not valid.
Test steps:
Step 1 - create a table
Step 2 - Run command "DESC FORMATTED table"
Last Access Time is always displayed as the wrong date, Wed Dec 31 15:59:59 PST 1969, which is wrong.
!image-2018-07-16-15-37-28-896.png!
In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date. Please find the snapshot tested in Hive for the same command:
!image-2018-07-16-15-38-26-717.png!
Seems to be a limitation as of now; better we can follow the Hive behavior in this scenario.

was:
Last Access Time in the table description is not valid,
Test steps:
Step 1 - create a table
Step 2 - Run command "DESC FORMATTED table"
Last Access Time will always displayed wrong date Wed Dec 31 15:59:59 PST 1969 - which is wrong.
In hive its displayed as "UNKNOWN" which makes more sense than displaying wrong date.
Seems to be a limitation as of now, better we can follow the hive behavior in this scenario.

> Last Access Time in the table description is not valid
> --
>
> Key: SPARK-24812
> URL: https://issues.apache.org/jira/browse/SPARK-24812
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.1
> Reporter: Sujith
> Priority: Minor
> Attachments: image-2018-07-16-15-37-28-896.png, image-2018-07-16-15-38-26-717.png
>
> Last Access Time in the table description is not valid.
> Test steps:
> Step 1 - create a table
> Step 2 - Run command "DESC FORMATTED table"
> Last Access Time is always displayed as the wrong date, Wed Dec 31 15:59:59 PST 1969, which is wrong.
> !image-2018-07-16-15-37-28-896.png!
> In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date.
> Please find the snapshot tested in Hive for the same command:
> !image-2018-07-16-15-38-26-717.png!
> Seems to be a limitation as of now; better we can follow the Hive behavior in this scenario.
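Following the Hive behavior proposed above, the display logic could look like this sketch. The -1/0 sentinel check and the "UNKNOWN" fallback mirror Hive's output as described in the issue; the function name is illustrative, not the actual Spark fix:

```scala
import java.util.Date

// Sketch: render lastAccessTime the way Hive does, instead of formatting the
// unset sentinel (which shows up as Wed Dec 31 15:59:59 PST 1969) as a real date.
def formatLastAccess(lastAccessMillis: Long): String =
  if (lastAccessMillis <= 0) "UNKNOWN" else new Date(lastAccessMillis).toString
```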
[jira] [Updated] (SPARK-24812) Last Access Time in the table description is not valid
[ https://issues.apache.org/jira/browse/SPARK-24812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-24812:
---
Attachment: image-2018-07-16-15-38-26-717.png

> Last Access Time in the table description is not valid
> --
>
> Key: SPARK-24812
> URL: https://issues.apache.org/jira/browse/SPARK-24812
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.1
> Reporter: Sujith
> Priority: Minor
> Attachments: image-2018-07-16-15-37-28-896.png, image-2018-07-16-15-38-26-717.png
>
> Last Access Time in the table description is not valid.
> Test steps:
> Step 1 - create a table
> Step 2 - Run command "DESC FORMATTED table"
> Last Access Time is always displayed as the wrong date, Wed Dec 31 15:59:59 PST 1969, which is wrong.
> In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date.
> Seems to be a limitation as of now; better we can follow the Hive behavior in this scenario.
[jira] [Updated] (SPARK-24812) Last Access Time in the table description is not valid
[ https://issues.apache.org/jira/browse/SPARK-24812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-24812:
---
Attachment: image-2018-07-16-15-37-28-896.png

> Last Access Time in the table description is not valid
> --
>
> Key: SPARK-24812
> URL: https://issues.apache.org/jira/browse/SPARK-24812
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.1
> Reporter: Sujith
> Priority: Minor
> Attachments: image-2018-07-16-15-37-28-896.png
>
> Last Access Time in the table description is not valid.
> Test steps:
> Step 1 - create a table
> Step 2 - Run command "DESC FORMATTED table"
> Last Access Time is always displayed as the wrong date, Wed Dec 31 15:59:59 PST 1969, which is wrong.
> In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date.
> Seems to be a limitation as of now; better we can follow the Hive behavior in this scenario.
[jira] [Created] (SPARK-24812) Last Access Time in the table description is not valid
Sujith created SPARK-24812:
--
Summary: Last Access Time in the table description is not valid
Key: SPARK-24812
URL: https://issues.apache.org/jira/browse/SPARK-24812
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.3.1, 2.2.1
Reporter: Sujith

Last Access Time in the table description is not valid.
Test steps:
Step 1 - create a table
Step 2 - Run command "DESC FORMATTED table"
Last Access Time is always displayed as the wrong date, Wed Dec 31 15:59:59 PST 1969, which is wrong.
In Hive it is displayed as "UNKNOWN", which makes more sense than displaying a wrong date.
Seems to be a limitation as of now; better we can follow the Hive behavior in this scenario.
[jira] [Commented] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly
[ https://issues.apache.org/jira/browse/SPARK-23425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364538#comment-16364538 ] Sujith commented on SPARK-23425:

I am working towards resolving this bug; please let me know of any suggestions or feedback.

> load data for hdfs file path with wild card usage is not working properly
> --
>
> Key: SPARK-23425
> URL: https://issues.apache.org/jira/browse/SPARK-23425
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.0
> Reporter: Sujith
> Priority: Major
> Attachments: wildcard_issue.PNG
>
> The load data command for loading data from non-local file paths by using wildcard strings like * is not working.
> eg:
> "load data inpath 'hdfs://hacluster/user/ext*' into table t1"
> Getting an AnalysisException while executing this query:
> !image-2018-02-14-23-41-39-923.png!
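A plausible direction for the fix discussed above is to expand the wildcard against the file system before validating the path. `FileSystem.globStatus` is a real Hadoop API; the surrounding helper and its name are illustrative, not the actual Spark change:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch: expand a wildcard load path such as hdfs://hacluster/user/ext*
// before checking existence, instead of treating it as a literal path.
def resolveLoadPaths(pathPattern: String, hadoopConf: Configuration): Seq[Path] = {
  val path = new Path(pathPattern)
  val fs = path.getFileSystem(hadoopConf)
  // globStatus may return null when nothing matches, so guard with Option
  Option(fs.globStatus(path)).map(_.toSeq.map(_.getPath)).getOrElse(Seq.empty)
}
```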
[jira] [Updated] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly
[ https://issues.apache.org/jira/browse/SPARK-23425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-23425:
---
Attachment: wildcard_issue.PNG

> load data for hdfs file path with wild card usage is not working properly
> --
>
> Key: SPARK-23425
> URL: https://issues.apache.org/jira/browse/SPARK-23425
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.1, 2.3.0
> Reporter: Sujith
> Priority: Major
> Attachments: wildcard_issue.PNG
>
> The load data command for loading data from non-local file paths by using wildcard strings like * is not working.
> eg:
> "load data inpath 'hdfs://hacluster/user/ext*' into table t1"
> Getting an AnalysisException while executing this query:
> !image-2018-02-14-23-41-39-923.png!
[jira] [Created] (SPARK-23425) load data for hdfs file path with wild card usage is not working properly
Sujith created SPARK-23425: -- Summary: load data for hdfs file path with wild card usage is not working properly Key: SPARK-23425 URL: https://issues.apache.org/jira/browse/SPARK-23425 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.2.1, 2.3.0 Reporter: Sujith The load data command for loading data from non-local file paths using wild card strings like * is not working eg: "load data inpath 'hdfs://hacluster/user/ext*' into table t1" An AnalysisException is thrown while executing this query !image-2018-02-14-23-41-39-923.png!
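The fix for SPARK-23425 must expand the wildcard against the filesystem before validating the path. A minimal, illustrative sketch of that expansion-then-validate logic, using Python's standard `glob` module as a stand-in for Hadoop's `FileSystem.globStatus` (the function name `resolve_load_paths` is hypothetical, not Spark's actual API):

```python
import glob
import os
import tempfile

def resolve_load_paths(pattern):
    """Expand a wildcard pattern and fail early when nothing matches,
    mirroring the behaviour LOAD DATA should have for 'ext*'-style paths."""
    matches = glob.glob(pattern)
    if not matches:
        # Same user-facing message LOAD DATA already uses for local paths.
        raise FileNotFoundError("LOAD DATA input path does not exist: " + pattern)
    return sorted(matches)

# Demo against a throwaway directory: two of the three files match 'ext*'.
with tempfile.TemporaryDirectory() as d:
    for name in ("ext1.csv", "ext2.csv", "other.csv"):
        open(os.path.join(d, name), "w").close()
    hits = resolve_load_paths(os.path.join(d, "ext*"))
    print(len(hits))  # -> 2
```

The key point is that the pattern is resolved into concrete paths first, so an unmatched wildcard produces the same clear error a missing literal path would.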
[jira] [Commented] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265748#comment-16265748 ] Sujith commented on SPARK-22601: Sure, Sean. > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > > Data load is displayed as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a local non-existing file path is given, the below error message is > displayed: > "LOAD DATA input path does not exist". Attached snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Updated] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-22601: --- Issue Type: Bug (was: Improvement) > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > > Data load is displayed as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a local non-existing file path is given, the below error message is > displayed: > "LOAD DATA input path does not exist". Attached snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Comment Edited] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265264#comment-16265264 ] Sujith edited comment on SPARK-22601 at 11/24/17 1:32 PM: -- I think in this scenario we should validate the HDFS path to check whether it exists, as I noticed we are already validating and throwing an exception for a local non-existing file path. If we don't validate, this can mislead the user and also creates an inconsistency in the load command behaviour between local and HDFS paths. I am working on this issue and will raise a PR ASAP. was (Author: s71955): I think in this scenario we should validate the HDFS path to check whether it exists, as I noticed we are already validating and throwing an exception for a local non-existing file path. I am working on this issue and will raise a PR ASAP. > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > > Data load is displayed as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a local non-existing file path is given, the below error message is > displayed: > "LOAD DATA input path does not exist". Attached snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Comment Edited] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265264#comment-16265264 ] Sujith edited comment on SPARK-22601 at 11/24/17 1:14 PM: -- I think in this scenario we should validate the HDFS path to check whether it exists, as I noticed we are already validating and throwing an exception for a local non-existing file path. I am working on this issue and will raise a PR ASAP. was (Author: s71955): I am working on this issue. > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > > Data load is displayed as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a local non-existing file path is given, the below error message is > displayed: > "LOAD DATA input path does not exist". Attached snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Commented] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
[ https://issues.apache.org/jira/browse/SPARK-22601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16265264#comment-16265264 ] Sujith commented on SPARK-22601: I am working on this issue. > Data load is getting displayed successful on providing non existing hdfs file > path > -- > > Key: SPARK-22601 > URL: https://issues.apache.org/jira/browse/SPARK-22601 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: Sujith >Priority: Minor > > Data load is displayed as successful when a non-existing HDFS file path is > provided, whereas for a local path a proper error message is displayed > create table tb2 (a string, b int); > load data inpath 'hdfs://hacluster/data1.csv' into table tb2 > Note: data1.csv does not exist in HDFS > When a local non-existing file path is given, the below error message is > displayed: > "LOAD DATA input path does not exist". Attached snapshots of the behaviour in > spark 2.1 and spark 2.2 versions
[jira] [Created] (SPARK-22601) Data load is getting displayed successful on providing non existing hdfs file path
Sujith created SPARK-22601: -- Summary: Data load is getting displayed successful on providing non existing hdfs file path Key: SPARK-22601 URL: https://issues.apache.org/jira/browse/SPARK-22601 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: Sujith Priority: Minor Data load is displayed as successful when a non-existing HDFS file path is provided, whereas for a local path a proper error message is displayed create table tb2 (a string, b int); load data inpath 'hdfs://hacluster/data1.csv' into table tb2 Note: data1.csv does not exist in HDFS When a local non-existing file path is given, the below error message is displayed: "LOAD DATA input path does not exist". Attached snapshots of the behaviour in spark 2.1 and spark 2.2 versions
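The inconsistency reported in SPARK-22601 comes down to a missing existence check for non-local paths. A minimal sketch of the uniform up-front check the fix needs, using `os.path.exists` as a stand-in for the `FileSystem.exists` call the real HDFS implementation would make (`validate_load_path` is a hypothetical helper, not Spark's API):

```python
import os
import tempfile

def validate_load_path(path, exists_fn=os.path.exists):
    """Fail LOAD DATA up front when the input path is missing.
    exists_fn stands in for FileSystem.exists on HDFS; the same check
    should run regardless of whether the path is local or remote."""
    if not exists_fn(path):
        raise ValueError("LOAD DATA input path does not exist: " + path)

with tempfile.TemporaryDirectory() as d:
    real = os.path.join(d, "data1.csv")
    open(real, "w").close()
    validate_load_path(real)  # existing path: passes silently
    try:
        validate_load_path(os.path.join(d, "missing.csv"))
    except ValueError as e:
        print("rejected:", e)  # missing path: rejected with a clear message
```

Running one shared validation for both local and remote paths removes the local-vs-HDFS behavioural split the issue describes.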
[jira] [Updated] (SPARK-20380) describe table not showing updated table comment after alter operation
[ https://issues.apache.org/jira/browse/SPARK-20380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-20380: --- Description: When a user alters the table properties and adds/updates the table comment, the table comment, which is now directly part of the CatalogTable instance, is not updated and the old table comment is shown. Proposal for solution: To handle this issue, while updating the table properties map with the newly added/modified properties in the CatalogTable instance, also update the comment parameter in CatalogTable with the newly added/modified comment. A PR has already been raised for this issue: https://github.com/apache/spark/pull/17649 was: When a user alters the table properties and adds/updates the table comment, the table comment, which is now directly part of the CatalogTable instance, is not updated and the old table comment is shown. Proposal for solution: To handle this issue, while updating the table properties map with the newly added/modified properties in the CatalogTable instance, also update the comment parameter in CatalogTable with the newly added/modified comment. > describe table not showing updated table comment after alter operation > -- > > Key: SPARK-20380 > URL: https://issues.apache.org/jira/browse/SPARK-20380 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Sujith > > When a user alters the table properties and adds/updates the table comment, the table > comment, which is now directly part of the CatalogTable instance, is not > updated and the old table comment is shown > Proposal for solution: > To handle this issue, while updating the table properties map with the newly > added/modified properties in the CatalogTable > instance, also update the comment parameter in CatalogTable with the newly > added/modified comment. 
A PR has already been raised for this issue: > https://github.com/apache/spark/pull/17649
[jira] [Created] (SPARK-20380) describe table not showing updated table comment after alter operation
Sujith created SPARK-20380: -- Summary: describe table not showing updated table comment after alter operation Key: SPARK-20380 URL: https://issues.apache.org/jira/browse/SPARK-20380 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Sujith When a user alters the table properties and adds/updates the table comment, the table comment, which is now directly part of the CatalogTable instance, is not updated and the old table comment is shown. Proposal for solution: To handle this issue, while updating the table properties map with the newly added/modified properties in the CatalogTable instance, also update the comment parameter in CatalogTable with the newly added/modified comment.
[jira] [Commented] (SPARK-20023) Can not see table comment when describe formatted table
[ https://issues.apache.org/jira/browse/SPARK-20023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15970365#comment-15970365 ] Sujith commented on SPARK-20023: @chenerlu, your point is right: after executing the alter command with a newly added/modified table comment, the change is not reflected when we execute a desc formatted table query. The table comment, which is now directly part of the CatalogTable instance, is not updated and the old table comment is shown. To handle this issue, while updating the table properties map with the newly added/modified properties in the CatalogTable instance, we also update the comment parameter in CatalogTable with the newly added/modified comment. I raised a PR fixing this issue: https://github.com/apache/spark/pull/17649 Please let me know if you have any suggestions. > Can not see table comment when describe formatted table > --- > > Key: SPARK-20023 > URL: https://issues.apache.org/jira/browse/SPARK-20023 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: chenerlu >Assignee: Xiao Li > Fix For: 2.2.0 > > > Spark 2.x implements create table by itself. > https://github.com/apache/spark/commit/7d2ed8cc030f3d84fea47fded072c320c3d87ca7 > But in the implementation mentioned above, it removes the table comment from the > properties, so the user cannot see the table comment by running "describe formatted > table". Similarly, when the user alters the table comment, he still cannot see the > change of the table comment by running "describe formatted table". > I wonder why we removed table comments; is this a bug?
[jira] [Updated] (SPARK-19222) Limit Query Performance issue
[ https://issues.apache.org/jira/browse/SPARK-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-19222: --- Description: When a limit is added in the middle of the physical plan, there is a possibility of a memory bottleneck if the limit value is too large, because the system will try to aggregate all the per-partition limit results in a single partition. Description: Eg: create table src_temp as select * from src limit n;(n=1000) == Physical Plan == ExecutedCommand +- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, InsertIntoHiveTable] +- GlobalLimit 1000 +- LocalLimit 1000 +- Project [imei#101, age#102, task#103L, num#104, level#105, productdate#106, name#107, point#108] +- SubqueryAlias hive +- Relation[imei#101,age#102,task#103L,num#104,level#105,productdate#106,name#107,point#108] csv | As shown in the above plan, when the limit comes in the middle, there can be two types of performance bottlenecks. Scenario 1: when the partition count is very high and the limit value is small. Scenario 2: when the limit value is very large. Eg, in the current scenario with a limit count of 1000 and a partition count of 5: Local Limit > |partition 1| |partition 2| |partition 3| |partition 4| |partition 5| --> take n take n take n take n take n | Shuffle Exchange (into a single partition) | Global Limit > take n (all the partition data will be grouped in a single partition). When this scenario occurs, the system will shuffle and try to group the limit data from all partitions into a single partition, which will induce a performance bottleneck. was: When a limit is added in the middle of the physical plan, there is a possibility of a memory bottleneck if the limit value is too large, because the system will try to aggregate all the per-partition limit results in a single partition. 
Description: Eg: create table src_temp as select * from src limit n;(n=1000) == Physical Plan == ExecutedCommand +- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, InsertIntoHiveTable] +- GlobalLimit 1000 +- LocalLimit 1000 +- Project [imei#101, age#102, task#103L, num#104, level#105, productdate#106, name#107, point#108] +- SubqueryAlias hive +- Relation[imei#101,age#102,task#103L,num#104,level#105,productdate#106,name#107,point#108] csv | As shown in the above plan, when the limit comes in the middle, there can be two types of performance bottlenecks. Scenario 1: when the partition count is very high and the limit value is small. Scenario 2: when the limit value is very large. Eg, in the current scenario with a limit count of 1000 and a partition count of 5: Local Limit > |partition 1| |partition 2| |partition 3| |partition 4| |partition 5| take n take n take n take n take n Shuffle Exchange (single partition) Global Limit > take n (all the partition data will be grouped in a single partition). When this scenario occurs, the system will shuffle and try to group the limit data from all partitions into a single partition, which will induce a performance bottleneck. > Limit Query Performance issue > - > > Key: SPARK-19222 > URL: https://issues.apache.org/jira/browse/SPARK-19222 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: Linux/Windows >Reporter: Sujith >Priority: Minor > > When a limit is added in the middle of the physical plan, there is a > possibility of a memory bottleneck > if the limit value is too large, because the system will try to aggregate all the > per-partition limit results in a single partition. 
> Description: > Eg: > create table src_temp as select * from src limit n;(n=1000) > == Physical Plan == > ExecutedCommand >+- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, > InsertIntoHiveTable] > +- GlobalLimit 1000 > +- LocalLimit 1000 >+- Project [imei#101, age#102, task#103L, num#104, level#105, > productdate#106, name#107, point#108] > +- SubqueryAlias hive >
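The bottleneck described above is easy to model: each partition takes n rows locally, then everything taken is shuffled into one partition where the global limit takes n again. A small illustrative simulation, with plain Python lists standing in for RDD partitions (this is a model of the plan's behaviour, not Spark code):

```python
def limit_query(partitions, n):
    """Model LocalLimit -> single-partition shuffle -> GlobalLimit.
    With p partitions, the shuffle moves up to p * n rows to one task,
    which is the memory/performance bottleneck for large n."""
    local = [part[:n] for part in partitions]           # LocalLimit: take n per partition
    shuffled = [row for part in local for row in part]  # Shuffle Exchange into ONE partition
    return shuffled[:n], len(shuffled)                  # GlobalLimit: take n again

# 5 partitions of 1000 rows each, limit n = 1000
parts = [list(range(i * 1000, (i + 1) * 1000)) for i in range(5)]
result, shuffled_rows = limit_query(parts, 1000)
print(len(result), shuffled_rows)  # -> 1000 5000
```

Only 1000 rows are kept, yet 5000 rows land in the single shuffle partition; the gap grows linearly with the partition count, which is exactly the scenario the issue flags.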
[jira] [Updated] (SPARK-19222) Limit Query Performance issue
[ https://issues.apache.org/jira/browse/SPARK-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sujith updated SPARK-19222: --- Description: When a limit is added in the middle of the physical plan, there is a possibility of a memory bottleneck if the limit value is too large, because the system will try to aggregate all the per-partition limit results in a single partition. Description: Eg: create table src_temp as select * from src limit n;(n=1000) == Physical Plan == ExecutedCommand +- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, InsertIntoHiveTable] +- GlobalLimit 1000 +- LocalLimit 1000 +- Project [imei#101, age#102, task#103L, num#104, level#105, productdate#106, name#107, point#108] +- SubqueryAlias hive +- Relation[imei#101,age#102,task#103L,num#104,level#105,productdate#106,name#107,point#108] csv | As shown in the above plan, when the limit comes in the middle, there can be two types of performance bottlenecks. Scenario 1: when the partition count is very high and the limit value is small. Scenario 2: when the limit value is very large. Eg, in the current scenario with a limit count of 1000 and a partition count of 5: Local Limit > |partition 1| |partition 2| |partition 3| |partition 4| |partition 5| take n take n take n take n take n Shuffle Exchange (single partition) Global Limit > take n (all the partition data will be grouped in a single partition). When this scenario occurs, the system will shuffle and try to group the limit data from all partitions into a single partition, which will induce a performance bottleneck. was: When a limit is added in the middle of the physical plan, there is a possibility of a memory bottleneck if the limit value is too large, because the system will try to aggregate all the per-partition limit results in a single partition. 
Description: Eg: create table src_temp as select * from src limit n;(n=1000) == Physical Plan == ExecutedCommand +- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, InsertIntoHiveTable] +- GlobalLimit 1000 +- LocalLimit 1000 +- Project [imei#101, age#102, task#103L, num#104, level#105, productdate#106, name#107, point#108] +- SubqueryAlias hive +- Relation[imei#101,age#102,task#103L,num#104,level#105,productdate#106,name#107,point#108] csv | As shown in the above plan, when the limit comes in the middle, there can be two types of performance bottlenecks. Scenario 1: when the partition count is very high and the limit value is small. Scenario 2: when the limit value is very large. Eg, in the current scenario with a limit count of 1000 and a partition count of 5: Local Limit > |partition 1| |partition 2| |partition 3| |partition 4| |partition 5| take n take n take n take n take n Shuffle Exchange (single partition) Global Limit > take n (all the partition data will be grouped in a single partition). When this scenario occurs, the system will shuffle and try to group the limit data from all partitions into a single partition, which will induce a performance bottleneck. > Limit Query Performance issue > - > > Key: SPARK-19222 > URL: https://issues.apache.org/jira/browse/SPARK-19222 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0 > Environment: Linux/Windows >Reporter: Sujith >Priority: Minor > > When a limit is added in the middle of the physical plan, there is a > possibility of a memory bottleneck > if the limit value is too large, because the system will try to aggregate all the > per-partition limit results in a single partition. 
> Description: > Eg: > create table src_temp as select * from src limit n;(n=1000) > == Physical Plan == > ExecutedCommand >+- CreateHiveTableAsSelectCommand [Database:spark, TableName: t2, > InsertIntoHiveTable] > +- GlobalLimit 1000 > +- LocalLimit 1000 >+- Project [imei#101, age#102, task#103L, num#104, level#105, > productdate#106, name#107, point#108] > +- SubqueryAlias hive > +-