[jira] [Issue Comment Deleted] (SPARK-29505) desc extended is case sensitive

2019-12-16 Thread Shivu Sondur (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-29505:
-
Comment: was deleted

(was: I am checking this issue)

> desc extended   is case sensitive
> --
>
> Key: SPARK-29505
> URL: https://issues.apache.org/jira/browse/SPARK-29505
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>
> {code}
> create table customer(id int, name String, *CName String*, address String, 
> city String, pin int, country String);
> insert into customer values(1,'Alfred','Maria','Obere Str 
> 57','Berlin',12209,'Germany');
> insert into customer values(2,'Ana','trujilo','Adva de la','Maxico 
> D.F.',05021,'Maxico');
> insert into customer values(3,'Antonio','Antonio Moreno','Mataderos 
> 2312','Maxico D.F.',05023,'Maxico');
> analyze table customer compute statistics for columns cname; -- Succeeds (though 
> cname does not match the case of CName)
> desc extended customer cname; -- Fails
> jdbc:hive2://10.18.19.208:23040/default> desc extended customer *cname;*
> +-----------------+-------------+
> | info_name       | info_value  |
> +-----------------+-------------+
> | col_name        | cname       |
> | data_type       | string      |
> | comment         | NULL        |
> | min             | NULL        |
> | max             | NULL        |
> | num_nulls       | NULL        |
> | distinct_count  | NULL        |
> | avg_col_len     | NULL        |
> | max_col_len     | NULL        |
> | histogram       | NULL        |
> +-----------------+-------------+
> {code}
>  
> But 
> {code}
> desc extended customer CName; -- Succeeds
> 0: jdbc:hive2://10.18.19.208:23040/default> desc extended customer *CName;*
> +-----------------+-------------+
> | info_name       | info_value  |
> +-----------------+-------------+
> | col_name        | CName       |
> | data_type       | string      |
> | comment         | NULL        |
> | min             | NULL        |
> | max             | NULL        |
> | num_nulls       | 0           |
> | distinct_count  | 3           |
> | avg_col_len     | 9           |
> | max_col_len     | 14          |
> | histogram       | NULL        |
> +-----------------+-------------+
>  {code}
>  
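A minimal client-side workaround sketch (my assumption, not the actual fix: it presumes the `customer` table above and a spark-shell session named `spark`) that resolves the user-supplied column name against the schema case-insensitively, the way ANALYZE already appears to, before calling DESC EXTENDED:

{code:scala}
// Hypothetical workaround sketch; not Spark's internal resolution logic.
val requested = "cname"
val resolved = spark.table("customer").schema.fieldNames
  .find(_.equalsIgnoreCase(requested))                 // case-insensitive lookup
  .getOrElse(sys.error(s"no such column: $requested"))
spark.sql(s"DESC EXTENDED customer $resolved").show(truncate = false)
{code}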



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30174) Eliminate warnings :part 4

2019-12-08 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991145#comment-16991145
 ] 

Shivu Sondur commented on SPARK-30174:
--

I am working on this.

> Eliminate warnings :part 4
> --
>
> Key: SPARK-30174
> URL: https://issues.apache.org/jira/browse/SPARK-30174
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: jobit mathew
>Priority: Minor
>
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWriteBuilder.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29595) Insertion with named_struct should match by name

2019-10-24 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959442#comment-16959442
 ] 

Shivu Sondur commented on SPARK-29595:
--

I am checking this issue.

> Insertion with named_struct should match by name
> 
>
> Key: SPARK-29595
> URL: https://issues.apache.org/jira/browse/SPARK-29595
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Priority: Major
>
> {code:java}
> spark-sql> create table str using parquet as(select named_struct('a', 1, 'b', 
> 2) as data);
> spark-sql>  insert into str values named_struct("b", 3, "a", 1);
> spark-sql> select * from str;
> {"a":3,"b":1}
> {"a":1,"b":2}
> {code}
> The result should be 
> {code:java}
> {"a":1,"b":3}
> {"a":1,"b":2}
> {code}
> Spark should match the field names of named_struct on insertion
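A hedged workaround sketch under the current positional behaviour (assuming the `str` table created above and a session named `spark`): list the struct fields in the table's schema order so positional matching yields the intended values.

{code:scala}
// Workaround sketch only; the ticket itself asks for name-based matching.
spark.sql("insert into str select named_struct('a', 1, 'b', 3) as data")
spark.sql("select * from str").show(truncate = false)
{code}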



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-29596) Task duration not updating for running tasks

2019-10-24 Thread Shivu Sondur (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-29596:
-
Comment: was deleted

(was: I am checking this issue)

> Task duration not updating for running tasks
> 
>
> Key: SPARK-29596
> URL: https://issues.apache.org/jira/browse/SPARK-29596
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.2
>Reporter: Bharati Jadhav
>Priority: Major
> Attachments: Screenshot_Spark_live_WebUI.png
>
>
> When looking at the task metrics for running tasks in the task table for the 
> related stage, the duration column is not updated until the task has 
> succeeded. The duration values are reported empty or 0 ms until the task has 
> completed. This is a change in behavior, from earlier versions, when the task 
> duration was continuously updated while the task was running. The missing 
> duration values can be observed for both short and long running tasks and for 
> multiple applications.
>  
> To reproduce this, one can run any code from the spark-shell and observe the 
> missing duration values for any running task. Only when the task succeeds is 
> the duration value populated in the UI.
>  
>  
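For instance, a minimal spark-shell snippet (my assumption; any long-running job will do) that keeps tasks busy for about a minute so the Duration column can be watched while the stage is still running:

{code:scala}
// Eight tasks, each sleeping ~60s; while they run, the stage page's Duration
// column should tick upward but currently stays empty / 0 ms.
sc.parallelize(1 to 8, 8).map { i => Thread.sleep(60000); i }.count()
{code}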



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29596) Task duration not updating for running tasks

2019-10-24 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959438#comment-16959438
 ] 

Shivu Sondur commented on SPARK-29596:
--

I am checking this issue.

> Task duration not updating for running tasks
> 
>
> Key: SPARK-29596
> URL: https://issues.apache.org/jira/browse/SPARK-29596
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.2
>Reporter: Bharati Jadhav
>Priority: Major
> Attachments: Screenshot_Spark_live_WebUI.png
>
>
> When looking at the task metrics for running tasks in the task table for the 
> related stage, the duration column is not updated until the task has 
> succeeded. The duration values are reported empty or 0 ms until the task has 
> completed. This is a change in behavior, from earlier versions, when the task 
> duration was continuously updated while the task was running. The missing 
> duration values can be observed for both short and long running tasks and for 
> multiple applications.
>  
> To reproduce this, one can run any code from the spark-shell and observe the 
> missing duration values for any running task. Only when the task succeeds is 
> the duration value populated in the UI.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29505) desc extended is case sensitive

2019-10-18 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954330#comment-16954330
 ] 

Shivu Sondur commented on SPARK-29505:
--

I am checking this issue

> desc extended   is case sensitive
> --
>
> Key: SPARK-29505
> URL: https://issues.apache.org/jira/browse/SPARK-29505
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>
> create table customer(id int, name String, *CName String*, address String, 
> city String, pin int, country String);
> insert into customer values(1,'Alfred','Maria','Obere Str 
> 57','Berlin',12209,'Germany');
> insert into customer values(2,'Ana','trujilo','Adva de la','Maxico 
> D.F.',05021,'Maxico');
> insert into customer values(3,'Antonio','Antonio Moreno','Mataderos 
> 2312','Maxico D.F.',05023,'Maxico');
> analyze table customer compute statistics for columns cname; -- Succeeds (though 
> cname does not match the case of CName)
> desc extended customer cname; -- Fails
> jdbc:hive2://10.18.19.208:23040/default> desc extended customer *cname;*
> +-----------------+-------------+
> | info_name       | info_value  |
> +-----------------+-------------+
> | col_name        | cname       |
> | data_type       | string      |
> | comment         | NULL        |
> | min             | NULL        |
> | max             | NULL        |
> | num_nulls       | NULL        |
> | distinct_count  | NULL        |
> | avg_col_len     | NULL        |
> | max_col_len     | NULL        |
> | histogram       | NULL        |
> +-----------------+-------------+
>  
> But 
> desc extended customer CName; -- Succeeds
> 0: jdbc:hive2://10.18.19.208:23040/default> desc extended customer *CName;*
> +-----------------+-------------+
> | info_name       | info_value  |
> +-----------------+-------------+
> | col_name        | CName       |
> | data_type       | string      |
> | comment         | NULL        |
> | min             | NULL        |
> | max             | NULL        |
> | num_nulls       | 0           |
> | distinct_count  | 3           |
> | avg_col_len     | 9           |
> | max_col_len     | 14          |
> | histogram       | NULL        |
> +-----------------+-------------+
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28995) Document working of Spark Streaming

2019-09-05 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923356#comment-16923356
 ] 

Shivu Sondur commented on SPARK-28995:
--

I will check this issue.

> Document working of Spark Streaming
> ---
>
> Key: SPARK-28995
> URL: https://issues.apache.org/jira/browse/SPARK-28995
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28993) Document Working of Bucketing

2019-09-05 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1692#comment-1692
 ] 

Shivu Sondur commented on SPARK-28993:
--

I will be working on this.

 

> Document Working of Bucketing
> -
>
> Key: SPARK-28993
> URL: https://issues.apache.org/jira/browse/SPARK-28993
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28994) Document working of Adaptive

2019-09-05 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923334#comment-16923334
 ] 

Shivu Sondur commented on SPARK-28994:
--

I will be working on this.

> Document working of Adaptive
> 
>
> Key: SPARK-28994
> URL: https://issues.apache.org/jira/browse/SPARK-28994
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28990) SparkSQL invalid call to toAttribute on unresolved object, tree: *

2019-09-05 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923231#comment-16923231
 ] 

Shivu Sondur commented on SPARK-28990:
--

I will check this issue.

> SparkSQL invalid call to toAttribute on unresolved object, tree: *
> --
>
> Key: SPARK-28990
> URL: https://issues.apache.org/jira/browse/SPARK-28990
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
> Environment: Any
>Reporter: fengchaoge
>Priority: Major
> Fix For: 2.4.4
>
>
> h6. SparkSQL create table as select from a table which may not exist throws 
> exceptions like "org.apache.spark.sql.catalyst.analysis.UnresolvedException: 
> Invalid call to toAttribute on unresolved object, tree: *". This is not 
> friendly; the Spark user may have no idea what is wrong.
> h6. A simple SQL statement can reproduce it, like this:
> create table default.spark as select * from default.dual;
> ~spark-sql (default)> create table default.spark as select * from 
> default.dual;~
>  ~2019-09-05 16:27:24,127 INFO (main) [Logging.scala:logInfo(54)] - Parsing 
> command: create table default.spark as select * from default.dual~
>  ~2019-09-05 16:27:24,772 ERROR (main) [Logging.scala:logError(91)] - Failed 
> in [create table default.spark as select * from default.dual]~
>  ~org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
> toAttribute on unresolved object, tree: *~
>  ~at 
> org.apache.spark.sql.catalyst.analysis.Star.toAttribute(unresolved.scala:245)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.Project$$anonfun$output$1.apply(basicLogicalOperators.scala:52)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.Project$$anonfun$output$1.apply(basicLogicalOperators.scala:52)~
>  ~at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)~
>  ~at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)~
>  ~at scala.collection.immutable.List.foreach(List.scala:392)~
>  ~at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)~
>  ~at scala.collection.immutable.List.map(List.scala:296)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.Project.output(basicLogicalOperators.scala:52)~
>  ~at 
> org.apache.spark.sql.hive.HiveAnalysis$$anonfun$apply$3.applyOrElse(HiveStrategies.scala:160)~
>  ~at 
> org.apache.spark.sql.hive.HiveAnalysis$$anonfun$apply$3.applyOrElse(HiveStrategies.scala:148)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1$$anonfun$2.apply(AnalysisHelper.scala:108)~
>  ~at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:107)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$$anonfun$resolveOperatorsDown$1.apply(AnalysisHelper.scala:106)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:194)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperatorsDown(AnalysisHelper.scala:106)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDown(LogicalPlan.scala:29)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.resolveOperators(AnalysisHelper.scala:73)~
>  ~at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:29)~
>  ~at org.apache.spark.sql.hive.HiveAnalysis$.apply(HiveStrategies.scala:148)~
>  ~at org.apache.spark.sql.hive.HiveAnalysis$.apply(HiveStrategies.scala:147)~
>  ~at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)~
>  ~at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)~
>  ~at 
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)~
>  ~at 
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)~
>  ~at scala.collection.mutable.ArrayBuffer.foldLeft(ArrayBuffer.scala:48)~
>  ~at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)~
>  ~at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)~
>  ~at scala.collection.immutable.List.foreach(List.scala:392)~
>  ~at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)~
>  ~at 
> 

[jira] [Commented] (SPARK-28942) Spark in local mode hostname display localhost in the Host Column of Task Summary Page

2019-09-01 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920427#comment-16920427
 ] 

Shivu Sondur commented on SPARK-28942:
--

I will work on this issue.

> Spark in local mode hostname display localhost in the Host Column of Task 
> Summary Page
> --
>
> Key: SPARK-28942
> URL: https://issues.apache.org/jira/browse/SPARK-28942
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> On the stage page, in the Task Summary section, the Host column shows 'localhost' 
> instead of the host IP or host name reported for the driver.
> Steps:
> spark-shell --master local
> create table emp(id int);
> insert into emp values(100);
> select * from emp;
> Go to the Stage UI page and check the Task Summary section.
> The Host column displays 'localhost' instead of the driver host.
>  
> Note: with spark-shell --master yarn, the UI displays the correct host name 
> under this column.
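A small hedged check from the same spark-shell session (assuming the default `sc`) to see what the driver itself reports as its host, for comparison with the UI column:

{code:scala}
// spark.driver.host is what the stage page's Host column would be expected to show.
println(sc.getConf.get("spark.driver.host", "unset"))
{code}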



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-28809) Document SHOW TABLE in SQL Reference

2019-08-29 Thread Shivu Sondur (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-28809:
-
Comment: was deleted

(was: [~dkbiswal]

I am not able to find a "SHOW TABLE" command in Spark;

the "SHOW TABLES" command is present.

Please recheck whether this Jira is valid.)

> Document SHOW TABLE in SQL Reference
> 
>
> Key: SPARK-28809
> URL: https://issues.apache.org/jira/browse/SPARK-28809
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28918) from_utc_timestamp function is mistakenly considering DST for Brazil in 2019

2019-08-29 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919045#comment-16919045
 ] 

Shivu Sondur commented on SPARK-28918:
--

[~hissashirocha]

Brazil's DST was removed as of Sunday, 17 February 2019.

You need to update your JVM time zone data with TZUpdater. Refer to the following link:
https://www.oracle.com/technetwork/java/javase/timezones-137583.html#tzu
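A quick check with plain java.time (run on the same JVM that Spark uses) of whether that JVM's tzdata still applies DST to São Paulo on the date in question; an offset of -02:00 indicates stale rules, while -03:00 means they are up to date:

{code:scala}
import java.time.{Instant, ZoneId}

// Prints the offset the JVM's tzdata assigns to America/Sao_Paulo at that instant.
val offset = ZoneId.of("America/Sao_Paulo").getRules
  .getOffset(Instant.parse("2019-11-15T02:18:01Z"))
println(offset)
{code}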

> from_utc_timestamp function is mistakenly considering DST for Brazil in 2019
> 
>
> Key: SPARK-28918
> URL: https://issues.apache.org/jira/browse/SPARK-28918
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
> Environment: I'm using Spark through Databricks
>Reporter: Luiz Hissashi da Rocha
>Priority: Minor
>
> I realized that the *from_utc_timestamp* function assumes that Brazil will 
> have DST in 2019, but it will not, unlike previous years. Because of that, 
> when I run the function below, instead of getting "2019-11-14" (São Paulo is 
> UTC-3h), I still wrongly get "2019-11-15T00:18:01" (as if it were UTC-2h due 
> to DST).
> {code:java}
> // from_utc_timestamp("2019-11-15T02:18:01.000+", 'America/Sao_Paulo')
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28906) `bin/spark-submit --version` shows incorrect info

2019-08-28 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918191#comment-16918191
 ] 

Shivu Sondur commented on SPARK-28906:
--

[~vanzin]

I verified on the master branch and got the results below.

Please let me know what information is wrong in the snapshot below.

!image-2019-08-29-05-50-13-526.png!
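A hedged way to cross-check the same build metadata from spark-shell (this assumes the public `org.apache.spark.SPARK_*` vals, which are read from the generated spark-version-info.properties):

{code:scala}
import org.apache.spark.{SPARK_BRANCH, SPARK_BUILD_DATE, SPARK_BUILD_USER, SPARK_REPO_URL, SPARK_REVISION, SPARK_VERSION}

// Empty branch/revision/url here would point at the generated
// spark-version-info.properties rather than at spark-submit itself.
println(s"version=$SPARK_VERSION branch=$SPARK_BRANCH revision=$SPARK_REVISION")
println(s"user=$SPARK_BUILD_USER date=$SPARK_BUILD_DATE url=$SPARK_REPO_URL")
{code}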

 

> `bin/spark-submit --version` shows incorrect info
> -
>
> Key: SPARK-28906
> URL: https://issues.apache.org/jira/browse/SPARK-28906
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.4, 2.4.0, 2.4.1, 2.4.2, 
> 3.0.0, 2.4.3
>Reporter: Marcelo Vanzin
>Priority: Minor
> Attachments: image-2019-08-29-05-50-13-526.png
>
>
> Since Spark 2.3.1, `spark-submit` has shown incorrect information.
> {code}
> $ bin/spark-submit --version
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.3
>       /_/
> Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_222
> Branch
> Compiled by user  on 2019-02-04T13:00:46Z
> Revision
> Url
> Type --help for more information.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28906) `bin/spark-submit --version` shows incorrect info

2019-08-28 Thread Shivu Sondur (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-28906:
-
Attachment: image-2019-08-29-05-50-13-526.png

> `bin/spark-submit --version` shows incorrect info
> -
>
> Key: SPARK-28906
> URL: https://issues.apache.org/jira/browse/SPARK-28906
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.4, 2.4.0, 2.4.1, 2.4.2, 
> 3.0.0, 2.4.3
>Reporter: Marcelo Vanzin
>Priority: Minor
> Attachments: image-2019-08-29-05-50-13-526.png
>
>
> Since Spark 2.3.1, `spark-submit` has shown incorrect information.
> {code}
> $ bin/spark-submit --version
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.3
>       /_/
> Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_222
> Branch
> Compiled by user  on 2019-02-04T13:00:46Z
> Revision
> Url
> Type --help for more information.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28906) `bin/spark-submit --version` shows incorrect info

2019-08-28 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918178#comment-16918178
 ] 

Shivu Sondur commented on SPARK-28906:
--

I am checking this issue.

> `bin/spark-submit --version` shows incorrect info
> -
>
> Key: SPARK-28906
> URL: https://issues.apache.org/jira/browse/SPARK-28906
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.4, 2.4.0, 2.4.1, 2.4.2, 
> 3.0.0, 2.4.3
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> Since Spark 2.3.1, `spark-submit` has shown incorrect information.
> {code}
> $ bin/spark-submit --version
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.3.3
>       /_/
> Using Scala version 2.11.8, OpenJDK 64-Bit Server VM, 1.8.0_222
> Branch
> Compiled by user  on 2019-02-04T13:00:46Z
> Revision
> Url
> Type --help for more information.
> {code}



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28898) SQL Configuration should be mentioned under Spark SQL in User Guide

2019-08-28 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16917785#comment-16917785
 ] 

Shivu Sondur commented on SPARK-28898:
--

If you agree, I will work on this issue.

> SQL Configuration should be mentioned under Spark SQL in User Guide
> ---
>
> Key: SPARK-28898
> URL: https://issues.apache.org/jira/browse/SPARK-28898
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> Currently, only if the user runs set -v; does the end user see the entire 
> list of the spark.sql.* configurations.
> Unless the end user is familiar with the Spark code, they cannot discover or use 
> the entire list of SQL configurations.
> I feel the entire list of SQL configurations should be documented in the 3.0 user 
> guide, like 
> [Spark-Streaming|https://spark.apache.org/docs/latest/configuration.html#spark-streaming]
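For reference, a small sketch of the command the report alludes to, runnable from spark-shell (assuming a session named `spark`):

{code:scala}
// SET -v returns key, value, and meaning for every SQL configuration; this is the
// list the report would like reproduced in the user guide.
spark.sql("SET -v").show(1000, truncate = false)
{code}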



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28873) [UDF]show functions behaves different in hive and spark

2019-08-26 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915653#comment-16915653
 ] 

Shivu Sondur commented on SPARK-28873:
--

[~hyukjin.kwon] , [~dongjoon]

Is this change required?

From my initial analysis, the dbname is passed to listFunctions() in 
HiveExternalCatalog.scala.

We could add one more method to list all the functions from all the databases.

What are your comments?
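In the meantime, a hedged workaround sketch via the public catalog API (assuming a session named `spark`) that enumerates functions per database instead of only the current one:

{code:scala}
// listFunctions(db) returns built-ins plus that database's persistent UDFs;
// filtering out temporary ones leaves the user-created UDFs per database.
spark.catalog.listDatabases().collect().foreach { db =>
  val udfs = spark.catalog.listFunctions(db.name).collect().filterNot(_.isTemporary)
  println(s"${db.name}: ${udfs.map(_.name).mkString(", ")}")
}
{code}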

 

> [UDF]show functions behaves different in hive and spark
> ---
>
> Key: SPARK-28873
> URL: https://issues.apache.org/jira/browse/SPARK-28873
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>
> Description:
> Launch spark-beeline
> show functions;
> The result will list all the system-level functions and the permanent UDFs created 
> inside the default db.
> jdbc:hive2://10.18.18.214:23040/default> create function func.mul_651  AS 
> 'com.huawei.bigdata.hive.example.udf.multiply' using jar 
> 'hdfs://hacluster/user/Multiply.jar';
> create function default.mul_651  AS 
> 'com.huawei.bigdata.hive.example.udf.multiply' using jar 
> 'hdfs://hacluster/user/Multiply.jar';
> show functions;
> This will list only the functions created inside the default DB.
> In Hive
> jdbc:hive2://10.18.98.147:21066/> create function func.add_5  AS 
> 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF'  using jar 
> 'hdfs://hacluster/user/AddDoublesUDF.jar';
> jdbc:hive2://10.18.98.147:21066/> create function default.add_5  AS 
> 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF'  using jar 
> 'hdfs://hacluster/user/AddDoublesUDF.jar';
> show functions; will list all the UDF functions created inside the database 
> func as well as default.
> Expected: show functions; should list all user-created functions as 
> well as system functions.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28873) [UDF]show functions behaves different in hive and spark

2019-08-26 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915593#comment-16915593
 ] 

Shivu Sondur commented on SPARK-28873:
--

I will check this issue.

> [UDF]show functions behaves different in hive and spark
> ---
>
> Key: SPARK-28873
> URL: https://issues.apache.org/jira/browse/SPARK-28873
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>
> Description:
> Launch spark-beeline
> show functions;
> The result will list all the system-level functions and the permanent UDFs created 
> inside the default db.
> jdbc:hive2://10.18.18.214:23040/default> create function func.mul_651  AS 
> 'com.huawei.bigdata.hive.example.udf.multiply' using jar 
> 'hdfs://hacluster/user/Multiply.jar';
> create function default.mul_651  AS 
> 'com.huawei.bigdata.hive.example.udf.multiply' using jar 
> 'hdfs://hacluster/user/Multiply.jar';
> show functions;
> This will list only the functions created inside the default DB.
> In Hive
> jdbc:hive2://10.18.98.147:21066/> create function func.add_5  AS 
> 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF'  using jar 
> 'hdfs://hacluster/user/AddDoublesUDF.jar';
> jdbc:hive2://10.18.98.147:21066/> create function default.add_5  AS 
> 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF'  using jar 
> 'hdfs://hacluster/user/AddDoublesUDF.jar';
> show functions; will list all the UDF functions created inside the database 
> func as well as default.
> Expected: show functions; should list all user-created functions as 
> well as system functions.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28809) Document SHOW TABLE in SQL Reference

2019-08-24 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914932#comment-16914932
 ] 

Shivu Sondur commented on SPARK-28809:
--

[~dkbiswal]

I am not able to find a "SHOW TABLE" command in Spark;

the "SHOW TABLES" command is present.

Please recheck whether this Jira is valid.

> Document SHOW TABLE in SQL Reference
> 
>
> Key: SPARK-28809
> URL: https://issues.apache.org/jira/browse/SPARK-28809
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28823) Document CREATE ROLE Statement

2019-08-24 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16914929#comment-16914929
 ] 

Shivu Sondur commented on SPARK-28823:
--

[~jobitmathew]

Please check: there is no statement like "CREATE ROLE" in Spark SQL.

> Document CREATE ROLE Statement 
> ---
>
> Key: SPARK-28823
> URL: https://issues.apache.org/jira/browse/SPARK-28823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: jobit mathew
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-28823) Document CREATE ROLE Statement

2019-08-24 Thread Shivu Sondur (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-28823:
-
Comment: was deleted

(was: I will work on this)

> Document CREATE ROLE Statement 
> ---
>
> Key: SPARK-28823
> URL: https://issues.apache.org/jira/browse/SPARK-28823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: jobit mathew
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28827) Document SELECT CURRENT_DATABASE in SQL Reference

2019-08-22 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913903#comment-16913903
 ] 

Shivu Sondur commented on SPARK-28827:
--

I will work on this.

> Document SELECT CURRENT_DATABASE in SQL Reference
> -
>
> Key: SPARK-28827
> URL: https://issues.apache.org/jira/browse/SPARK-28827
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 2.4.3
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28823) Document CREATE ROLE Statement

2019-08-22 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16913902#comment-16913902
 ] 

Shivu Sondur commented on SPARK-28823:
--

I will work on this.

> Document CREATE ROLE Statement 
> ---
>
> Key: SPARK-28823
> URL: https://issues.apache.org/jira/browse/SPARK-28823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: jobit mathew
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28798) Document DROP TABLE/VIEW statement in SQL Reference.

2019-08-21 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912084#comment-16912084
 ] 

Shivu Sondur commented on SPARK-28798:
--

I will check this issue.

> Document DROP TABLE/VIEW statement in SQL Reference.
> 
>
> Key: SPARK-28798
> URL: https://issues.apache.org/jira/browse/SPARK-28798
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28810) Document SHOW TABLES in SQL Reference.

2019-08-21 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912082#comment-16912082
 ] 

Shivu Sondur commented on SPARK-28810:
--

I will check this issue.

> Document SHOW TABLES in SQL Reference.
> --
>
> Key: SPARK-28810
> URL: https://issues.apache.org/jira/browse/SPARK-28810
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28809) Document SHOW TABLE in SQL Reference

2019-08-21 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912080#comment-16912080
 ] 

Shivu Sondur commented on SPARK-28809:
--

I will check this issue.

> Document SHOW TABLE in SQL Reference
> 
>
> Key: SPARK-28809
> URL: https://issues.apache.org/jira/browse/SPARK-28809
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28819) Document CREATE OR REPLACE FUNCTION in SQL Reference

2019-08-21 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912062#comment-16912062
 ] 

Shivu Sondur commented on SPARK-28819:
--

I will check this issue.

> Document CREATE OR REPLACE FUNCTION in SQL Reference
> 
>
> Key: SPARK-28819
> URL: https://issues.apache.org/jira/browse/SPARK-28819
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 2.4.3
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28820) Document SHOW FUNCTION LIKE SQL Reference

2019-08-21 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912061#comment-16912061
 ] 

Shivu Sondur commented on SPARK-28820:
--

I will check this issue.

> Document SHOW FUNCTION LIKE SQL Reference
> -
>
> Key: SPARK-28820
> URL: https://issues.apache.org/jira/browse/SPARK-28820
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 2.4.3
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28821) Document COMPUTE STAT in SQL Reference

2019-08-21 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912060#comment-16912060
 ] 

Shivu Sondur commented on SPARK-28821:
--

I will check this issue

> Document COMPUTE STAT in SQL Reference
> --
>
> Key: SPARK-28821
> URL: https://issues.apache.org/jira/browse/SPARK-28821
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 2.4.3
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28822) Document USE DATABASE in SQL Reference

2019-08-21 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16912059#comment-16912059
 ] 

Shivu Sondur commented on SPARK-28822:
--

Thanks, I will check this issue.

> Document USE DATABASE in SQL Reference
> --
>
> Key: SPARK-28822
> URL: https://issues.apache.org/jira/browse/SPARK-28822
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 2.4.3
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28761) spark.driver.maxResultSize only applies to compressed data

2019-08-18 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909943#comment-16909943
 ] 

Shivu Sondur commented on SPARK-28761:
--

This issue is a duplicate of SPARK-28613.

> spark.driver.maxResultSize only applies to compressed data
> --
>
> Key: SPARK-28761
> URL: https://issues.apache.org/jira/browse/SPARK-28761
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: David Vogelbacher
>Priority: Major
>
> Spark has a setting {{spark.driver.maxResultSize}}, see 
> https://spark.apache.org/docs/latest/configuration.html#application-properties
>  :
> {noformat}
> Limit of total size of serialized results of all partitions for each Spark 
> action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. 
> Jobs will be aborted if the total size is above this limit. Having a high 
> limit may cause out-of-memory errors in driver (depends on 
> spark.driver.memory and memory overhead of objects in JVM). 
> Setting a proper limit can protect the driver from out-of-memory errors.
> {noformat}
> This setting can be very useful in constraining the memory that the spark 
> driver needs for a specific spark action. However, this limit is checked 
> before decompressing data in 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L662
> Even if the compressed data is below the limit the uncompressed data can 
> still be far above. In order to protect the driver we should also impose a 
> limit on the uncompressed data. We could do this in 
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala#L344
> I propose adding a new config option 
> {{spark.driver.maxUncompressedResultSize}}.
> A simple repro of this with spark shell:
> {noformat}
> > printf 'a%.0s' {1..10} > test.csv # create a 100 MB file
> > ./bin/spark-shell --conf "spark.driver.maxResultSize=1"
> scala> val df = spark.read.format("csv").load("/Users/dvogelbacher/test.csv")
> df: org.apache.spark.sql.DataFrame = [_c0: string]
> scala> val results = df.collect()
> results: Array[org.apache.spark.sql.Row] = 
> Array([a...
> scala> results(0).getString(0).size
> res0: Int = 10
> {noformat}
> Even though we set maxResultSize to 10 MB, we collect a result that is 100MB 
> uncompressed.
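A rough, hedged way to see the gap (SizeEstimator only approximates in-memory size, and the snippet assumes the test.csv from the repro above and a session named `spark`):

{code:scala}
import org.apache.spark.util.SizeEstimator

// spark.driver.maxResultSize is enforced against the serialized, compressed task
// results; the deserialized rows held on the driver can be far larger.
val rows = spark.read.format("csv").load("test.csv").collect()
println(s"estimated in-memory size of collected rows: ${SizeEstimator.estimate(rows)} bytes")
{code}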



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28636) Thriftserver can not support decimal type with negative scale

2019-08-13 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16906868#comment-16906868
 ] 

Shivu Sondur commented on SPARK-28636:
--

I checked this issue; it looks like changes are required on the Hive side. I will 
check further and update here.

> Thriftserver can not support decimal type with negative scale
> -
>
> Key: SPARK-28636
> URL: https://issues.apache.org/jira/browse/SPARK-28636
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {code:sql}
> 0: jdbc:hive2://localhost:1> select 2.35E10 * 1.0;
> Error: java.lang.IllegalArgumentException: Error: name expected at the 
> position 10 of 'decimal(6,-7)' but '-' is found. (state=,code=0)
> {code}
> {code:sql}
> spark-sql> select 2.35E10 * 1.0;
> 235
> {code}
> ThriftServer log:
> {noformat}
> java.lang.RuntimeException: java.lang.IllegalArgumentException: Error: name 
> expected at the position 10 of 'decimal(6,-7)' but '-' is found.
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>   at 
> java.security.AccessController.doPrivileged(AccessController.java:770)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
>   at 
> org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>   at com.sun.proxy.$Proxy31.getResultSetMetadata(Unknown Source)
>   at 
> org.apache.hive.service.cli.CLIService.getResultSetMetadata(CLIService.java:502)
>   at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.GetResultSetMetadata(ThriftCLIService.java:609)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetResultSetMetadata.getResult(TCLIService.java:1697)
>   at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$GetResultSetMetadata.getResult(TCLIService.java:1682)
>   at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
>   at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>   at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:819)
> Caused by: java.lang.IllegalArgumentException: Error: name expected at the 
> position 10 of 'decimal(6,-7)' but '-' is found.
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:378)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.expect(TypeInfoUtils.java:355)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parseParams(TypeInfoUtils.java:403)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils$TypeInfoParser.parsePrimitiveParts(TypeInfoUtils.java:542)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoUtils.parsePrimitiveParts(TypeInfoUtils.java:557)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.createPrimitiveTypeInfo(TypeInfoFactory.java:136)
>   at 
> org.apache.hadoop.hive.serde2.typeinfo.TypeInfoFactory.getPrimitiveTypeInfo(TypeInfoFactory.java:109)
>   at 
> org.apache.hive.service.cli.TypeDescriptor.(TypeDescriptor.java:58)
>   at org.apache.hive.service.cli.TableSchema.(TableSchema.java:54)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$.getTableSchema(SparkExecuteStatementOperation.scala:313)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.resultSchema$lzycompute(SparkExecuteStatementOperation.scala:69)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.resultSchema(SparkExecuteStatementOperation.scala:64)
>   at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getResultSetSchema(SparkExecuteStatementOperation.scala:157)
>   at 
> org.apache.hive.service.cli.operation.OperationManager.getOperationResultSetSchema(OperationManager.java:233)
>   at 
> org.apache.hive.service.cli.session.HiveSessionImpl.getResultSetMetadata(HiveSessionImpl.java:787)
>   at sun.reflect.GeneratedMethodAccessor83.invoke(Unknown Source)

[jira] [Commented] (SPARK-28702) Display useful error message (instead of NPE) for invalid Dataset operations (e.g. calling actions inside of transformations)

2019-08-12 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905802#comment-16905802
 ] 

Shivu Sondur commented on SPARK-28702:
--

I will check this issue.

> Display useful error message (instead of NPE) for invalid Dataset operations 
> (e.g. calling actions inside of transformations)
> -
>
> Key: SPARK-28702
> URL: https://issues.apache.org/jira/browse/SPARK-28702
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Josh Rosen
>Priority: Major
>
> In Spark, SparkContext and SparkSession can only be used on the driver, not 
> on executors. For example, this means that you cannot call 
> {{someDataset.collect()}} inside of a Dataset or RDD transformation.
> When Spark serializes RDDs and Datasets, references to SparkContext and 
> SparkSession are null'ed out (by being marked as {{@transient}} or via the 
> Closure Cleaner). As a result, RDD and Dataset methods which reference or use 
> these driver-side-only objects (e.g. actions or transformations) will see 
> {{null}} references and may fail with a {{NullPointerException}}. For 
> example, in code which (via a chain of calls) tried to {{collect()}} a 
> dataset inside of a Dataset.map operation:
> {code:java}Caused by: java.lang.NullPointerException
> at 
> $apache$spark$sql$Dataset$$rddQueryExecution$lzycompute(Dataset.scala:3027)
> at 
> $apache$spark$sql$Dataset$$rddQueryExecution(Dataset.scala:3025)
> at org.apache.spark.sql.Dataset.rdd$lzycompute(Dataset.scala:3038)
> at org.apache.spark.sql.Dataset.rdd(Dataset.scala:3036)
> [...] {code}
> The resulting NPE can be _very_ confusing to users.
> In SPARK-5063 I added some logic to throw clearer error messages when 
> performing similar invalid actions on RDDs. This ticket's scope is to 
> implement similar logic for Datasets.
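A minimal sketch of the invalid pattern described above (assuming spark-shell, where `spark.implicits._` is pre-imported):

{code:scala}
import spark.implicits._

val lookup = spark.range(10)
// Invalid: collect() inside map() runs on executors, where the captured Dataset's
// session reference has been nulled out, so today this fails with an NPE.
val out = Seq(1, 2, 3).toDS().map(i => i + lookup.collect().length)
out.collect()
{code}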



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28680) redundant code, or possible bug in Partitioner that could mess up check against spark.default.parallelism

2019-08-11 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16904599#comment-16904599
 ] 

Shivu Sondur commented on SPARK-28680:
--

[~ch...@buildlackey.com]

This is not duplicate code; it is required.

For further details, have a look at the PR below:

https://github.com/apache/spark/pull/20002

> redundant code, or possible bug in Partitioner that could mess up check 
> against spark.default.parallelism
> -
>
> Key: SPARK-28680
> URL: https://issues.apache.org/jira/browse/SPARK-28680
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.3
> Environment: master = local[*]
>Reporter: Chris Bedford
>Priority: Minor
>
> This is a suggestion to reduce (what I think is) some code redundancy.
> Looking at this line of code in org.apache.spark.Partitioner:
> https://github.com/apache/spark/blob/924d794a6f5abb972fa07bf63adbb4ad544ef246/core/src/main/scala/org/apache/spark/Partitioner.scala#L83
> the first part of the && in the if condition is true if hasMaxPartitioner is 
> non empty, which means that after a scan of rdds we found one with a 
> partitioner whose # of partitions was > 0, and hasMaxPartitioner is the 
> Option wrapped RDD which has the 
> partitioner with greatest number of partitions.
> We then pass the rdd inside hasMaxPartitioner to isEligiblePartitioner where 
> we 
> set maxPartitions = the length of the longest partitioner in rdds and then 
> check
> to see if 
>  log10(maxPartitions) - log10(hasMaxPartitioner.getNumPartitions) < 1
> It seems to me that the values inside the two calls to log10 will be equal, 
> so subtracting these will result in 0, which is always < 1.
> So... isn't this whole block of code redundant ?
> It might even be a bug, because the right-hand side of the && condition is then 
> always 
> true, so we never check that 
>  defaultNumPartitions <= hasMaxPartitioner.get.getNumPartitions
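To make the reasoning concrete, a standalone sketch of the guard (my simplification of isEligiblePartitioner: maxPartitions comes from the RDD with the most partitions, numPartitions from the candidate partitioner, and the two are not necessarily equal, which is why the check is kept):

{code:scala}
// Simplified form of the check discussed above.
def isEligible(maxPartitions: Int, numPartitions: Int): Boolean =
  math.log10(maxPartitions) - math.log10(numPartitions) < 1

println(isEligible(1000, 1000)) // true: the values coincide, the difference is 0
println(isEligible(1000, 50))   // false: candidate has >10x fewer partitions than the widest RDD
{code}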



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28613) Spark SQL action collect just judge size of compressed RDD's size, not accurate enough

2019-08-07 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902704#comment-16902704
 ] 

Shivu Sondur commented on SPARK-28613:
--

I will check

> Spark SQL action collect just judge size of compressed RDD's size, not 
> accurate enough
> --
>
> Key: SPARK-28613
> URL: https://issues.apache.org/jira/browse/SPARK-28613
> Project: Spark
>  Issue Type: Wish
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: angerszhu
>Priority: Major
>
> When we run the action DataFrame.collect(), the configuration 
> *spark.driver.maxResultSize*, when determining whether the returned data exceeds 
> the limit, uses the compressed byte array's size, which is not accurate. 
> When we fetch data through SparkThriftServer without incremental 
> collection, it gets all the data of the dataframe for each partition.
> The returned data goes through the following process:
>  # compress the data's byte array 
>  # package it as a ResultSet
>  # return it to the driver and judge it by *spark.driver.maxResultSize*
>  # *decode (uncompress) the data as Array[Row]*
> The amount of data before unzipping differs significantly from the amount after 
> unzipping; the difference in size can be more than ten times.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28484) spark-submit uses wrong SPARK_HOME with deploy-mode "cluster"

2019-07-23 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891482#comment-16891482
 ] 

Shivu Sondur commented on SPARK-28484:
--

I will check this issue.

> spark-submit uses wrong SPARK_HOME with deploy-mode "cluster"
> -
>
> Key: SPARK-28484
> URL: https://issues.apache.org/jira/browse/SPARK-28484
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.4.3
>Reporter: Kalle Jepsen
>Priority: Major
>
> When submitting an application jar to a remote Spark cluster with 
> spark-submit and deploy-mode = "cluster", the driver command that is issued 
> on one of the workers seems to be configured with the SPARK_HOME of the local 
> machine, from which spark-submit was called, not the one where the driver is 
> actually running.
>  
> I.e. if I have spark installed locally under e.g. /opt/apache-spark and 
> hadoop under /usr/lib/hadoop-3.2.0, but the cluster administrator installs 
> spark under /usr/local/spark on the workers, the command that is issued on 
> the worker still looks something like this:
>  
> {{"/usr/lib/jvm/java/bin/java" "-cp" 
> "/opt/apache-spark/conf:/etc/hadoop:/usr/lib/hadoop-3.2.0/share/hadoop/common/lib/*:/usr/lib/hadoop-3.2.0/share/hadoop/common/*:/usr/lib/hadoop-3.2.0/share/hadoop/hdfs:/usr/lib/hadoop-3.2.0/share/hadoop/hdfs/lib/*:/usr/lib/hadoop-3.2.0/share/hadoop/hdfs/*:/usr/lib/hadoop-3.2.0/share/hadoop/mapreduce/lib/*:/usr/lib/hadoop-3.2.0/share/hadoop/mapreduce/*:/usr/lib/hadoop-3.2.0/share/hadoop/yarn:/usr/lib/hadoop-3.2.0/share/hadoop/yarn/lib/*:/usr/lib/hadoop-3.2.0/share/hadoop/yarn/*"
>  "-Xmx1024M" "-Dspark.jars=file:///some/application.jar" 
> "-Dspark.driver.supervise=false" "-Dspark.submit.deployMode=cluster" 
> "-Dspark.master=spark://:7077" "-Dspark.app.name=" 
> "-Dspark.rpc.askTimeout=10s" "org.apache.spark.deploy.worker.DriverWrapper" 
> "spark://Worker@:65000" "/some/application.jar" 
> "some.class.Name"}}
>  
> Is this expected behavior and/or can I somehow control that?
>  
> Steps to reproduce:
>  
> 1. Install Spark locally (with a SPARK_HOME that's different on the cluster)
> {{2. Run: spark-submit --deploy-mode "cluster" --master 
> "spark://spark.example.com:7077" --class "com.example.SparkApp" 
> "hdfs:/some/application.jar"}}
> 3. Observe that the application fails because some spark and/or hadoop 
> classes cannot be found
>  
> This applies to Spark Standalone, I haven't tried with YARN



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28450) When scan hive data of a not existed partition, it return an error

2019-07-23 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890710#comment-16890710
 ] 

Shivu Sondur commented on SPARK-28450:
--

[~angerszhuuu]

I want to check this issue.

Can you give me detailed steps to reproduce it?

 

> When scan hive data of a not existed partition, it return an error
> --
>
> Key: SPARK-28450
> URL: https://issues.apache.org/jira/browse/SPARK-28450
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.0
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2019-07-19-20-51-12-861.png
>
>
> When we select data from a partition that does not exist in a Hive partitioned
> table, it returns an error, but it should just return an empty result.
> !image-2019-07-19-20-51-12-861.png!
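A minimal reproduction sketch along the lines of the description (assuming a Hive-enabled session; the table name, column types and partition values below are made up):

{code:java}
// Create a partitioned Hive table with a single partition, then scan a partition
// that was never created. Expected: an empty result; reported: an error.
spark.sql("CREATE TABLE part_tbl (id INT) PARTITIONED BY (dt STRING) STORED AS PARQUET")
spark.sql("INSERT INTO part_tbl PARTITION (dt = '2019-07-01') VALUES (1)")
spark.sql("SELECT * FROM part_tbl WHERE dt = '2099-01-01'").show()
{code}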



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28480) Types of input parameters of a UDF affect the ability to cache the result

2019-07-22 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890685#comment-16890685
 ] 

Shivu Sondur commented on SPARK-28480:
--

[~itsukanov]

In the latest master branch it works fine. See the attached screenshot below:

!image-2019-07-23-10-58-45-768.png!

 

> Types of input parameters of a UDF affect the ability to cache the result
> -
>
> Key: SPARK-28480
> URL: https://issues.apache.org/jira/browse/SPARK-28480
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: Ivan Tsukanov
>Priority: Major
> Attachments: image-2019-07-23-10-58-45-768.png
>
>
> When I define a parameter in a UDF as Boolean or Int the result DataFrame 
> can't be cached 
> {code:java}
> import org.apache.spark.sql.functions.{lit, udf}
> val empty = sparkSession.emptyDataFrame
> val table = "table"
> def test(customUDF: UserDefinedFunction, col: Column): Unit = {
>   val df = empty.select(customUDF(col))
>   df.cache()
>   df.createOrReplaceTempView(table)
>   println(sparkSession.catalog.isCached(table))
> }
> test(udf { _: String => 42 }, lit("")) // true
> test(udf { _: Any => 42 }, lit("")) // true
> test(udf { _: Int => 42 }, lit(42)) // false
> test(udf { _: Boolean => 42 }, lit(false)) // false
> {code}
> or sparkSession.catalog.isCached gives irrelevant information.
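For anyone narrowing this down, a hedged diagnostic sketch that cross-checks the catalog answer against the Dataset's own storage level ({{Dataset.storageLevel}} is available since Spark 2.4; {{empty}}, {{table}} and the {{udf}}/{{lit}} imports are taken from the snippet above):

{code:java}
import org.apache.spark.storage.StorageLevel

val df = empty.select(udf { _: Int => 42 }.apply(lit(42)))
df.cache()
df.createOrReplaceTempView(table)
println(sparkSession.catalog.isCached(table))   // reported: false on 2.3.1
println(df.storageLevel != StorageLevel.NONE)   // whether the Dataset itself is marked as cached
{code}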



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28480) Types of input parameters of a UDF affect the ability to cache the result

2019-07-22 Thread Shivu Sondur (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-28480:
-
Attachment: image-2019-07-23-10-58-45-768.png

> Types of input parameters of a UDF affect the ability to cache the result
> -
>
> Key: SPARK-28480
> URL: https://issues.apache.org/jira/browse/SPARK-28480
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: Ivan Tsukanov
>Priority: Major
> Attachments: image-2019-07-23-10-58-45-768.png
>
>
> When I define a parameter in a UDF as Boolean or Int the result DataFrame 
> can't be cached 
> {code:java}
> import org.apache.spark.sql.functions.{lit, udf}
> val empty = sparkSession.emptyDataFrame
> val table = "table"
> def test(customUDF: UserDefinedFunction, col: Column): Unit = {
>   val df = empty.select(customUDF(col))
>   df.cache()
>   df.createOrReplaceTempView(table)
>   println(sparkSession.catalog.isCached(table))
> }
> test(udf { _: String => 42 }, lit("")) // true
> test(udf { _: Any => 42 }, lit("")) // true
> test(udf { _: Int => 42 }, lit(42)) // false
> test(udf { _: Boolean => 42 }, lit(false)) // false
> {code}
> or sparkSession.catalog.isCached gives irrelevant information.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28480) Types of input parameters of a UDF affect the ability to cache the result

2019-07-22 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890674#comment-16890674
 ] 

Shivu Sondur commented on SPARK-28480:
--

I am checking this issue.

> Types of input parameters of a UDF affect the ability to cache the result
> -
>
> Key: SPARK-28480
> URL: https://issues.apache.org/jira/browse/SPARK-28480
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.1
>Reporter: Ivan Tsukanov
>Priority: Major
>
> When I define a parameter in a UDF as Boolean or Int the result DataFrame 
> can't be cached 
> {code:java}
> import org.apache.spark.sql.functions.{lit, udf}
> val empty = sparkSession.emptyDataFrame
> val table = "table"
> def test(customUDF: UserDefinedFunction, col: Column): Unit = {
>   val df = empty.select(customUDF(col))
>   df.cache()
>   df.createOrReplaceTempView(table)
>   println(sparkSession.catalog.isCached(table))
> }
> test(udf { _: String => 42 }, lit("")) // true
> test(udf { _: Any => 42 }, lit("")) // true
> test(udf { _: Int => 42 }, lit(42)) // false
> test(udf { _: Boolean => 42 }, lit(false)) // false
> {code}
> or sparkSession.catalog.isCached gives irrelevant information.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28451) substr returns different values

2019-07-22 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890084#comment-16890084
 ] 

Shivu Sondur commented on SPARK-28451:
--

[~hyukjin.kwon], [~dongjoon]

Here is one more PostgreSQL compatibility issue. Does it need to be handled?

After checking, I found the following:
> Spark's behavior is the same as *Oracle* and *MySQL*,
> while *MS SQL*'s behavior is the same as *PostgreSQL*.

I think we should have a global setting such as a postgresql_Flavor / sql_Flavor
parameter; when it is set to a given flavor, all functions should behave
according to that database flavor.

> substr returns different values
> ---
>
> Key: SPARK-28451
> URL: https://issues.apache.org/jira/browse/SPARK-28451
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> PostgreSQL:
> {noformat}
> postgres=# select substr('1234567890', -1, 5);
>  substr
> 
>  123
> (1 row)
> postgres=# select substr('1234567890', 1, -1);
> ERROR:  negative substring length not allowed
> {noformat}
> Spark SQL:
> {noformat}
> spark-sql> select substr('1234567890', -1, 5);
> 0
> spark-sql> select substr('1234567890', 1, -1);
> spark-sql>
> {noformat}
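Until the semantics are aligned, a hedged sketch of how PostgreSQL's clipping of non-positive start positions could be emulated on the Spark side ({{pgSubstr}} is a hypothetical helper; PostgreSQL's error on negative lengths is not emulated):

{code:java}
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

// PostgreSQL treats substr(s, pos, len) as the window [pos, pos+len-1] clipped to [1, length(s)].
def pgSubstr(s: Column, pos: Int, len: Int): Column =
  if (pos >= 1) substring(s, pos, len)
  else substring(s, 1, math.max(pos + len - 1, 0))

// spark.range(1).select(pgSubstr(lit("1234567890"), -1, 5)).show()   // expected: 123
{code}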



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28451) substr returns different values

2019-07-22 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890026#comment-16890026
 ] 

Shivu Sondur commented on SPARK-28451:
--

I will check this issue.

 

> substr returns different values
> ---
>
> Key: SPARK-28451
> URL: https://issues.apache.org/jira/browse/SPARK-28451
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> PostgreSQL:
> {noformat}
> postgres=# select substr('1234567890', -1, 5);
>  substr
> 
>  123
> (1 row)
> postgres=# select substr('1234567890', 1, -1);
> ERROR:  negative substring length not allowed
> {noformat}
> Spark SQL:
> {noformat}
> spark-sql> select substr('1234567890', -1, 5);
> 0
> spark-sql> select substr('1234567890', 1, -1);
> spark-sql>
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28471) Formatting dates with negative years

2019-07-22 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16890024#comment-16890024
 ] 

Shivu Sondur commented on SPARK-28471:
--

[~maxgekk]

According to the discussion you linked, no code change is required for this
issue, right?

> Formatting dates with negative years
> 
>
> Key: SPARK-28471
> URL: https://issues.apache.org/jira/browse/SPARK-28471
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.3
>Reporter: Maxim Gekk
>Priority: Minor
>
> While converting dates with negative years to strings, Spark skips the era
> sub-field by default. That can confuse users, since years from the BC era are
> mirrored into the current era. For example:
> {code}
> spark-sql> select make_date(-44, 3, 15);
> 0045-03-15
> {code}
> Even though negative years are outside the supported range of the DATE type, it
> would be nice to indicate the era for such dates.
> PostgreSQL outputs the era for such inputs:
> {code}
> # select make_date(-44, 3, 15);
>make_date   
> ---
>  0044-03-15 BC
> (1 row)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28459) Date/Time Functions: make_timestamp

2019-07-21 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889779#comment-16889779
 ] 

Shivu Sondur commented on SPARK-28459:
--

[~maxgekk]

Oh, OK. I was also analyzing it.

You can continue; I will stop looking into this issue.

Thanks for letting me know.

> Date/Time Functions: make_timestamp
> ---
>
> Key: SPARK-28459
> URL: https://issues.apache.org/jira/browse/SPARK-28459
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Return Type||Description||Example||Result||
> |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 23.5)}}|{{2013-07-15 08:15:23.5}}|
> https://www.postgresql.org/docs/11/functions-datetime.html
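Once the function is implemented, the intended usage should presumably mirror the PostgreSQL example in the table above; a hedged sketch:

{code:java}
// Hypothetical until make_timestamp lands in Spark SQL.
spark.sql("SELECT make_timestamp(2013, 7, 15, 8, 15, 23.5)").show(false)
// expected, per the table above: 2013-07-15 08:15:23.5
{code}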



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28459) Date/Time Functions: make_timestamp

2019-07-20 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889556#comment-16889556
 ] 

Shivu Sondur commented on SPARK-28459:
--

I will check this.

> Date/Time Functions: make_timestamp
> ---
>
> Key: SPARK-28459
> URL: https://issues.apache.org/jira/browse/SPARK-28459
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Return Type||Description||Example||Result||
> |{{make_timestamp(_year_ }}{{int}}{{, _month_ }}{{int}}{{, _day_ }}{{int}}{{, _hour_ }}{{int}}{{, _min_ }}{{int}}{{, _sec_}}{{double precision}}{{)}}|{{timestamp}}|Create timestamp from year, month, day, hour, minute and seconds fields|{{make_timestamp(2013, 7, 15, 8, 15, 23.5)}}|{{2013-07-15 08:15:23.5}}|
> https://www.postgresql.org/docs/11/functions-datetime.html



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28390) Convert and port 'pgSQL/select_having.sql' into UDF test base

2019-07-14 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884839#comment-16884839
 ] 

Shivu Sondur commented on SPARK-28390:
--

I am working on this.

> Convert and port 'pgSQL/select_having.sql' into UDF test base
> -
>
> Key: SPARK-28390
> URL: https://issues.apache.org/jira/browse/SPARK-28390
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> See SPARK-28387



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28317) Built-in Mathematical Functions: SCALE

2019-07-13 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884566#comment-16884566
 ] 

Shivu Sondur commented on SPARK-28317:
--

I am working on this.

> Built-in Mathematical Functions: SCALE
> --
>
> Key: SPARK-28317
> URL: https://issues.apache.org/jira/browse/SPARK-28317
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Function||Return Type||Description||Example||Result||
> |{{scale(}}{{numeric}}{{)}}|{{integer}}|scale of the argument (the number of decimal digits in the fractional part)|{{scale(8.41)}}|{{2}}|
> https://www.postgresql.org/docs/11/functions-math.html#FUNCTIONS-MATH-FUNC-TABLE
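While Spark has no {{scale()}} function yet, a hedged sketch of the closest thing available today, reading the declared scale of a {{DecimalType}} column from the schema (note this is the type's scale, not the per-value scale that PostgreSQL computes):

{code:java}
import org.apache.spark.sql.types.DecimalType

val df = spark.sql("SELECT CAST(8.41 AS DECIMAL(10, 2)) AS v")
val scaleOfV = df.schema("v").dataType.asInstanceOf[DecimalType].scale   // 2
{code}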



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28333) NULLS FIRST for DESC and NULLS LAST for ASC

2019-07-12 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884292#comment-16884292
 ] 

Shivu Sondur commented on SPARK-28333:
--

[~yumwang]

I am working on this.

> NULLS FIRST for DESC and NULLS LAST for ASC
> ---
>
> Key: SPARK-28333
> URL: https://issues.apache.org/jira/browse/SPARK-28333
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {code:sql}
> spark-sql> create or replace temporary view t1 as select * from (values(1), 
> (2), (null), (3), (null)) as v (val);
> spark-sql> select * from t1 order by val asc;
> NULL
> NULL
> 1
> 2
> 3
> spark-sql> select * from t1 order by val desc;
> 3
> 2
> 1
> NULL
> NULL
> {code}
> {code:sql}
> postgres=# create or replace temporary view t1 as select * from (values(1), 
> (2), (null), (3), (null)) as v (val);
> CREATE VIEW
> postgres=# select * from t1 order by val asc;
>  val
> -
>1
>2
>3
> (5 rows)
> postgres=# select * from t1 order by val desc;
>  val
> -
>3
>2
>1
> (5 rows)
> {code}
> https://www.postgresql.org/docs/11/queries-order.html
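Until the defaults change, a hedged sketch of getting PostgreSQL-like ordering by spelling the null ordering out explicitly ({{t1}} is the temporary view from the description above):

{code:java}
spark.sql("SELECT * FROM t1 ORDER BY val ASC NULLS LAST").show()    // 1, 2, 3, NULL, NULL
spark.sql("SELECT * FROM t1 ORDER BY val DESC NULLS FIRST").show()  // NULL, NULL, 3, 2, 1
{code}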



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28036) Built-in udf left/right has inconsistent behavior

2019-06-26 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873696#comment-16873696
 ] 

Shivu Sondur commented on SPARK-28036:
--

[~yumwang]

You can close this JIRA, right? It works in Spark.

The only difference is that the offset should be used without the '-' sign.

> Built-in udf left/right has inconsistent behavior
> -
>
> Key: SPARK-28036
> URL: https://issues.apache.org/jira/browse/SPARK-28036
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> PostgreSQL:
> {code:sql}
> postgres=# select left('ahoj', -2), right('ahoj', -2);
>  left | right 
> --+---
>  ah   | oj
> (1 row)
> {code}
> Spark SQL:
> {code:sql}
> spark-sql> select left('ahoj', -2), right('ahoj', -2);
> spark-sql>
> {code}
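A hedged sketch of how PostgreSQL's negative-length semantics (drop the last / first |n| characters) could be emulated in Spark; {{pgLeft}} and {{pgRight}} are hypothetical helpers, and corner cases such as n larger than the string length are not handled carefully:

{code:java}
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

def pgLeft(s: Column, n: Int): Column =
  if (n >= 0) s.substr(lit(1), lit(n))
  else s.substr(lit(1), length(s) + lit(n))          // all but the last |n| characters

def pgRight(s: Column, n: Int): Column =
  if (n >= 0) s.substr(length(s) - lit(n - 1), lit(n))
  else s.substr(lit(1 - n), length(s))               // all but the first |n| characters

// spark.range(1).select(pgLeft(lit("ahoj"), -2), pgRight(lit("ahoj"), -2)).show()   // ah, oj
{code}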



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28164) usage description does not match with shell scripts

2019-06-26 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16873275#comment-16873275
 ] 

Shivu Sondur commented on SPARK-28164:
--

[~hannankan]

I updated the usage message and raised a pull request.

> usage description does not match with shell scripts
> ---
>
> Key: SPARK-28164
> URL: https://issues.apache.org/jira/browse/SPARK-28164
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.3
>Reporter: Hanna Kan
>Priority: Major
>
> I found that "spark/sbin/start-slave.sh" may have an error.
> Line 43 gives --- echo "Usage: ./sbin/start-slave.sh [options] "
> but later in this script, line 59 has MASTER=$1.
> Is this a conflict?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28164) usage description does not match with shell scripts

2019-06-25 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872556#comment-16872556
 ] 

Shivu Sondur commented on SPARK-28164:
--

[~hannankan]

 ./sbin/start-slave.sh  

It starts properly; I tested this on the master branch.

> usage description does not match with shell scripts
> ---
>
> Key: SPARK-28164
> URL: https://issues.apache.org/jira/browse/SPARK-28164
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.3
>Reporter: Hanna Kan
>Priority: Major
>
> I found that "spark/sbin/start-slave.sh" may have an error.
> Line 43 gives --- echo "Usage: ./sbin/start-slave.sh [options] "
> but later in this script, line 59 has MASTER=$1.
> Is this a conflict?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28036) Built-in udf left/right has inconsistent behavior

2019-06-25 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872045#comment-16872045
 ] 

Shivu Sondur edited comment on SPARK-28036 at 6/25/19 8:30 AM:
---

[~yumwang]
 select left('ahoj', 2), right('ahoj', 2);
 Use it without the '-' sign and it will work fine.

I tested this in the latest Spark.

 


was (Author: shivuson...@gmail.com):
[~yumwang]
select left('ahoj', 2), right('ahoj', 2);
use with '-' sign, it will work fine,

i tested in latest spark

 

> Built-in udf left/right has inconsistent behavior
> -
>
> Key: SPARK-28036
> URL: https://issues.apache.org/jira/browse/SPARK-28036
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> PostgreSQL:
> {code:sql}
> postgres=# select left('ahoj', -2), right('ahoj', -2);
>  left | right 
> --+---
>  ah   | oj
> (1 row)
> {code}
> Spark SQL:
> {code:sql}
> spark-sql> select left('ahoj', -2), right('ahoj', -2);
> spark-sql>
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28036) Built-in udf left/right has inconsistent behavior

2019-06-25 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16872045#comment-16872045
 ] 

Shivu Sondur commented on SPARK-28036:
--

[~yumwang]
select left('ahoj', 2), right('ahoj', 2);
use with '-' sign, it will work fine,

i tested in latest spark

 

> Built-in udf left/right has inconsistent behavior
> -
>
> Key: SPARK-28036
> URL: https://issues.apache.org/jira/browse/SPARK-28036
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> PostgreSQL:
> {code:sql}
> postgres=# select left('ahoj', -2), right('ahoj', -2);
>  left | right 
> --+---
>  ah   | oj
> (1 row)
> {code}
> Spark SQL:
> {code:sql}
> spark-sql> select left('ahoj', -2), right('ahoj', -2);
> spark-sql>
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27722) Remove UnsafeKeyValueSorter

2019-05-15 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840881#comment-16840881
 ] 

Shivu Sondur commented on SPARK-27722:
--

[~viirya]

I also verified this; it looks like the class is not required.

I ran all the unit tests locally, and they all pass after removing this class.

> Remove UnsafeKeyValueSorter
> ---
>
> Key: SPARK-27722
> URL: https://issues.apache.org/jira/browse/SPARK-27722
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Priority: Minor
>
> We just moved the location of classes including {{UnsafeKeyValueSorter}}. 
> After further investigation, I can't find where {{UnsafeKeyValueSorter}} is
> used.
> If it is not used at all, shall we just remove it from the codebase?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-27409) Micro-batch support for Kafka Source in Spark 2.3

2019-04-23 Thread Shivu Sondur (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-27409:
-
Comment: was deleted

(was: i am checking this)

> Micro-batch support for Kafka Source in Spark 2.3
> -
>
> Key: SPARK-27409
> URL: https://issues.apache.org/jira/browse/SPARK-27409
> Project: Spark
>  Issue Type: Question
>  Components: Structured Streaming
>Affects Versions: 2.3.2
>Reporter: Prabhjot Singh Bharaj
>Priority: Major
>
> It seems with this change - 
> [https://github.com/apache/spark/commit/0a441d2edb0a3f6c6c7c370db8917e1c07f211e7#diff-eeac5bdf3a1ecd7b9f8aaf10fff37f05R50]
>  in Spark 2.3 for Kafka Source Provider, a Kafka source can not be run in 
> micro-batch mode but only in continuous mode. Is that understanding correct ?
> {code:java}
> E Py4JJavaError: An error occurred while calling o217.load.
> E : org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
> E at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:717)
> E at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:566)
> E at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:549)
> E at 
> org.apache.spark.sql.kafka010.SubscribeStrategy.createConsumer(ConsumerStrategy.scala:62)
> E at 
> org.apache.spark.sql.kafka010.KafkaOffsetReader.createConsumer(KafkaOffsetReader.scala:314)
> E at 
> org.apache.spark.sql.kafka010.KafkaOffsetReader.(KafkaOffsetReader.scala:78)
> E at 
> org.apache.spark.sql.kafka010.KafkaSourceProvider.createContinuousReader(KafkaSourceProvider.scala:130)
> E at 
> org.apache.spark.sql.kafka010.KafkaSourceProvider.createContinuousReader(KafkaSourceProvider.scala:43)
> E at 
> org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:185)
> E at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> E at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> E at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> E at java.lang.reflect.Method.invoke(Method.java:498)
> E at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> E at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> E at py4j.Gateway.invoke(Gateway.java:282)
> E at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> E at py4j.commands.CallCommand.execute(CallCommand.java:79)
> E at py4j.GatewayConnection.run(GatewayConnection.java:238)
> E at java.lang.Thread.run(Thread.java:748)
> E Caused by: org.apache.kafka.common.KafkaException: 
> org.apache.kafka.common.KafkaException: java.io.FileNotFoundException: 
> non-existent (No such file or directory)
> E at 
> org.apache.kafka.common.network.SslChannelBuilder.configure(SslChannelBuilder.java:44)
> E at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:93)
> E at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:51)
> E at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:84)
> E at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:657)
> E ... 19 more
> E Caused by: org.apache.kafka.common.KafkaException: 
> java.io.FileNotFoundException: non-existent (No such file or directory)
> E at 
> org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:121)
> E at 
> org.apache.kafka.common.network.SslChannelBuilder.configure(SslChannelBuilder.java:41)
> E ... 23 more
> E Caused by: java.io.FileNotFoundException: non-existent (No such file or 
> directory)
> E at java.io.FileInputStream.open0(Native Method)
> E at java.io.FileInputStream.open(FileInputStream.java:195)
> E at java.io.FileInputStream.(FileInputStream.java:138)
> E at java.io.FileInputStream.(FileInputStream.java:93)
> E at 
> org.apache.kafka.common.security.ssl.SslFactory$SecurityStore.load(SslFactory.java:216)
> E at 
> org.apache.kafka.common.security.ssl.SslFactory$SecurityStore.access$000(SslFactory.java:201)
> E at 
> org.apache.kafka.common.security.ssl.SslFactory.createSSLContext(SslFactory.java:137)
> E at 
> org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:119)
> E ... 24 more{code}
>  When running a simple data stream loader for kafka without an SSL cert, it 
> goes through this code block - 
>  
> {code:java}
> ...
> ...
> org.apache.spark.sql.kafka010.KafkaSourceProvider.createContinuousReader(KafkaSourceProvider.scala:130)
> E at 
> org.apache.spark.sql.kafka010.KafkaSourceProvider.createContinuousReader(KafkaSourceProvider.scala:43)
> E at 
> org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:185)
> ...
> ...{code}
>  
> Note that I haven't selected `trigger=continuous...` when creating 
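For comparison, a hedged sketch of a plain micro-batch Kafka read on Spark 2.3 (the broker address and topic below are placeholders); continuous processing is only used when a continuous trigger is set explicitly on the query:

{code:java}
import org.apache.spark.sql.streaming.Trigger

val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")   // placeholder
  .option("subscribe", "some-topic")                // placeholder
  .load()

val query = kafkaDf.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("10 seconds"))    // micro-batch trigger
  .start()
{code}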

[jira] [Issue Comment Deleted] (SPARK-27421) RuntimeException when querying a view on a partitioned parquet table

2019-04-23 Thread Shivu Sondur (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-27421:
-
Comment: was deleted

(was: i am checking this issue)

> RuntimeException when querying a view on a partitioned parquet table
> 
>
> Key: SPARK-27421
> URL: https://issues.apache.org/jira/browse/SPARK-27421
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 2.4.1
> Environment: Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit 
> Server VM, Java 1.8.0_141)
>Reporter: Eric Maynard
>Priority: Minor
>
> When running a simple query, I get the following stacktrace:
> {code}
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive. You can set the Spark configuration 
> setting spark.sql.hive.manageFilesourcePartitions to false to work around 
> this problem, however this will result in degraded performance. Please report 
> a bug: https://issues.apache.org/jira/browse/SPARK
>  at 
> org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:772)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:686)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:684)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:684)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1268)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1261)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1261)
>  at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitionsByFilter(ExternalCatalogWithListener.scala:262)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:957)
>  at 
> org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:63)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:255)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:27)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:26)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
>  at 
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>  at 
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>  at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
>  at scala.collection.immutable.List.foreach(List.scala:392)
>  at 
> 

[jira] [Commented] (SPARK-27542) SparkHadoopWriter doesn't set call setWorkOutputPath, causing NPEs when using certain legacy OutputFormats

2019-04-23 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16823844#comment-16823844
 ] 

Shivu Sondur commented on SPARK-27542:
--

[~joshrosen]

Can you provide the exact steps to reproduce this?

> SparkHadoopWriter doesn't set call setWorkOutputPath, causing NPEs when using 
> certain legacy OutputFormats
> --
>
> Key: SPARK-27542
> URL: https://issues.apache.org/jira/browse/SPARK-27542
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Priority: Major
>
> In Hadoop MapReduce, tasks call {{FileOutputFormat.setWorkOutputPath()}} 
> after configuring the  output committer: 
> [https://github.com/apache/hadoop/blob/a55d6bba71c81c1c4e9d8cd11f55c78f10a548b0/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/Task.java#L611]
>  
> Spark doesn't do this: 
> [https://github.com/apache/spark/blob/2d085c13b7f715dbff23dd1f81af45ff903d1a79/core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala#L115]
> As a result, certain legacy output formats can fail to work out-of-the-box on 
> Spark. In particular, 
> {{org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat}} can fail 
> with NullPointerExceptions, e.g.
> {code:java}
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.Path.(Path.java:105)
>   at org.apache.hadoop.fs.Path.(Path.java:94)
>   at 
> org.apache.parquet.hadoop.mapred.DeprecatedParquetOutputFormat.getDefaultWorkFile(DeprecatedParquetOutputFormat.java:69)
> [...]
>   at org.apache.spark.SparkHadoopWriter.write(SparkHadoopWriter.scala:96)
> {code}
> It looks like someone on GitHub has hit the same problem: 
> https://gist.github.com/themodernlife/e3b07c23ba978f6cc98b73e3f3609abe
> Tez had a very similar bug: https://issues.apache.org/jira/browse/TEZ-3348
> We might be able to fix this by having Spark mimic Hadoop's logic. I'm unsure 
> of whether that change would pose compatibility risks for other existing 
> workloads, though.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27532) Correct the default value in the Documentation for "spark.redaction.regex"

2019-04-20 Thread Shivu Sondur (JIRA)
Shivu Sondur created SPARK-27532:


 Summary: Correct the default value in the Documentation for 
"spark.redaction.regex"
 Key: SPARK-27532
 URL: https://issues.apache.org/jira/browse/SPARK-27532
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.0.0
Reporter: Shivu Sondur


Correct the default value in the Documentation for "spark.redaction.regex".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-27421) RuntimeException when querying a view on a partitioned parquet table

2019-04-20 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816851#comment-16816851
 ] 

Shivu Sondur edited comment on SPARK-27421 at 4/21/19 1:30 AM:
---

[~emaynard]

I am able to reproduce the issue.

FYI: this also happens without "stored as parquet".

Only
 spark.sql("SELECT * FROM test_view WHERE id = '0'").explain
works.

But the following query fails:
 spark.sql("SELECT * FROM test_view WHERE id = '0'").show(false)
  


was (Author: shivuson...@gmail.com):
[~emaynard]

Issue I am able to reproduce.

FYI: with "stored as parquet" also 

only 
spark.sql("SELECT * FROM test_view WHERE id = '0'").explain
works

But following query will fail, 
spark.sql("SELECT * FROM test_view WHERE id = '0'").show(false)
 

> RuntimeException when querying a view on a partitioned parquet table
> 
>
> Key: SPARK-27421
> URL: https://issues.apache.org/jira/browse/SPARK-27421
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 2.4.1
> Environment: Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit 
> Server VM, Java 1.8.0_141)
>Reporter: Eric Maynard
>Priority: Minor
>
> When running a simple query, I get the following stacktrace:
> {code}
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive. You can set the Spark configuration 
> setting spark.sql.hive.manageFilesourcePartitions to false to work around 
> this problem, however this will result in degraded performance. Please report 
> a bug: https://issues.apache.org/jira/browse/SPARK
>  at 
> org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:772)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:686)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:684)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:684)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1268)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1261)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1261)
>  at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitionsByFilter(ExternalCatalogWithListener.scala:262)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:957)
>  at 
> org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:63)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:255)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:27)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:26)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
>  at 
> 

[jira] [Created] (SPARK-27464) Add Constant instead of referring string literal used from many places

2019-04-15 Thread Shivu Sondur (JIRA)
Shivu Sondur created SPARK-27464:


 Summary:  Add Constant instead of referring string literal used 
from many places
 Key: SPARK-27464
 URL: https://issues.apache.org/jira/browse/SPARK-27464
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.1
Reporter: Shivu Sondur


Add a constant instead of referring to the string literal
"spark.buffer.pageSize" from many places.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27421) RuntimeException when querying a view on a partitioned parquet table

2019-04-13 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816851#comment-16816851
 ] 

Shivu Sondur commented on SPARK-27421:
--

Issue I am able to reproduce.

FYI: with "stored as parquet" also 

only 
spark.sql("SELECT * FROM test_view WHERE id = '0'").explain
works

But following query will fail, 
spark.sql("SELECT * FROM test_view WHERE id = '0'").show(false)
 

> RuntimeException when querying a view on a partitioned parquet table
> 
>
> Key: SPARK-27421
> URL: https://issues.apache.org/jira/browse/SPARK-27421
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 2.4.1
> Environment: Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit 
> Server VM, Java 1.8.0_141)
>Reporter: Eric Maynard
>Priority: Minor
>
> When running a simple query, I get the following stacktrace:
> {code}
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive. You can set the Spark configuration 
> setting spark.sql.hive.manageFilesourcePartitions to false to work around 
> this problem, however this will result in degraded performance. Please report 
> a bug: https://issues.apache.org/jira/browse/SPARK
>  at 
> org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:772)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:686)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:684)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:684)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1268)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1261)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1261)
>  at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitionsByFilter(ExternalCatalogWithListener.scala:262)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:957)
>  at 
> org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:63)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:255)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:27)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:26)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
>  at 
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>  at 
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>  at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
>  at 
> 

[jira] [Comment Edited] (SPARK-27421) RuntimeException when querying a view on a partitioned parquet table

2019-04-13 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816851#comment-16816851
 ] 

Shivu Sondur edited comment on SPARK-27421 at 4/13/19 6:49 AM:
---

[~emaynard]

Issue I am able to reproduce.

FYI: with "stored as parquet" also 

only 
spark.sql("SELECT * FROM test_view WHERE id = '0'").explain
works

But following query will fail, 
spark.sql("SELECT * FROM test_view WHERE id = '0'").show(false)
 


was (Author: shivuson...@gmail.com):
Issue I am able to reproduce.

FYI: with "stored as parquet" also 

only 
spark.sql("SELECT * FROM test_view WHERE id = '0'").explain
works

But following query will fail, 
spark.sql("SELECT * FROM test_view WHERE id = '0'").show(false)
 

> RuntimeException when querying a view on a partitioned parquet table
> 
>
> Key: SPARK-27421
> URL: https://issues.apache.org/jira/browse/SPARK-27421
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 2.4.1
> Environment: Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit 
> Server VM, Java 1.8.0_141)
>Reporter: Eric Maynard
>Priority: Minor
>
> When running a simple query, I get the following stacktrace:
> {code}
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive. You can set the Spark configuration 
> setting spark.sql.hive.manageFilesourcePartitions to false to work around 
> this problem, however this will result in degraded performance. Please report 
> a bug: https://issues.apache.org/jira/browse/SPARK
>  at 
> org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:772)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:686)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:684)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:684)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1268)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1261)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1261)
>  at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitionsByFilter(ExternalCatalogWithListener.scala:262)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:957)
>  at 
> org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:63)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:255)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:27)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:26)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
>  at 
> 

[jira] [Commented] (SPARK-27421) RuntimeException when querying a view on a partitioned parquet table

2019-04-09 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813883#comment-16813883
 ] 

Shivu Sondur commented on SPARK-27421:
--

I am checking this issue.

> RuntimeException when querying a view on a partitioned parquet table
> 
>
> Key: SPARK-27421
> URL: https://issues.apache.org/jira/browse/SPARK-27421
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit 
> Server VM, Java 1.8.0_141)
>Reporter: Eric Maynard
>Priority: Minor
>
> When running a simple query, I get the following stacktrace:
> {code}
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive. You can set the Spark configuration 
> setting spark.sql.hive.manageFilesourcePartitions to false to work around 
> this problem, however this will result in degraded performance. Please report 
> a bug: https://issues.apache.org/jira/browse/SPARK
>  at 
> org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:772)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:686)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:684)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:221)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:220)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:266)
>  at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:684)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1268)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1261)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
>  at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1261)
>  at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitionsByFilter(ExternalCatalogWithListener.scala:262)
>  at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:957)
>  at 
> org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:63)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:256)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:255)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
>  at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:27)
>  at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:26)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
>  at 
> scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>  at 
> scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>  at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
>  at 
> org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
>  at scala.collection.immutable.List.foreach(List.scala:392)
>  at 
> 

[jira] [Commented] (SPARK-27409) Micro-batch support for Kafka Source in Spark 2.3

2019-04-08 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812991#comment-16812991
 ] 

Shivu Sondur commented on SPARK-27409:
--

I am checking this.

> Micro-batch support for Kafka Source in Spark 2.3
> -
>
> Key: SPARK-27409
> URL: https://issues.apache.org/jira/browse/SPARK-27409
> Project: Spark
>  Issue Type: Question
>  Components: Structured Streaming
>Affects Versions: 2.3.2
>Reporter: Prabhjot Singh Bharaj
>Priority: Major
>
> It seems with this change - 
> [https://github.com/apache/spark/commit/0a441d2edb0a3f6c6c7c370db8917e1c07f211e7#diff-eeac5bdf3a1ecd7b9f8aaf10fff37f05R50]
>  in Spark 2.3 for Kafka Source Provider, a Kafka source can not be run in 
> micro-batch mode but only in continuous mode. Is that understanding correct ?
> {code:java}
> E Py4JJavaError: An error occurred while calling o217.load.
> E : org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
> E at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:717)
> E at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:566)
> E at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:549)
> E at 
> org.apache.spark.sql.kafka010.SubscribeStrategy.createConsumer(ConsumerStrategy.scala:62)
> E at 
> org.apache.spark.sql.kafka010.KafkaOffsetReader.createConsumer(KafkaOffsetReader.scala:314)
> E at 
> org.apache.spark.sql.kafka010.KafkaOffsetReader.(KafkaOffsetReader.scala:78)
> E at 
> org.apache.spark.sql.kafka010.KafkaSourceProvider.createContinuousReader(KafkaSourceProvider.scala:130)
> E at 
> org.apache.spark.sql.kafka010.KafkaSourceProvider.createContinuousReader(KafkaSourceProvider.scala:43)
> E at 
> org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:185)
> E at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> E at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> E at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> E at java.lang.reflect.Method.invoke(Method.java:498)
> E at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> E at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> E at py4j.Gateway.invoke(Gateway.java:282)
> E at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> E at py4j.commands.CallCommand.execute(CallCommand.java:79)
> E at py4j.GatewayConnection.run(GatewayConnection.java:238)
> E at java.lang.Thread.run(Thread.java:748)
> E Caused by: org.apache.kafka.common.KafkaException: 
> org.apache.kafka.common.KafkaException: java.io.FileNotFoundException: 
> non-existent (No such file or directory)
> E at 
> org.apache.kafka.common.network.SslChannelBuilder.configure(SslChannelBuilder.java:44)
> E at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:93)
> E at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:51)
> E at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:84)
> E at 
> org.apache.kafka.clients.consumer.KafkaConsumer.(KafkaConsumer.java:657)
> E ... 19 more
> E Caused by: org.apache.kafka.common.KafkaException: 
> java.io.FileNotFoundException: non-existent (No such file or directory)
> E at 
> org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:121)
> E at 
> org.apache.kafka.common.network.SslChannelBuilder.configure(SslChannelBuilder.java:41)
> E ... 23 more
> E Caused by: java.io.FileNotFoundException: non-existent (No such file or 
> directory)
> E at java.io.FileInputStream.open0(Native Method)
> E at java.io.FileInputStream.open(FileInputStream.java:195)
> E at java.io.FileInputStream.(FileInputStream.java:138)
> E at java.io.FileInputStream.(FileInputStream.java:93)
> E at 
> org.apache.kafka.common.security.ssl.SslFactory$SecurityStore.load(SslFactory.java:216)
> E at 
> org.apache.kafka.common.security.ssl.SslFactory$SecurityStore.access$000(SslFactory.java:201)
> E at 
> org.apache.kafka.common.security.ssl.SslFactory.createSSLContext(SslFactory.java:137)
> E at 
> org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:119)
> E ... 24 more{code}
>  When running a simple data stream loader for kafka without an SSL cert, it 
> goes through this code block - 
>  
> {code:java}
> ...
> ...
> org.apache.spark.sql.kafka010.KafkaSourceProvider.createContinuousReader(KafkaSourceProvider.scala:130)
> E at 
> org.apache.spark.sql.kafka010.KafkaSourceProvider.createContinuousReader(KafkaSourceProvider.scala:43)
> E at 
> org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:185)
> ...
> ...{code}
>  
> Note that I haven't selected `trigger=continuous...` when 
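For context, a hedged sketch (broker address and topic below are placeholders): the execution mode is still chosen by the trigger passed when the query is started, not at {{load()}} time, and the consumer created during schema inference at {{load()}} appears to be where the SSL misconfiguration above surfaces.

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder().appName("kafka-micro-batch-sketch").getOrCreate()

// load() builds a Kafka consumer for schema inference; it does not by itself
// force continuous mode.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")   // placeholder broker
  .option("subscribe", "events")                    // placeholder topic
  .load()

// Micro-batch execution: the default, or an explicit processing-time trigger.
df.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("10 seconds"))
  .start()
{code}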

[jira] [Commented] (SPARK-24388) EventLoop's run method don't handle fatal error, causes driver hang forever

2019-03-25 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801273#comment-16801273
 ] 

Shivu Sondur commented on SPARK-24388:
--

[~advancedxy]
Are you still checking this issue?
Is it possible to provide more details about it?
If you are not looking into it, I can check.

> EventLoop's run method don't handle fatal error, causes driver hang forever
> ---
>
> Key: SPARK-24388
> URL: https://issues.apache.org/jira/browse/SPARK-24388
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1, 2.3.0
>Reporter: Xianjin YE
>Priority: Major
>
> Once a fatal error (such as NoSuchMethodError) happens during 
> `onReceive(event)`, the eventThread thread will exit. However, the eventQueue 
> is still accepting events. The whole Spark application will hang forever.
>  
> cc [~zsxwing] [~XuanYuan]
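As a rough illustration of the gap (a toy sketch, not Spark's actual EventLoop): the polling thread can mark the loop stopped on fatal errors, so callers notice instead of posting into a queue whose consumer thread has died.

{code:scala}
import java.util.concurrent.LinkedBlockingDeque
import java.util.concurrent.atomic.AtomicBoolean
import scala.util.control.NonFatal

// Toy event loop: NonFatal errors are reported and the loop keeps running,
// while fatal ones (e.g. NoSuchMethodError) stop the loop before rethrowing,
// so post() no longer silently accepts events for a dead thread.
abstract class ToyEventLoop[E](name: String) {
  private val queue = new LinkedBlockingDeque[E]()
  private val stopped = new AtomicBoolean(false)

  private val thread = new Thread(name) {
    override def run(): Unit = {
      try {
        while (!stopped.get) {
          val event = queue.take()
          try onReceive(event)
          catch { case NonFatal(e) => onError(e) } // recoverable: keep looping
        }
      } catch {
        case _: InterruptedException =>            // normal shutdown path
        case t: Throwable =>                       // fatal: stop the loop
          stopped.set(true)
          onError(t)
          throw t
      }
    }
  }

  def start(): Unit = thread.start()
  def stop(): Unit = { stopped.set(true); thread.interrupt() }
  def post(event: E): Unit = if (!stopped.get) queue.put(event)

  protected def onReceive(event: E): Unit
  protected def onError(e: Throwable): Unit
}
{code}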



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27263) [HistoryServer]The "Hadoop Properties" in Environment page is coming empty for older version of event logs

2019-03-24 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1687#comment-1687
 ] 

Shivu Sondur commented on SPARK-27263:
--

I am checking this issue.

> [HistoryServer]The "Hadoop Properties" in Environment page is coming empty 
> for older version of event logs
> --
>
> Key: SPARK-27263
> URL: https://issues.apache.org/jira/browse/SPARK-27263
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Trivial
>
> # In Spark 2.4 or an older version, create an entry in the eventLog directory by 
> running an application. If the history server is enabled, it will create the 
> entry.
>  # Now try to open the same event log entry in the Spark 3.0 history UI. In the 
> "Environment" page, the "Hadoop Properties" tab shows an empty table, as 
> shown below.
>  # !image-2019-03-24-17-26-01-314.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-27256) If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.

2019-03-23 Thread Shivu Sondur (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-27256:
-
Comment: was deleted

(was: i am working on it)

> If the configuration is used to set the number of bytes, we'd better use 
> `bytesConf`'.
> --
>
> Key: SPARK-27256
> URL: https://issues.apache.org/jira/browse/SPARK-27256
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: liuxian
>Priority: Minor
>
> Currently, if we want to configure `spark.sql.files.maxPartitionBytes` to 
> 256 megabytes, we must set `spark.sql.files.maxPartitionBytes=268435456`, 
> which is very unfriendly to users.
> And if we set it like this: `spark.sql.files.maxPartitionBytes=256M`, we 
> will encounter this exception:
> _Exception in thread "main" java.lang.IllegalArgumentException: 
> spark.sql.files.maxPartitionBytes should be long, but was 128M_
>     _at 
> org.apache.spark.internal.config.ConfigHelpers$.toNumber(ConfigBuilder.scala:34)_



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27256) If the configuration is used to set the number of bytes, we'd better use `bytesConf`'.

2019-03-23 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799589#comment-16799589
 ] 

Shivu Sondur commented on SPARK-27256:
--

I am working on it.

> If the configuration is used to set the number of bytes, we'd better use 
> `bytesConf`'.
> --
>
> Key: SPARK-27256
> URL: https://issues.apache.org/jira/browse/SPARK-27256
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: liuxian
>Priority: Minor
>
> Currently, if we want to configure `spark.sql.files.maxPartitionBytes` to 
> 256 megabytes, we must set `spark.sql.files.maxPartitionBytes=268435456`, 
> which is very unfriendly to users.
> And if we set it like this: `spark.sql.files.maxPartitionBytes=256M`, we 
> will encounter this exception:
> _Exception in thread "main" java.lang.IllegalArgumentException: 
> spark.sql.files.maxPartitionBytes should be long, but was 128M_
>     _at 
> org.apache.spark.internal.config.ConfigHelpers$.toNumber(ConfigBuilder.scala:34)_
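For illustration only, a sketch of the proposed style (not the actual SQLConf definition): declaring the setting with {{bytesConf}} lets users write suffixed values such as "256m".

{code:scala}
import org.apache.spark.internal.config.ConfigBuilder
import org.apache.spark.network.util.ByteUnit

// Sketch: a bytes-typed entry accepts "268435456", "256m", "1g", etc.
val FILES_MAX_PARTITION_BYTES = ConfigBuilder("spark.sql.files.maxPartitionBytes")
  .doc("The maximum number of bytes to pack into a single partition when reading files.")
  .bytesConf(ByteUnit.BYTE)
  .createWithDefault(128 * 1024 * 1024)  // 128 MB
{code}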



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27235) Remove the Dead code in HashedRelation.scala file from sql core module

2019-03-21 Thread Shivu Sondur (JIRA)
Shivu Sondur created SPARK-27235:


 Summary: Remove the Dead code in HashedRelation.scala file from 
sql core module
 Key: SPARK-27235
 URL: https://issues.apache.org/jira/browse/SPARK-27235
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Shivu Sondur


Remove the dead code in the HashedRelation.scala file in the sql core module. This 
code is never used.

val pageSizeBytes = Option(SparkEnv.get).map(_.memoryManager.pageSizeBytes)
 .getOrElse(new SparkConf().getSizeAsBytes("spark.buffer.pageSize", "16m"))

 

Here the else branch is never executed, as the value is always obtained from the 
get method.
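A minimal sketch of the cleanup, assuming (as stated above) that SparkEnv.get is always non-null on this code path:

{code:scala}
// Before: the getOrElse branch can never run if SparkEnv.get is non-null here.
// After: read the page size straight from the live SparkEnv.
val pageSizeBytes = SparkEnv.get.memoryManager.pageSizeBytes
{code}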



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27235) Remove the Dead code in HashedRelation.scala file from sql core module

2019-03-21 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798361#comment-16798361
 ] 

Shivu Sondur commented on SPARK-27235:
--

I will work on this

> Remove the Dead code in HashedRelation.scala file from sql core module
> --
>
> Key: SPARK-27235
> URL: https://issues.apache.org/jira/browse/SPARK-27235
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Shivu Sondur
>Priority: Trivial
>
> Remove the dead code in the HashedRelation.scala file in the sql core module. This 
> code is never used.
> val pageSizeBytes = Option(SparkEnv.get).map(_.memoryManager.pageSizeBytes)
>  .getOrElse(new SparkConf().getSizeAsBytes("spark.buffer.pageSize", "16m"))
>  
> Here the else branch is never executed, as the value is always obtained from the 
> get method.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27156) why is the "http://:18080/static" browse able?

2019-03-13 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792331#comment-16792331
 ] 

Shivu Sondur commented on SPARK-27156:
--

I am working on it.

> why is the "http://:18080/static" browse able?
> 
>
> Key: SPARK-27156
> URL: https://issues.apache.org/jira/browse/SPARK-27156
> Project: Spark
>  Issue Type: Question
>  Components: Spark Core, Web UI
>Affects Versions: 1.6.2
>Reporter: Jerry Garcia
>Priority: Minor
> Attachments: Screen Shot 2019-03-14 at 11.46.31 AM.png
>
>
> I would like to know whether there is a way to disable the Spark history server 
> /static folder. Please refer to the attachment provided. The reason for asking is 
> security.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27090) Deplementing old LEGACY_DRIVER_IDENTIFIER ("")

2019-03-07 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787603#comment-16787603
 ] 

Shivu Sondur commented on SPARK-27090:
--

I am working on this.

> Deplementing old LEGACY_DRIVER_IDENTIFIER ("")
> --
>
> Key: SPARK-27090
> URL: https://issues.apache.org/jira/browse/SPARK-27090
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>
> For legacy reasons, LEGACY_DRIVER_IDENTIFIER was checked in a few places 
> along with the new DRIVER_IDENTIFIER ("driver") to decide whether a driver 
> or an executor is running.
> The new DRIVER_IDENTIFIER ("driver") was introduced in Spark version 1.4, so 
> I think we have a chance to get rid of the LEGACY_DRIVER_IDENTIFIER.
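A minimal sketch of the check after the removal (illustrative only, not the actual patch; it assumes the constants defined on the SparkContext companion object):

{code:scala}
import org.apache.spark.SparkContext

// With the legacy "<driver>" identifier gone, only the 1.4+ constant is checked.
def isDriver(executorId: String): Boolean =
  executorId == SparkContext.DRIVER_IDENTIFIER
{code}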



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27089) Loss of precision during decimal division

2019-03-07 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787374#comment-16787374
 ] 

Shivu Sondur commented on SPARK-27089:
--

I am checking this issue.

> Loss of precision during decimal division
> -
>
> Key: SPARK-27089
> URL: https://issues.apache.org/jira/browse/SPARK-27089
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.0
>Reporter: ylo0ztlmtusq
>Priority: Major
>
> Spark loses decimal places when dividing decimal numbers.
>  
> Expected behavior (In Spark 2.2.3 or before)
>  
> {code:java}
> scala> val sql = """select cast(cast(3 as decimal(38,14)) / cast(9 as 
> decimal(38,14)) as decimal(38,14)) val"""
> sql: String = select cast(cast(3 as decimal(38,14)) / cast(9 as 
> decimal(38,14)) as decimal(38,14)) val
> scala> spark.sql(sql).show
> 19/03/07 21:23:51 WARN ObjectStore: Failed to get database global_temp, 
> returning NoSuchObjectException
> ++
> | val|
> ++
> |0.33|
> ++
> {code}
>  
> Current behavior (In Spark 2.3.2 and later)
>  
> {code:java}
> scala> val sql = """select cast(cast(3 as decimal(38,14)) / cast(9 as 
> decimal(38,14)) as decimal(38,14)) val"""
> sql: String = select cast(cast(3 as decimal(38,14)) / cast(9 as 
> decimal(38,14)) as decimal(38,14)) val
> scala> spark.sql(sql).show
> ++
> | val|
> ++
> |0.33|
> ++
> {code}
>  
> Seems to be caused by {{promote_precision(38, 6)}}.
>  
> {code:java}
> scala> spark.sql(sql).explain(true)
> == Parsed Logical Plan ==
> Project [cast((cast(3 as decimal(38,14)) / cast(9 as decimal(38,14))) as 
> decimal(38,14)) AS val#20]
> +- OneRowRelation
> == Analyzed Logical Plan ==
> val: decimal(38,14)
> Project [cast(CheckOverflow((promote_precision(cast(cast(3 as decimal(38,14)) 
> as decimal(38,14))) / promote_precision(cast(cast(9 as decimal(38,14)) as 
> decimal(38,14, DecimalType(38,6)) as decimal(38,14)) AS val#20]
> +- OneRowRelation
> == Optimized Logical Plan ==
> Project [0.33 AS val#20]
> +- OneRowRelation
> == Physical Plan ==
> *(1) Project [0.33 AS val#20]
> +- Scan OneRowRelation[]
> {code}
>  
> Source https://stackoverflow.com/q/55046492
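One thing that may be worth checking (an untested sketch, not a confirmed fix): the scale is truncated by the {{DecimalType(38,6)}} step shown in the analyzed plan, and Spark 2.3 added a flag controlling whether decimal arithmetic may lose precision.

{code:scala}
// Sketch only: rerun the reported query with precision loss disabled and
// compare the result; the behaviour should be verified on the affected version.
spark.conf.set("spark.sql.decimalOperations.allowPrecisionLoss", "false")
spark.sql(
  """select cast(cast(3 as decimal(38,14)) / cast(9 as decimal(38,14))
    |            as decimal(38,14)) val""".stripMargin).show()
{code}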



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27030) DataFrameWriter.insertInto fails when writing in parallel to a hive table

2019-03-03 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782943#comment-16782943
 ] 

Shivu Sondur commented on SPARK-27030:
--

I am checking this issue.

> DataFrameWriter.insertInto fails when writing in parallel to a hive table
> -
>
> Key: SPARK-27030
> URL: https://issues.apache.org/jira/browse/SPARK-27030
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Lev Katzav
>Priority: Major
>
> When writing to a hive table, the following temp directory is used:
> {code:java}
> /path/to/table/_temporary/0/{code}
> (the 0 at the end comes from the config
> {code:java}
> "mapreduce.job.application.attempt.id"{code}
> since that config is missing, it falls back to 0)
> When there are two processes that write to the same table, the following 
> race condition can occur:
>  # p1 creates temp folder and uses it
>  # p2 uses temp folder
>  # p1 finishes and deletes temp folder
>  # p2 fails since temp folder is missing
>  
> It is possible to recreate this error locally with the following code:
> (the code runs locally, but I experienced the same error when running on a 
> cluster
> with 2 jobs writing to the same table)
> {code:java}
> import org.apache.spark.sql.functions._
> val df = spark
>  .range(1000)
>  .toDF("a")
>  .withColumn("partition", lit(0))
>  .cache()
> //create db
> sqlContext.sql("CREATE DATABASE IF NOT EXISTS db").count()
> //create table
> df
>  .write
>  .partitionBy("partition")
>  .saveAsTable("db.table")
> val x = (1 to 100).par
> x.tasksupport = new ForkJoinTaskSupport( new ForkJoinPool(10))
> //insert to different partitions in parallel
> x.foreach { p =>
>  val df2 = df
>  .withColumn("partition",lit(p))
>   df2
>.write
>.mode(SaveMode.Overwrite)
>.insertInto("db.table")
> }
> {code}
>  
>  the error would be:
> {code:java}
> java.io.FileNotFoundException: File 
> file:/path/to/warehouse/db.db/table/_temporary/0 does not exist
>  at 
> org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:406)
>  at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1497)
>  at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1537)
>  at 
> org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:669)
>  at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1497)
>  at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1537)
>  at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:283)
>  at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:325)
>  at 
> org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
>  at 
> org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:166)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:185)
>  at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
>  
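A possible mitigation to evaluate (an assumption, not a verified fix for this report): with dynamic partition overwrite, output is staged under a per-job {{.spark-staging-*}} directory rather than the shared {{_temporary/0}} path, which should keep concurrent writers to different partitions from racing on the same temp folder.

{code:scala}
import org.apache.spark.sql.SaveMode

// Sketch, reusing df2 from the reproduction above.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
df2.write
  .mode(SaveMode.Overwrite)
  .insertInto("db.table")
{code}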

[jira] [Commented] (SPARK-26872) Use a configurable value for final termination in the JobScheduler.stop() method

2019-02-28 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16781288#comment-16781288
 ] 

Shivu Sondur commented on SPARK-26872:
--

I am working on this issue.

> Use a configurable value for final termination in the JobScheduler.stop() 
> method
> 
>
> Key: SPARK-26872
> URL: https://issues.apache.org/jira/browse/SPARK-26872
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 2.4.0
>Reporter: Steven Rosenberry
>Priority: Minor
>
> As a user of Spark, I would like to configure the timeout that controls final 
> termination after stopping the streaming context and while processing 
> previously queued jobs.  Currently, there is a hard-coded limit of one hour 
> around line 129 in the JobScheduler.stop() method:
> {code:java}
> // Wait for the queued jobs to complete if indicated
> val terminated = if (processAllReceivedData) {
> jobExecutor.awaitTermination(1, TimeUnit.HOURS) // just a very large period 
> of time
> } else {
> jobExecutor.awaitTermination(2, TimeUnit.SECONDS)
> }
> {code}
> It would provide additional functionality to the Spark platform if this value 
> were configurable. My use case may take many hours to finish the queued job, 
> as it was created from a large data file.
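As a sketch of what a configurable bound could look like (the config name below is hypothetical, not an existing Spark setting):

{code:scala}
import java.util.concurrent.TimeUnit
import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical entry; JobScheduler.stop() would read it instead of the
// hard-coded 1-hour awaitTermination when processAllReceivedData is true.
val JOB_EXECUTOR_STOP_TIMEOUT = ConfigBuilder("spark.streaming.jobExecutor.stopTimeout")
  .doc("Maximum time to wait for queued jobs to finish when the streaming " +
    "context is stopped with stopGracefully = true.")
  .timeConf(TimeUnit.MILLISECONDS)
  .createWithDefaultString("1h")
{code}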



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26944) Python unit-tests.log not available in artifacts for a build in Jenkins

2019-02-21 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16774725#comment-16774725
 ] 

Shivu Sondur commented on SPARK-26944:
--

[~abellina] 

Could you provide more information, such as the Jenkins link where the issue is 
happening, and other details?

> Python unit-tests.log not available in artifacts for a build in Jenkins
> ---
>
> Key: SPARK-26944
> URL: https://issues.apache.org/jira/browse/SPARK-26944
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Alessandro Bellina
>Priority: Minor
>
> I had a PR where the Python unit tests failed. The tests point at the 
> `/home/jenkins/workspace/SparkPullRequestBuilder/python/unit-tests.log` file, 
> but it seems I can't get to that from the Jenkins UI (are all PRs writing to the 
> same file?).
> This Jira is to make it available under the artifacts for each build.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25452) Query with where clause is giving unexpected result in case of float column

2018-10-29 Thread Shivu Sondur (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668135#comment-16668135
 ] 

Shivu Sondur commented on SPARK-25452:
--

Yes, I am analyzing the issue.

> Query with where clause is giving unexpected result in case of float column
> ---
>
> Key: SPARK-25452
> URL: https://issues.apache.org/jira/browse/SPARK-25452
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
> Environment: *Spark 2.3.1*
> *Hadoop 2.7.2*
>Reporter: Ayush Anubhava
>Priority: Major
> Attachments: image-2018-09-26-14-14-47-504.png
>
>
> *Description*: A query with a where clause gives unexpected results for a 
> float column.
>  
> {color:#d04437}*A query with a less-than-or-equal filter gives an 
> inappropriate result:*{color}
> {code}
> 0: jdbc:hive2://10.18.18.214:23040/default> create table k2 ( a int, b float);
> +-+--+
> | Result |
> +-+--+
> +-+--+
> 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values 
> (0,0.0);
> +-+--+
> | Result |
> +-+--+
> +-+--+
> 0: jdbc:hive2://10.18.18.214:23040/default> insert into table k2 values 
> (1,1.1);
> +-+--+
> | Result |
> +-+--+
> +-+--+
> 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b >=0.0;
> +++--+
> | a | b |
> +++--+
> | 0 | 0.0 |
> | 1 | 1.100000023841858 |
> +++--+
> A query with a less-than-or-equal filter gives an inappropriate result:
> 0: jdbc:hive2://10.18.18.214:23040/default> select * from k2 where b <=1.1;
> ++--+--+
> | a | b |
> ++--+--+
> | 0 | 0.0 |
> ++--+--+
> 1 row selected (0.299 seconds)
> {code}
>  
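A likely explanation (hedged, based only on the output above): the literal 1.1 is read as a double/decimal, while column b is a float; 1.1 stored as a float widens to roughly 1.100000023841858, which is greater than the double 1.1, so `b <= 1.1` filters the row out. Comparing in the column's own type behaves as expected.

{code:scala}
// Sketch against the k2 table from the report.
spark.sql("select * from k2 where b <= cast(1.1 as float)").show()
// or round-trip the column through a decimal before comparing:
spark.sql("select * from k2 where cast(b as decimal(10,1)) <= 1.1").show()
{code}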



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25683) Updated the log for the firstTime event Drop occurs.

2018-10-18 Thread Shivu Sondur (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-25683:
-
Description: 
{code:xml}
18/10/08 17:51:40 ERROR AsyncEventQueue: Dropping event from queue eventLog. 
This likely means one of the listeners is too slow and cannot keep up with the 
rate at which tasks are being started by the scheduler.
18/10/08 17:51:40 WARN AsyncEventQueue: Dropped 1 events from eventLog since 
Wed Dec 31 16:00:00 PST 1969.
18/10/08 17:52:40 WARN AsyncEventQueue: Dropped 144853 events from eventLog 
since Mon Oct 08 17:51:40 PDT 2018.
{code}

Here the first log line shows the time as Wed Dec 31 16:00:00 PST 1969. The log 
should instead read "... since the start of the application" when 
'lastReportTimestamp' == 0, i.e. when the first event drop occurs.

  was:
{code:xml}
18/10/08 17:51:40 ERROR AsyncEventQueue: Dropping event from queue eventLog. 
This likely means one of the listeners is too slow and cannot keep up with the 
rate at which tasks are being started by the scheduler.
18/10/08 17:51:40 WARN AsyncEventQueue: Dropped 1 events from eventLog since 
Wed Dec 31 16:00:00 PST 1969.
18/10/08 17:52:40 WARN AsyncEventQueue: Dropped 144853 events from eventLog 
since Mon Oct 08 17:51:40 PDT 2018.
{code}

Here it shows the time as Wed Dec 31 16:00:00 PST 1969 for the first log, I 
think it would be better if we show the initialized time as the time here.

Summary: Updated the log for the firstTime event Drop occurs.  (was: 
Make AsyncEventQueue.lastReportTimestamp inital value as the currentTime 
instead of 0)

> Updated the log for the firstTime event Drop occurs.
> 
>
> Key: SPARK-25683
> URL: https://issues.apache.org/jira/browse/SPARK-25683
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: Devaraj K
>Priority: Trivial
>
> {code:xml}
> 18/10/08 17:51:40 ERROR AsyncEventQueue: Dropping event from queue eventLog. 
> This likely means one of the listeners is too slow and cannot keep up with 
> the rate at which tasks are being started by the scheduler.
> 18/10/08 17:51:40 WARN AsyncEventQueue: Dropped 1 events from eventLog since 
> Wed Dec 31 16:00:00 PST 1969.
> 18/10/08 17:52:40 WARN AsyncEventQueue: Dropped 144853 events from eventLog 
> since Mon Oct 08 17:51:40 PDT 2018.
> {code}
> Here the first log line shows the time as Wed Dec 31 16:00:00 PST 1969. The log 
> should instead read "... since the start of the application" when 
> 'lastReportTimestamp' == 0, i.e. when the first event drop occurs.
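A minimal sketch of the proposed wording (names are illustrative; the real change belongs in AsyncEventQueue):

{code:scala}
import java.util.Date

// lastReportTimestamp == 0 means no drop has been reported yet, so avoid
// printing the Unix epoch and mention the application start instead.
def droppedEventsMessage(queueName: String, dropped: Long, lastReportTimestamp: Long): String = {
  val since =
    if (lastReportTimestamp == 0) "since the start of the application"
    else s"since ${new Date(lastReportTimestamp)}"
  s"Dropped $dropped events from $queueName $since."
}
{code}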



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org