[jira] [Commented] (SPARK-29598) Support search option for statistics in JDBC/ODBC server page

2019-10-24 Thread Rakesh Raushan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959447#comment-16959447
 ] 

Rakesh Raushan commented on SPARK-29598:


I will raise a PR for this one.

> Support search option for statistics in JDBC/ODBC server page
> -
>
> Key: SPARK-29598
> URL: https://issues.apache.org/jira/browse/SPARK-29598
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: jobit mathew
>Priority: Minor
>
> Support search option for statistics in JDBC/ODBC server page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29598) Support search option for statistics in JDBC/ODBC server page

2019-10-24 Thread jobit mathew (Jira)
jobit mathew created SPARK-29598:


 Summary: Support search option for statistics in JDBC/ODBC server 
page
 Key: SPARK-29598
 URL: https://issues.apache.org/jira/browse/SPARK-29598
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 2.4.4, 3.0.0
Reporter: jobit mathew


Support search option for statistics in JDBC/ODBC server page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29595) Insertion with named_struct should match by name

2019-10-24 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959442#comment-16959442
 ] 

Shivu Sondur commented on SPARK-29595:
--

I am checking this issue.

> Insertion with named_struct should match by name
> 
>
> Key: SPARK-29595
> URL: https://issues.apache.org/jira/browse/SPARK-29595
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Priority: Major
>
> {code:java}
> spark-sql> create table str using parquet as(select named_struct('a', 1, 'b', 
> 2) as data);
> spark-sql>  insert into str values named_struct("b", 3, "a", 1);
> spark-sql> select * from str;
> {"a":3,"b":1}
> {"a":1,"b":2}
> {code}
> The result should be 
> {code:java}
> {"a":1,"b":3}
> {"a":1,"b":2}
> {code}
> Spark should match the field names of named_struct on insertion.
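
As a sketch of a workaround (assuming a SparkSession named {{spark}} and the {{str}} table created by the snippet above; this is an illustrative example, not code from the report), passing the struct fields in the table's declared order avoids the swapped values until by-name matching is implemented:

{code:scala}
// Illustrative only: same insert as in the report, but with the fields listed
// in the table schema's order ('a' before 'b'), so positional matching and
// by-name matching give the same result.
spark.sql("insert into str values named_struct('a', 1, 'b', 3)")
spark.sql("select * from str").show(false)
// expected rows: {"a":1,"b":3} and {"a":1,"b":2}
{code}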



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29498) CatalogTable to HiveTable should not change the table's ownership

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29498:
--
Fix Version/s: 2.4.5

> CatalogTable to HiveTable should not change the table's ownership
> -
>
> Key: SPARK-29498
> URL: https://issues.apache.org/jira/browse/SPARK-29498
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.4
>Reporter: Kent Yao
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
> Attachments: image-2019-10-17-18-00-38-101.png
>
>
> How to reproduce:
> {code:scala}
> test("CatalogTable to HiveTable should not change the table's ownership") {
>   val catalog = newBasicCatalog()
>   val identifier = TableIdentifier("test_table_owner", Some("default"))
>   val owner = "Apache Spark"
>   val newTable = CatalogTable(
>     identifier,
>     tableType = CatalogTableType.EXTERNAL,
>     storage = CatalogStorageFormat(
>       locationUri = None,
>       inputFormat = None,
>       outputFormat = None,
>       serde = None,
>       compressed = false,
>       properties = Map.empty),
>     owner = owner,
>     schema = new StructType().add("i", "int"),
>     provider = Some("hive"))
>   catalog.createTable(newTable, false)
>   assert(catalog.getTable("default", "test_table_owner").owner === owner)
> }
> {code}
> {noformat}
> [info] - CatalogTable to HiveTable should not change the table's ownership 
> *** FAILED *** (267 milliseconds)
> [info]   "[yumwang]" did not equal "[Apache Spark]" 
> (HiveExternalCatalogSuite.scala:136)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29562) SQLAppStatusListener metrics aggregation is slow and memory hungry

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29562.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26218
[https://github.com/apache/spark/pull/26218]

> SQLAppStatusListener metrics aggregation is slow and memory hungry
> --
>
> Key: SPARK-29562
> URL: https://issues.apache.org/jira/browse/SPARK-29562
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Major
> Fix For: 3.0.0
>
>
> While {{SQLAppStatusListener}} was added in 2.3, the aggregation code is very 
> similar to what it was previously, so I'm sure this is even older.
> Long story short, the aggregation code 
> ({{SQLAppStatusListener.aggregateMetrics}}) is very, very slow, and can take 
> a non-trivial amount of time with large queries, aside from using a ton of 
> memory.
> There are also cascading issues caused by that: since it's called from an 
> event handler, it can slow down event processing, causing events to be 
> dropped, which can cause listeners to miss important events that would tell 
> them to free up internal state (and, thus, memory).
> To give an anecdotal example, one app I looked at ran into the "events being 
> dropped" issue, which caused the listener to accumulate state for 100s of 
> live stages, even though most of them were already finished. That led to a 
> few GB of memory being wasted due to finished stages that were still being 
> tracked.
> Here, though, I'd like to focus on {{SQLAppStatusListener.aggregateMetrics}} 
> and making it faster. We should look at the other issues (unblocking event 
> processing, cleaning up of stale data in listeners) separately.
> (I also remember someone in the past trying to fix something in this area, 
> but couldn't find a PR nor an open bug.)
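
For illustration only, the class below is hypothetical and is not the {{SQLAppStatusListener}} code; it shows the general shape of incremental aggregation that avoids re-scanning every recorded task value each time the totals are read:

{code:scala}
import scala.collection.mutable

// Hypothetical sketch: fold each (metricId, value) update into a running sum
// as it arrives, so reading the aggregated view is O(#metrics) instead of
// re-aggregating all recorded task values on every call.
final class RunningMetricAggregator {
  private val sums = mutable.LongMap.empty[Long] // metricId -> running sum

  def update(metricId: Long, value: Long): Unit =
    sums(metricId) = sums.getOrElse(metricId, 0L) + value

  def aggregated: Map[Long, Long] = sums.toMap
}

object RunningMetricAggregatorDemo {
  def main(args: Array[String]): Unit = {
    val agg = new RunningMetricAggregator
    // e.g. one update per task-end event, one read when the UI asks for totals
    Seq((1L, 10L), (2L, 5L), (1L, 7L)).foreach { case (id, v) => agg.update(id, v) }
    println(agg.aggregated) // e.g. Map(1 -> 17, 2 -> 5)
  }
}
{code}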



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29562) SQLAppStatusListener metrics aggregation is slow and memory hungry

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29562:
-

Assignee: Marcelo Masiero Vanzin

> SQLAppStatusListener metrics aggregation is slow and memory hungry
> --
>
> Key: SPARK-29562
> URL: https://issues.apache.org/jira/browse/SPARK-29562
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Major
>
> While {{SQLAppStatusListener}} was added in 2.3, the aggregation code is very 
> similar to what it was previously, so I'm sure this is even older.
> Long story short, the aggregation code 
> ({{SQLAppStatusListener.aggregateMetrics}}) is very, very slow, and can take 
> a non-trivial amount of time with large queries, aside from using a ton of 
> memory.
> There are also cascading issues caused by that: since it's called from an 
> event handler, it can slow down event processing, causing events to be 
> dropped, which can cause listeners to miss important events that would tell 
> them to free up internal state (and, thus, memory).
> To give an anecdotal example, one app I looked at ran into the "events being 
> dropped" issue, which caused the listener to accumulate state for 100s of 
> live stages, even though most of them were already finished. That led to a 
> few GB of memory being wasted due to finished stages that were still being 
> tracked.
> Here, though, I'd like to focus on {{SQLAppStatusListener.aggregateMetrics}} 
> and making it faster. We should look at the other issues (unblocking event 
> processing, cleaning up of stale data in listeners) separately.
> (I also remember someone in the past trying to fix something in this area, 
> but couldn't find a PR nor an open bug.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-29596) Task duration not updating for running tasks

2019-10-24 Thread Shivu Sondur (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivu Sondur updated SPARK-29596:
-
Comment: was deleted

(was: i am checking this issue)

> Task duration not updating for running tasks
> 
>
> Key: SPARK-29596
> URL: https://issues.apache.org/jira/browse/SPARK-29596
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.2
>Reporter: Bharati Jadhav
>Priority: Major
> Attachments: Screenshot_Spark_live_WebUI.png
>
>
> When looking at the task metrics for running tasks in the task table for the 
> related stage, the duration column is not updated until the task has 
> succeeded. The duration values are reported empty or 0 ms until the task has 
> completed. This is a change in behavior, from earlier versions, when the task 
> duration was continuously updated while the task was running. The missing 
> duration values can be observed for both short and long running tasks and for 
> multiple applications.
>  
> To reproduce this, one can run any code from the spark-shell and observe the 
> missing duration values for any running task. Only when the task succeeds is 
> the duration value populated in the UI.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29596) Task duration not updating for running tasks

2019-10-24 Thread Shivu Sondur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959438#comment-16959438
 ] 

Shivu Sondur commented on SPARK-29596:
--

I am checking this issue.

> Task duration not updating for running tasks
> 
>
> Key: SPARK-29596
> URL: https://issues.apache.org/jira/browse/SPARK-29596
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.2
>Reporter: Bharati Jadhav
>Priority: Major
> Attachments: Screenshot_Spark_live_WebUI.png
>
>
> When looking at the task metrics for running tasks in the task table for the 
> related stage, the duration column is not updated until the task has 
> succeeded. The duration values are reported empty or 0 ms until the task has 
> completed. This is a change in behavior, from earlier versions, when the task 
> duration was continuously updated while the task was running. The missing 
> duration values can be observed for both short and long running tasks and for 
> multiple applications.
>  
> To reproduce this, one can run any code from the spark-shell and observe the 
> missing duration values for any running task. Only when the task succeeds is 
> the duration value populated in the UI.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29596) Task duration not updating for running tasks

2019-10-24 Thread Rakesh Raushan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959436#comment-16959436
 ] 

Rakesh Raushan commented on SPARK-29596:


I will look into this one.

> Task duration not updating for running tasks
> 
>
> Key: SPARK-29596
> URL: https://issues.apache.org/jira/browse/SPARK-29596
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.2
>Reporter: Bharati Jadhav
>Priority: Major
> Attachments: Screenshot_Spark_live_WebUI.png
>
>
> When looking at the task metrics for running tasks in the task table for the 
> related stage, the duration column is not updated until the task has 
> succeeded. The duration values are reported empty or 0 ms until the task has 
> completed. This is a change in behavior, from earlier versions, when the task 
> duration was continuously updated while the task was running. The missing 
> duration values can be observed for both short and long running tasks and for 
> multiple applications.
>  
> To reproduce this, one can run any code from the spark-shell and observe the 
> missing duration values for any running task. Only when the task succeeds is 
> the duration value populated in the UI.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29597) Deprecate old Java 8 versions prior to 8u92

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29597.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26249
[https://github.com/apache/spark/pull/26249]

> Deprecate old Java 8 versions prior to 8u92
> ---
>
> Key: SPARK-29597
> URL: https://issues.apache.org/jira/browse/SPARK-29597
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29597) Deprecate old Java 8 versions prior to 8u92

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29597:
-

Assignee: Dongjoon Hyun

> Deprecate old Java 8 versions prior to 8u92
> ---
>
> Key: SPARK-29597
> URL: https://issues.apache.org/jira/browse/SPARK-29597
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29597) Deprecate old Java 8 versions prior to 8u92

2019-10-24 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-29597:
-

 Summary: Deprecate old Java 8 versions prior to 8u92
 Key: SPARK-29597
 URL: https://issues.apache.org/jira/browse/SPARK-29597
 Project: Spark
  Issue Type: Task
  Components: Documentation
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark

2019-10-24 Thread Kevin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959368#comment-16959368
 ] 

Kevin commented on SPARK-29106:
---

Hi all, I'm Kevin Zhao from Linaro Developer Cloud; we've offered the resources 
for Aarch64 CI. Thanks for using our resources.

I'm sorry that last night we had a network upgrade in Linaro Developer Cloud, 
and it led to network problems for the VM instances you use. The problem has 
now been fixed. Please try again if possible.

In the future, if there is an upgrade, I will notify OpenLab first so that it 
can be scheduled at a convenient time. Thanks very much!

 

> Add jenkins arm test for spark
> --
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: huangtianhua
>Priority: Minor
> Attachments: R-ansible.yml, R-libs.txt
>
>
> Add arm test jobs to amplab jenkins for spark.
> So far we have made two periodic arm test jobs for spark in OpenLab: one is 
> based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), 
> and the other is based on a new branch which we made on 09-09, see 
> [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64]
>  and 
> [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64.|http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64]
>  We only have to care about the first one when integrating the arm test with 
> amplab jenkins.
> About the k8s test on arm, we have tested it, see 
> [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it 
> later. 
> We also plan to test other stable branches, and we can integrate them into 
> amplab when they are ready.
> We have offered an arm instance and sent the info to shane knapp; thanks 
> shane for adding the first arm job to amplab jenkins :) 
> The other important thing is about leveldbjni 
> [https://github.com/fusesource/leveldbjni,|https://github.com/fusesource/leveldbjni/issues/80]
>  spark depends on leveldbjni-all-1.8 
> [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8],
>  which has no arm64 support. So we built an arm64-supporting 
> release of leveldbjni, see 
> [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8],
>  but we can't modify the spark pom.xml directly with something like 
> 'property'/'profile' to choose the correct jar on arm or x86, 
> because spark depends on some hadoop packages like hadoop-hdfs, and those 
> packages depend on leveldbjni-all-1.8 too, unless hadoop releases with a new 
> arm-supporting leveldbjni jar. For now we download the leveldbjni-all-1.8 from 
> openlabtesting and 'mvn install' it when testing spark on arm.
> PS: The issues found and fixed:
>  SPARK-28770
>  [https://github.com/apache/spark/pull/25673]
>   
>  SPARK-28519
>  [https://github.com/apache/spark/pull/25279]
>   
>  SPARK-28433
>  [https://github.com/apache/spark/pull/25186]
>  
> SPARK-28467
> [https://github.com/apache/spark/pull/25864]
>  
> SPARK-29286
> [https://github.com/apache/spark/pull/26021]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark

2019-10-24 Thread huangtianhua (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959355#comment-16959355
 ] 

huangtianhua commented on SPARK-29106:
--

Note: Hadoop now supports leveldbjni on the aarch64 platform; see 
https://issues.apache.org/jira/browse/HADOOP-16614

> Add jenkins arm test for spark
> --
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: huangtianhua
>Priority: Minor
> Attachments: R-ansible.yml, R-libs.txt
>
>
> Add arm test jobs to amplab jenkins for spark.
> So far we have made two periodic arm test jobs for spark in OpenLab: one is 
> based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), 
> and the other is based on a new branch which we made on 09-09, see 
> [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64]
>  and 
> [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64.|http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64]
>  We only have to care about the first one when integrating the arm test with 
> amplab jenkins.
> About the k8s test on arm, we have tested it, see 
> [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it 
> later. 
> We also plan to test other stable branches, and we can integrate them into 
> amplab when they are ready.
> We have offered an arm instance and sent the info to shane knapp; thanks 
> shane for adding the first arm job to amplab jenkins :) 
> The other important thing is about leveldbjni 
> [https://github.com/fusesource/leveldbjni,|https://github.com/fusesource/leveldbjni/issues/80]
>  spark depends on leveldbjni-all-1.8 
> [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8],
>  which has no arm64 support. So we built an arm64-supporting 
> release of leveldbjni, see 
> [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8],
>  but we can't modify the spark pom.xml directly with something like 
> 'property'/'profile' to choose the correct jar on arm or x86, 
> because spark depends on some hadoop packages like hadoop-hdfs, and those 
> packages depend on leveldbjni-all-1.8 too, unless hadoop releases with a new 
> arm-supporting leveldbjni jar. For now we download the leveldbjni-all-1.8 from 
> openlabtesting and 'mvn install' it when testing spark on arm.
> PS: The issues found and fixed:
>  SPARK-28770
>  [https://github.com/apache/spark/pull/25673]
>   
>  SPARK-28519
>  [https://github.com/apache/spark/pull/25279]
>   
>  SPARK-28433
>  [https://github.com/apache/spark/pull/25186]
>  
> SPARK-28467
> [https://github.com/apache/spark/pull/25864]
>  
> SPARK-29286
> [https://github.com/apache/spark/pull/26021]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29227) Track rule info in optimization phase

2019-10-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-29227:


Assignee: wenxuanguan

> Track rule info in optimization phase 
> --
>
> Key: SPARK-29227
> URL: https://issues.apache.org/jira/browse/SPARK-29227
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: wenxuanguan
>Assignee: wenxuanguan
>Priority: Major
>
> Track timing info for each rule in optimization phase using 
> QueryPlanningTracker in Structured Streaming



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29227) Track rule info in optimization phase

2019-10-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-29227.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25914
[https://github.com/apache/spark/pull/25914]

> Track rule info in optimization phase 
> --
>
> Key: SPARK-29227
> URL: https://issues.apache.org/jira/browse/SPARK-29227
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: wenxuanguan
>Assignee: wenxuanguan
>Priority: Major
> Fix For: 3.0.0
>
>
> Track timing info for each rule in optimization phase using 
> QueryPlanningTracker in Structured Streaming



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong

2019-10-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959243#comment-16959243
 ] 

Dongjoon Hyun commented on SPARK-29586:
---

Thank you for filing a JIRA, [~726575...@qq.com]. For new features, we need to 
use `3.0.0` as `Affected Versions` because Apache Spark doesn't allow new 
feature backporting.

> spark jdbc method param lowerBound and upperBound DataType wrong
> 
>
> Key: SPARK-29586
> URL: https://issues.apache.org/jira/browse/SPARK-29586
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: daile
>Priority: Major
>
>  
> {code:java}
> private def toBoundValueInWhereClause(
> value: Long,
> columnType: DataType,
> timeZoneId: String): String = {
>   def dateTimeToString(): String = {
> val dateTimeStr = columnType match {
>   case DateType => DateFormatter().format(value.toInt)
>   case TimestampType =>
> val timestampFormatter = TimestampFormatter.getFractionFormatter(
>   DateTimeUtils.getZoneId(timeZoneId))
> DateTimeUtils.timestampToString(timestampFormatter, value)
> }
> s"'$dateTimeStr'"
>   }
>   columnType match {
> case _: NumericType => value.toString
> case DateType | TimestampType => dateTimeToString()
>   }
> }{code}
> partitionColumn supports NumericType, DateType, and TimestampType, but the jdbc 
> method only accepts Long.
>  
> {code:java}
> test("jdbc Suite2") {
>   val df = spark
> .read
> .option("partitionColumn", "B")
> .option("lowerBound", "2017-01-01 10:00:00")
> .option("upperBound", "2019-01-01 10:00:00")
> .option("numPartitions", 5)
> .jdbc(urlWithUserAndPass, "TEST.TIMETYPES",  new Properties())
>   df.printSchema()
>   df.show()
> }
> {code}
> it's OK
>  
> {code:java}
> test("jdbc Suite") { val df = spark.read.jdbc(urlWithUserAndPass, 
> "TEST.TIMETYPES", "B", 1571899768024L, 1571899768024L, 5, new Properties()) 
> df.printSchema() df.show() }
> {code}
>  
> {code:java}
> java.lang.IllegalArgumentException: Cannot parse the bound value 
> 1571899768024 as datejava.lang.IllegalArgumentException: Cannot parse the 
> bound value 1571899768024 as date at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184)
>  at scala.Option.getOrElse(Option.scala:189) at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
>  at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:240) 
> at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:229)
>  at scala.Option.getOrElse(Option.scala:189) at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:229) at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:179) at 
> org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:255) at 
> org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:297) at 
> org.apache.spark.sql.jdbc.JDBCSuite.$anonfun$new$186(JDBCSuite.scala:1664) at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at 
> org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at 
> org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at 
> org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at 
> org.scalatest.Transformer.apply(Transformer.scala:22) at 
> org.scalatest.Transformer.apply(Transformer.scala:20) at 
> org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149) at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at 
> org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at 
> org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) at 
> org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at 
> org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
>  at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at 
> org.apache.spark.sql.jdbc.JDBCSuite.org$scalatest$BeforeAndAfter$$super$runTest(JDBCSuite.scala:43)
>  at 

[jira] [Updated] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29586:
--
Affects Version/s: (was: 2.4.4)

> spark jdbc method param lowerBound and upperBound DataType wrong
> 
>
> Key: SPARK-29586
> URL: https://issues.apache.org/jira/browse/SPARK-29586
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: daile
>Priority: Major
>
>  
> {code:java}
> private def toBoundValueInWhereClause(
> value: Long,
> columnType: DataType,
> timeZoneId: String): String = {
>   def dateTimeToString(): String = {
> val dateTimeStr = columnType match {
>   case DateType => DateFormatter().format(value.toInt)
>   case TimestampType =>
> val timestampFormatter = TimestampFormatter.getFractionFormatter(
>   DateTimeUtils.getZoneId(timeZoneId))
> DateTimeUtils.timestampToString(timestampFormatter, value)
> }
> s"'$dateTimeStr'"
>   }
>   columnType match {
> case _: NumericType => value.toString
> case DateType | TimestampType => dateTimeToString()
>   }
> }{code}
> partitionColumn supports NumericType, DateType, and TimestampType, but the jdbc 
> method only accepts Long.
>  
> {code:java}
> test("jdbc Suite2") {
>   val df = spark
> .read
> .option("partitionColumn", "B")
> .option("lowerBound", "2017-01-01 10:00:00")
> .option("upperBound", "2019-01-01 10:00:00")
> .option("numPartitions", 5)
> .jdbc(urlWithUserAndPass, "TEST.TIMETYPES",  new Properties())
>   df.printSchema()
>   df.show()
> }
> {code}
> it's OK
>  
> {code:java}
> test("jdbc Suite") { val df = spark.read.jdbc(urlWithUserAndPass, 
> "TEST.TIMETYPES", "B", 1571899768024L, 1571899768024L, 5, new Properties()) 
> df.printSchema() df.show() }
> {code}
>  
> {code:java}
> java.lang.IllegalArgumentException: Cannot parse the bound value 
> 1571899768024 as datejava.lang.IllegalArgumentException: Cannot parse the 
> bound value 1571899768024 as date at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184)
>  at scala.Option.getOrElse(Option.scala:189) at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
>  at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:240) 
> at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:229)
>  at scala.Option.getOrElse(Option.scala:189) at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:229) at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:179) at 
> org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:255) at 
> org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:297) at 
> org.apache.spark.sql.jdbc.JDBCSuite.$anonfun$new$186(JDBCSuite.scala:1664) at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at 
> org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at 
> org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at 
> org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at 
> org.scalatest.Transformer.apply(Transformer.scala:22) at 
> org.scalatest.Transformer.apply(Transformer.scala:20) at 
> org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149) at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at 
> org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at 
> org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) at 
> org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at 
> org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
>  at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at 
> org.apache.spark.sql.jdbc.JDBCSuite.org$scalatest$BeforeAndAfter$$super$runTest(JDBCSuite.scala:43)
>  at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203) at 
> org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192) at 
> 

[jira] [Assigned] (SPARK-29526) UNCACHE TABLE should look up catalog/table like v2 commands

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-29526:
-

Assignee: Terry Kim

> UNCACHE TABLE should look up catalog/table like v2 commands
> ---
>
> Key: SPARK-29526
> URL: https://issues.apache.org/jira/browse/SPARK-29526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
>
> UNCACHE TABLE should look up catalog/table like v2 commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29526) UNCACHE TABLE should look up catalog/table like v2 commands

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29526.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26237
[https://github.com/apache/spark/pull/26237]

> UNCACHE TABLE should look up catalog/table like v2 commands
> ---
>
> Key: SPARK-29526
> URL: https://issues.apache.org/jira/browse/SPARK-29526
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.0.0
>
>
> UNCACHE TABLE should look up catalog/table like v2 commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29596) Task duration not updating for running tasks

2019-10-24 Thread Bharati Jadhav (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharati Jadhav updated SPARK-29596:
---
Attachment: Screenshot_Spark_live_WebUI.png

> Task duration not updating for running tasks
> 
>
> Key: SPARK-29596
> URL: https://issues.apache.org/jira/browse/SPARK-29596
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.2
>Reporter: Bharati Jadhav
>Priority: Major
> Attachments: Screenshot_Spark_live_WebUI.png
>
>
> When looking at the task metrics for running tasks in the task table for the 
> related stage, the duration column is not updated until the task has 
> succeeded. The duration values are reported empty or 0 ms until the task has 
> completed. This is a change in behavior, from earlier versions, when the task 
> duration was continuously updated while the task was running. The missing 
> duration values can be observed for both short and long running tasks and for 
> multiple applications.
>  
> To reproduce this, one can run any code from the spark-shell and observe the 
> missing duration values for any running task. Only when the task succeeds is 
> the duration value populated in the UI.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29596) Task duration not updating for running tasks

2019-10-24 Thread Bharati Jadhav (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharati Jadhav updated SPARK-29596:
---
Attachment: (was: Screen Shot 2019-10-14 at 4.23.49 PM.png)

> Task duration not updating for running tasks
> 
>
> Key: SPARK-29596
> URL: https://issues.apache.org/jira/browse/SPARK-29596
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.2
>Reporter: Bharati Jadhav
>Priority: Major
>
> When looking at the task metrics for running tasks in the task table for the 
> related stage, the duration column is not updated until the task has 
> succeeded. The duration values are reported empty or 0 ms until the task has 
> completed. This is a change in behavior, from earlier versions, when the task 
> duration was continuously updated while the task was running. The missing 
> duration values can be observed for both short and long running tasks and for 
> multiple applications.
>  
> To reproduce this, one can run any code from the spark-shell and observe the 
> missing duration values for any running task. Only when the task succeeds is 
> the duration value populated in the UI.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29596) Task duration not updating for running tasks

2019-10-24 Thread Bharati Jadhav (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharati Jadhav updated SPARK-29596:
---
Attachment: Screen Shot 2019-10-14 at 4.23.49 PM.png

> Task duration not updating for running tasks
> 
>
> Key: SPARK-29596
> URL: https://issues.apache.org/jira/browse/SPARK-29596
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.2
>Reporter: Bharati Jadhav
>Priority: Major
> Attachments: Screen Shot 2019-10-14 at 4.23.49 PM.png
>
>
> When looking at the task metrics for running tasks in the task table for the 
> related stage, the duration column is not updated until the task has 
> succeeded. The duration values are reported empty or 0 ms until the task has 
> completed. This is a change in behavior, from earlier versions, when the task 
> duration was continuously updated while the task was running. The missing 
> duration values can be observed for both short and long running tasks and for 
> multiple applications.
>  
> To reproduce this, one can run any code from the spark-shell and observe the 
> missing duration values for any running task. Only when the task succeeds is 
> the duration value populated in the UI.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29596) Task duration not updating for running tasks

2019-10-24 Thread Bharati Jadhav (Jira)
Bharati Jadhav created SPARK-29596:
--

 Summary: Task duration not updating for running tasks
 Key: SPARK-29596
 URL: https://issues.apache.org/jira/browse/SPARK-29596
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.4.2
Reporter: Bharati Jadhav


When looking at the task metrics for running tasks in the task table for the 
related stage, the duration column is not updated until the task has succeeded. 
The duration values are reported empty or 0 ms until the task has completed. 
This is a change in behavior, from earlier versions, when the task duration was 
continuously updated while the task was running. The missing duration values 
can be observed for both short and long running tasks and for multiple 
applications.

 

To reproduce this, one can run any code from the spark-shell and observe the 
missing duration values for any running task. Only when the task succeeds is 
the duration value populated in the UI.

!image-2019-10-24-14-16-16-986.png!
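
A minimal spark-shell sketch of such a workload (the sizes and sleep time are arbitrary; any long-running job would do): each task sleeps for about a minute, so its row stays visible in the stage's task table while it is running.

{code:scala}
// Run in spark-shell (sc is already in scope) and watch the Duration column
// of the resulting stage's task table in the Web UI while the tasks run.
sc.parallelize(1 to 8, 8).map { i =>
  Thread.sleep(60 * 1000L) // keep each of the 8 tasks running for ~1 minute
  i
}.count()
{code}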

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29596) Task duration not updating for running tasks

2019-10-24 Thread Bharati Jadhav (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bharati Jadhav updated SPARK-29596:
---
Description: 
When looking at the task metrics for running tasks in the task table for the 
related stage, the duration column is not updated until the task has succeeded. 
The duration values are reported empty or 0 ms until the task has completed. 
This is a change in behavior, from earlier versions, when the task duration was 
continuously updated while the task was running. The missing duration values 
can be observed for both short and long running tasks and for multiple 
applications.

 

To reproduce this, one can run any code from the spark-shell and observe the 
missing duration values for any running task. Only when the task succeeds is 
the duration value populated in the UI.

 

 

  was:
When looking at the task metrics for running tasks in the task table for the 
related stage, the duration column is not updated until the task has succeeded. 
The duration values are reported empty or 0 ms until the task has completed. 
This is a change in behavior, from earlier versions, when the task duration was 
continuously updated while the task was running. The missing duration values 
can be observed for both short and long running tasks and for multiple 
applications.

 

To reproduce this, one can run any code from the spark-shell and observe the 
missing duration values for any running task. Only when the task succeeds is 
the duration value populated in the UI.

!image-2019-10-24-14-16-16-986.png!

 


> Task duration not updating for running tasks
> 
>
> Key: SPARK-29596
> URL: https://issues.apache.org/jira/browse/SPARK-29596
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.2
>Reporter: Bharati Jadhav
>Priority: Major
>
> When looking at the task metrics for running tasks in the task table for the 
> related stage, the duration column is not updated until the task has 
> succeeded. The duration values are reported empty or 0 ms until the task has 
> completed. This is a change in behavior, from earlier versions, when the task 
> duration was continuously updated while the task was running. The missing 
> duration values can be observed for both short and long running tasks and for 
> multiple applications.
>  
> To reproduce this, one can run any code from the spark-shell and observe the 
> missing duration values for any running task. Only when the task succeeds is 
> the duration value populated in the UI.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark

2019-10-24 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959204#comment-16959204
 ] 

Shane Knapp commented on SPARK-29106:
-

I bumped the git timeout to 30 mins, which turned out to be a much more 
obfuscated set of tasks than I ever would have imagined, lol...

I relaunched the job; let's see if it fetches/clones in time.

> Add jenkins arm test for spark
> --
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: huangtianhua
>Priority: Minor
> Attachments: R-ansible.yml, R-libs.txt
>
>
> Add arm test jobs to amplab jenkins for spark.
> So far we have made two periodic arm test jobs for spark in OpenLab: one is 
> based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), 
> and the other is based on a new branch which we made on 09-09, see 
> [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64]
>  and 
> [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64.|http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64]
>  We only have to care about the first one when integrating the arm test with 
> amplab jenkins.
> About the k8s test on arm, we have tested it, see 
> [https://github.com/theopenlab/spark/pull/17]; maybe we can integrate it 
> later. 
> We also plan to test other stable branches, and we can integrate them into 
> amplab when they are ready.
> We have offered an arm instance and sent the info to shane knapp; thanks 
> shane for adding the first arm job to amplab jenkins :) 
> The other important thing is about leveldbjni 
> [https://github.com/fusesource/leveldbjni,|https://github.com/fusesource/leveldbjni/issues/80]
>  spark depends on leveldbjni-all-1.8 
> [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8],
>  which has no arm64 support. So we built an arm64-supporting 
> release of leveldbjni, see 
> [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8],
>  but we can't modify the spark pom.xml directly with something like 
> 'property'/'profile' to choose the correct jar on arm or x86, 
> because spark depends on some hadoop packages like hadoop-hdfs, and those 
> packages depend on leveldbjni-all-1.8 too, unless hadoop releases with a new 
> arm-supporting leveldbjni jar. For now we download the leveldbjni-all-1.8 from 
> openlabtesting and 'mvn install' it when testing spark on arm.
> PS: The issues found and fixed:
>  SPARK-28770
>  [https://github.com/apache/spark/pull/25673]
>   
>  SPARK-28519
>  [https://github.com/apache/spark/pull/25279]
>   
>  SPARK-28433
>  [https://github.com/apache/spark/pull/25186]
>  
> SPARK-28467
> [https://github.com/apache/spark/pull/25864]
>  
> SPARK-29286
> [https://github.com/apache/spark/pull/26021]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15348) Hive ACID

2019-10-24 Thread Georg Heiler (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-15348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959185#comment-16959185
 ] 

Georg Heiler commented on SPARK-15348:
--

To the best of my knowledge, EXTERNAL tables are not faster - they are simply 
not managed.

Often this means that if the writing application is not taking care of sorting, 
indexing, and the small-files problem, reads are indeed slower. But if you 
handle these properly using spark, there should not be a difference.

The metadata is stored in hive either way.
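
As a rough sketch of what handling the small-files problem from Spark can look like on the write side (the paths, table name, and column below are made up, and this is a plain batch rewrite, not Hive ACID compaction):

{code:scala}
// Hypothetical paths/table/column; reduce the number of output files before
// writing so the resulting (unmanaged) table stays fast to scan.
val events = spark.read.parquet("/data/raw/events")
events
  .repartition(64)                    // a few large files instead of thousands of tiny ones
  .sortWithinPartitions("event_date") // optional: keep rows sorted within each file
  .write
  .mode("overwrite")
  .saveAsTable("analytics.events")    // metadata still lives in the Hive metastore
{code}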

> Hive ACID
> -
>
> Key: SPARK-15348
> URL: https://issues.apache.org/jira/browse/SPARK-15348
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.0, 2.3.0
>Reporter: Ran Haim
>Priority: Major
>
> Spark does not support any feature of hive's transactional tables:
> you cannot use spark to delete/update a table, and it also has problems 
> reading the aggregated data when no compaction has been done.
> Also it seems that compaction is not supported - alter table ... partition 
>  COMPACT 'major'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29472) Mechanism for Excluding Jars at Launch for YARN

2019-10-24 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29472.

Resolution: Won't Fix

> Mechanism for Excluding Jars at Launch for YARN
> ---
>
> Key: SPARK-29472
> URL: https://issues.apache.org/jira/browse/SPARK-29472
> Project: Spark
>  Issue Type: New Feature
>  Components: YARN
>Affects Versions: 2.4.4
>Reporter: Abhishek Modi
>Priority: Minor
>
> *Summary*
> It would be convenient if there were an easy way to exclude jars from Spark’s 
> classpath at launch time. This would complement the way in which jars can be 
> added to the classpath using {{extraClassPath}}.
>  
> *Context*
> The Spark build contains its dependency jars in the {{/jars}} directory. 
> These jars become part of the executor’s classpath. By default on YARN, these 
> jars are packaged and distributed to containers at launch ({{spark-submit}}) 
> time.
>  
> While developing Spark applications, customers sometimes need to debug using 
> different versions of dependencies. This can become difficult if the 
> dependency (eg. Parquet 1.11.0) is one that Spark already has in {{/jars}} 
> (eg. Parquet 1.10.1 in Spark 2.4), as the dependency included with Spark is 
> preferentially loaded. 
>  
> Configurations such as {{userClassPathFirst}} are available. However these 
> have often come with other side effects. For example, if the customer’s build 
> includes Avro they will likely see {{Caused by: java.lang.LinkageError: 
> loader constraint violation: when resolving method 
> "org.apache.spark.SparkConf.registerAvroSchemas(Lscala/collection/Seq;)Lorg/apache/spark/SparkConf;"
>  the class loader (instance of 
> org/apache/spark/util/ChildFirstURLClassLoader) of the current class, 
> com/uber/marmaray/common/spark/SparkFactory, and the class loader (instance 
> of sun/misc/Launcher$AppClassLoader) for the method's defining class, 
> org/apache/spark/SparkConf, have different Class objects for the type 
> scala/collection/Seq used in the signature}}. Resolving such issues often 
> takes many hours.
>  
> To deal with these sorts of issues, customers often download the Spark build, 
> remove the target jars and then do spark-submit. Other times, customers may 
> not be able to do spark-submit as it is gated behind some Spark Job Server. 
> In this case, customers may try downloading the build, removing the jars, and 
> then using configurations such as {{spark.yarn.dist.jars}} or 
> {{spark.yarn.dist.archives}}. Both of these options are undesirable as they 
> are very operationally heavy, error prone and often result in the customer’s 
> spark builds going out of sync with the authoritative build. 
>  
> *Solution*
> I’d like to propose adding a {{spark.yarn.jars.exclusionRegex}} 
> configuration. Customers could provide a regex such as {{.\*parquet.\*}} and 
> jar files matching this regex would not be included in the driver and 
> executor classpath.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29472) Mechanism for Excluding Jars at Launch for YARN

2019-10-24 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959179#comment-16959179
 ] 

Marcelo Masiero Vanzin commented on SPARK-29472:


bq. customers sometimes need to debug using different versions of dependencies

That's trivial to do with Spark-on-YARN.

{code}
spark-submit --deploy-mode cluster \
  --files /path/to/my-custom-parquet.jar \
  --conf spark.driver.extraClassPath=my-custom-parquet.jar \
  --conf spark.executor.extraClassPath=my-custom-parquet.jar
{code}

Or in client mode:

{code}
spark-submit --deploy-mode client \
  --files /path/to/my-custom-parquet.jar \
  --conf spark.driver.extraClassPath=/path/to/my-custom-parquet.jar \
  --conf spark.executor.extraClassPath=my-custom-parquet.jar
{code}

Done. No need for a new option, no need to change Spark's install directory, no 
need for {{userClassPathFirst}} or anything. I don't see the point of adding 
the new option - it's confusing, easy to break things, and doesn't completely 
solve the problem by itself, since you still have to upload the new jar and add 
it to the class path with other existing options.

> Mechanism for Excluding Jars at Launch for YARN
> ---
>
> Key: SPARK-29472
> URL: https://issues.apache.org/jira/browse/SPARK-29472
> Project: Spark
>  Issue Type: New Feature
>  Components: YARN
>Affects Versions: 2.4.4
>Reporter: Abhishek Modi
>Priority: Minor
>
> *Summary*
> It would be convenient if there were an easy way to exclude jars from Spark’s 
> classpath at launch time. This would complement the way in which jars can be 
> added to the classpath using {{extraClassPath}}.
>  
> *Context*
> The Spark build contains its dependency jars in the {{/jars}} directory. 
> These jars become part of the executor’s classpath. By default on YARN, these 
> jars are packaged and distributed to containers at launch ({{spark-submit}}) 
> time.
>  
> While developing Spark applications, customers sometimes need to debug using 
> different versions of dependencies. This can become difficult if the 
> dependency (eg. Parquet 1.11.0) is one that Spark already has in {{/jars}} 
> (eg. Parquet 1.10.1 in Spark 2.4), as the dependency included with Spark is 
> preferentially loaded. 
>  
> Configurations such as {{userClassPathFirst}} are available. However these 
> have often come with other side effects. For example, if the customer’s build 
> includes Avro they will likely see {{Caused by: java.lang.LinkageError: 
> loader constraint violation: when resolving method 
> "org.apache.spark.SparkConf.registerAvroSchemas(Lscala/collection/Seq;)Lorg/apache/spark/SparkConf;"
>  the class loader (instance of 
> org/apache/spark/util/ChildFirstURLClassLoader) of the current class, 
> com/uber/marmaray/common/spark/SparkFactory, and the class loader (instance 
> of sun/misc/Launcher$AppClassLoader) for the method's defining class, 
> org/apache/spark/SparkConf, have different Class objects for the type 
> scala/collection/Seq used in the signature}}. Resolving such issues often 
> takes many hours.
>  
> To deal with these sorts of issues, customers often download the Spark build, 
> remove the target jars and then do spark-submit. Other times, customers may 
> not be able to do spark-submit as it is gated behind some Spark Job Server. 
> In this case, customers may try downloading the build, removing the jars, and 
> then using configurations such as {{spark.yarn.dist.jars}} or 
> {{spark.yarn.dist.archives}}. Both of these options are undesirable as they 
> are very operationally heavy, error prone and often result in the customer’s 
> spark builds going out of sync with the authoritative build. 
>  
> *Solution*
> I’d like to propose adding a {{spark.yarn.jars.exclusionRegex}} 
> configuration. Customers could provide a regex such as {{.\*parquet.\*}} and 
> jar files matching this regex would not be included in the driver and 
> executor classpath.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21287) Cannot use Int.MIN_VALUE as Spark SQL fetchsize

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-21287:
-

Assignee: Hu Fuwang

> Cannot use Int.MIN_VALUE as Spark SQL fetchsize
> ---
>
> Key: SPARK-21287
> URL: https://issues.apache.org/jira/browse/SPARK-21287
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Hu Fuwang
>Priority: Major
>
> MySQL JDBC driver gives possibility to not store ResultSet in memory.
> We can do this by setting fetchSize to Int.MIN_VALUE.
> Unfortunately this configuration isn't correct in Spark.
> {code}
> java.lang.IllegalArgumentException: requirement failed: Invalid value 
> `-2147483648` for parameter `fetchsize`. The minimum value is 0. When the 
> value is 0, the JDBC driver ignores the value and does the estimates.
>   at scala.Predef$.require(Predef.scala:224)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.(JDBCOptions.scala:105)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.(JDBCOptions.scala:34)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:206)
>   at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:280)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:214)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
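
For illustration, a minimal reproduction sketch, assuming an active SparkSession named 
{{spark}}; the URL and table name are placeholders:

{code}
// MySQL's row-streaming mode requires fetchSize = Integer.MIN_VALUE, which the
// fetchsize validation in JDBCOptions currently rejects before any connection
// is attempted.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/test") // placeholder URL
  .option("dbtable", "some_table")                   // placeholder table
  .option("fetchsize", Int.MinValue.toString)
  .load()
{code}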



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21287) Cannot use Int.MIN_VALUE as Spark SQL fetchsize

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-21287.
---
Fix Version/s: 3.0.0
   2.4.5
   Resolution: Fixed

Issue resolved by pull request 26244
[https://github.com/apache/spark/pull/26244]

> Cannot use Int.MIN_VALUE as Spark SQL fetchsize
> ---
>
> Key: SPARK-21287
> URL: https://issues.apache.org/jira/browse/SPARK-21287
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: Maciej Bryński
>Assignee: Hu Fuwang
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> MySQL JDBC driver gives possibility to not store ResultSet in memory.
> We can do this by setting fetchSize to Int.MIN_VALUE.
> Unfortunately this configuration isn't correct in Spark.
> {code}
> java.lang.IllegalArgumentException: requirement failed: Invalid value 
> `-2147483648` for parameter `fetchsize`. The minimum value is 0. When the 
> value is 0, the JDBC driver ignores the value and does the estimates.
>   at scala.Predef$.require(Predef.scala:224)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.(JDBCOptions.scala:105)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.(JDBCOptions.scala:34)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:206)
>   at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:280)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:214)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark

2019-10-24 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959153#comment-16959153
 ] 

Shane Knapp commented on SPARK-29106:
-

btw the VM is currently experiencing a lot of network latency and relatively 
high ping times to github.com, and the job is having trouble cloning the git 
repo.  i rebooted the VM, but it doesn't seem to be helping much.

my lead sysadmin will be out for the next week and a half, but when he returns 
we'll look into getting a basic ARM server for our build system.  i'm pretty 
unhappy w/the VM option and think we'll have a lot more luck w/bare metal.  the 
VM will definitely help us get the ansible configs built but i'd like to get 
off of it ASAP.

> Add jenkins arm test for spark
> --
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: huangtianhua
>Priority: Minor
> Attachments: R-ansible.yml, R-libs.txt
>
>
> Add arm test jobs to amplab jenkins for spark.
> Till now we have made two ARM test periodic jobs for spark in OpenLab: one is 
> based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), 
> the other is based on a new branch we created on 09-09, see  
> [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64]
>   and 
> [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64.|http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64]
>  We only have to care about the first one when integrating the ARM test with amplab 
> jenkins.
> About the k8s test on ARM, we have already tested it, see 
> [https://github.com/theopenlab/spark/pull/17], maybe we can integrate it 
> later. 
> We also plan to test other stable branches, and we can integrate them into 
> amplab when they are ready.
> We have offered an ARM instance and sent the details to shane knapp, thanks 
> shane for adding the first ARM job to amplab jenkins :) 
> The other important thing is about leveldbjni 
> [https://github.com/fusesource/leveldbjni,|https://github.com/fusesource/leveldbjni/issues/80]
>  spark depends on leveldbjni-all-1.8 
> [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8],
>  and there is no arm64 support for it. So we built an arm64-supporting 
> release of leveldbjni, see 
> [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8],
>  but we can't modify the spark pom.xml directly with something like a 
> 'property'/'profile' to choose the correct jar on ARM or x86 platforms, 
> because spark depends on some hadoop packages like hadoop-hdfs, and those 
> packages depend on leveldbjni-all-1.8 too, unless hadoop releases a new 
> ARM-supporting leveldbjni jar. For now we download the leveldbjni-all-1.8 from 
> openlabtesting and 'mvn install' it for ARM testing of spark.
> PS: The issues found and fixed:
>  SPARK-28770
>  [https://github.com/apache/spark/pull/25673]
>   
>  SPARK-28519
>  [https://github.com/apache/spark/pull/25279]
>   
>  SPARK-28433
>  [https://github.com/apache/spark/pull/25186]
>  
> SPARK-28467
> [https://github.com/apache/spark/pull/25864]
>  
> SPARK-29286
> [https://github.com/apache/spark/pull/26021]
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29595) Insertion with named_struct should match by name

2019-10-24 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-29595:
---
Description: 
{code:java}
spark-sql> create table str using parquet as(select named_struct('a', 1, 'b', 
2) as data);
spark-sql>  insert into str values named_struct("b", 3, "a", 1);
spark-sql> select * from str;
{"a":3,"b":1}
{"a":1,"b":2}

{code}

The result should be 
{code:java}
{"a":1,"b":3}
{"a":1,"b":2}
{code}

Spark should match the field names of named_struct on insertion

  was:

{code:java}
spark-sql> create table str using parquet as(select named_struct('a', 1, 'b', 
2) as data);
spark-sql>  insert into str values named_struct("b", 3, "a", 1);
spark-sql> select * from str;
{"a":3,"b":1}
{"a":1,"b":2}

{code}

The result should be 
```
{"a":1,"b":3}
{"a":1,"b":2}
```
Spark should match the field names of named_struct on insertion


> Insertion with named_struct should match by name
> 
>
> Key: SPARK-29595
> URL: https://issues.apache.org/jira/browse/SPARK-29595
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Priority: Major
>
> {code:java}
> spark-sql> create table str using parquet as(select named_struct('a', 1, 'b', 
> 2) as data);
> spark-sql>  insert into str values named_struct("b", 3, "a", 1);
> spark-sql> select * from str;
> {"a":3,"b":1}
> {"a":1,"b":2}
> {code}
> The result should be 
> {code:java}
> {"a":1,"b":3}
> {"a":1,"b":2}
> {code}
> Spark should match the field names of named_struct on insertion



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29595) Insertion with named_struct should match by name

2019-10-24 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-29595:
--

 Summary: Insertion with named_struct should match by name
 Key: SPARK-29595
 URL: https://issues.apache.org/jira/browse/SPARK-29595
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Gengliang Wang



{code:java}
spark-sql> create table str using parquet as(select named_struct('a', 1, 'b', 
2) as data);
spark-sql>  insert into str values named_struct("b", 3, "a", 1);
spark-sql> select * from str;
{"a":3,"b":1}
{"a":1,"b":2}

{code}

The result should be 
```
{"a":1,"b":3}
{"a":1,"b":2}
```
Spark should match the field names of named_struct on insertion



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28367) Kafka connector infinite wait because metadata never updated

2019-10-24 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959129#comment-16959129
 ] 

Gabor Somogyi commented on SPARK-28367:
---

The new Kafka API has been merged to master and will be available in 2.5.

> Kafka connector infinite wait because metadata never updated
> 
>
> Key: SPARK-28367
> URL: https://issues.apache.org/jira/browse/SPARK-28367
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.1.3, 2.2.3, 2.3.3, 2.4.3, 3.0.0
>Reporter: Gabor Somogyi
>Priority: Critical
>
> Spark uses an old and deprecated API named poll(long) which never returns and 
> stays in a live lock if metadata is not updated (for instance when the broker 
> disappears at consumer creation).
> I've created a small standalone application to test it and the alternatives: 
> https://github.com/gaborgsomogyi/kafka-get-assignment
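
For reference, a minimal consumer sketch of the difference, assuming the Kafka 2.x client 
API where poll(Duration) honours its timeout instead of blocking indefinitely on missing 
metadata; the broker address and topic name are placeholders:

{code}
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092") // placeholder broker
props.put("group.id", "poll-timeout-sketch")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")

val consumer = new KafkaConsumer[String, String](props)
consumer.subscribe(Collections.singletonList("topic")) // placeholder topic

// Deprecated poll(long) may block forever while waiting for metadata:
// consumer.poll(0L)

// poll(Duration) returns (possibly empty) once the timeout expires, even if
// the broker disappeared and metadata was never updated.
val records = consumer.poll(Duration.ofSeconds(10))
consumer.close()
{code}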



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark

2019-10-24 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959124#comment-16959124
 ] 

Shane Knapp commented on SPARK-29106:
-

> First I want to share the details what we have done in Openlab test env.

this is an extremely basic python installation, and doesn't include important 
things that pyspark needs to test against, like pandas and pyarrow.

> 1) If we can not use Anaconda, how about manage the packages via ansible too? 
> Just for ARM now?  Such as for py27, we need to install what packages from 
> pip/somewhere and need to install manually(For manually installed packages, 
> if possible, we can do something like leveldbjni on maven, provider a 
> public/official way to fit the ARM package downloading/installation). For 
> now, I personally think it's very difficult to use Anaconda, as there aren't 
> so much package management platform for ARM, eventhrough we start up Anaconda 
> on ARM. If we do that, we need to fix the all gaps, that's a very huge 
> project.

a few things here:

* i am already using ansible to set up and deploy python via anaconda (and pip) 
on the x86 workers
* we can't use anaconda for ARM, period.  we have to use python virtual envs
* i still haven't had the cycles to dive in to trying to recreate the 3 python 
envs on ARM yet

> 2) For multiple python version, py27 py34 py36 and pypy, the venv is the 
> right choice now. But how about support part of them for the first step? Such 
> as only 1 or 2 python version support now, as we already passed on py27 and 
> py36 testing. Let's see that ARM eco is very limited now. 

yeah, i was planning on doing one at a time.

> 3) As the following integration work is in your sight, we can not know so 
> much details about what problem you hit. So please feel free to tell us how 
> can we help you, we are looking forward to work with you.

that's the plan!  :)

> For more quick to test SparkR, I install manually in the ARM jenkins worker, 
> because the R installation also need so much time, including deb librarise 
> install and R itself. So I found amplab jenkins job also manage the R 
> installation before the real spark test execution? Is that happened in each 
> build?

no, R is set up via ansible and not modified by the build.

> I think the current maven UT test could be run 1 time per day, and 
> pyspark/sparkR runs 1 time per day. Eventhough they are running 
> simultaneously, but we can make the 2 jobs trigger in different time period, 
> such as maven UT test(From 0:00 am to 12:00 am), pyspark/sparkR(From 1:00pm 
> to 10:00pm).

sure, sounds like a plan once we/i get those two parts set up on the worker in 
an atomic and reproducible way.

> Add jenkins arm test for spark
> --
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: huangtianhua
>Priority: Minor
> Attachments: R-ansible.yml, R-libs.txt
>
>
> Add arm test jobs to amplab jenkins for spark.
> Till now we have made two ARM test periodic jobs for spark in OpenLab: one is 
> based on master with hadoop 2.7 (similar to the QA test of amplab jenkins), 
> the other is based on a new branch we created on 09-09, see  
> [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64]
>   and 
> [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64.|http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64]
>  We only have to care about the first one when integrating the ARM test with amplab 
> jenkins.
> About the k8s test on ARM, we have already tested it, see 
> [https://github.com/theopenlab/spark/pull/17], maybe we can integrate it 
> later. 
> We also plan to test other stable branches, and we can integrate them into 
> amplab when they are ready.
> We have offered an ARM instance and sent the details to shane knapp, thanks 
> shane for adding the first ARM job to amplab jenkins :) 
> The other important thing is about leveldbjni 
> [https://github.com/fusesource/leveldbjni,|https://github.com/fusesource/leveldbjni/issues/80]
>  spark depends on leveldbjni-all-1.8 
> [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8],
>  and there is no arm64 support for it. So we built an arm64-supporting 
> release of leveldbjni, see 
> [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8],
>  but we can't modify the spark pom.xml directly with something like a 
> 'property'/'profile' to choose the correct jar on ARM or x86 platforms, 
> because spark depends on some hadoop packages like hadoop-hdfs, and those 
> packages depend on leveldbjni-all-1.8 too, unless hadoop releases a new arm 

[jira] [Commented] (SPARK-29585) Duration in stagePage does not match Duration in Summary Metrics for Completed Tasks

2019-10-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959111#comment-16959111
 ] 

Dongjoon Hyun commented on SPARK-29585:
---

Thank you for filing the JIRA, [~UFO]. For new-feature JIRAs, we use the next 
version as `Affected Versions` because Apache Spark doesn't allow backporting 
new features.

> Duration in stagePage does not match Duration in Summary Metrics for 
> Completed Tasks
> 
>
> Key: SPARK-29585
> URL: https://issues.apache.org/jira/browse/SPARK-29585
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: teeyog
>Priority: Major
>
> Summary Metrics for Completed Tasks uses  executorRunTime, and Duration in 
> Task details uses executorRunTime, but Duration in Completed Stages uses
> {code:java}
> stageData.completionTime - stageData.firstTaskLaunchedTime{code}
> , which can yield a different value than the task-level durations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29585) Duration in stagePage does not match Duration in Summary Metrics for Completed Tasks

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29585:
--
Component/s: (was: Spark Core)

> Duration in stagePage does not match Duration in Summary Metrics for 
> Completed Tasks
> 
>
> Key: SPARK-29585
> URL: https://issues.apache.org/jira/browse/SPARK-29585
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: teeyog
>Priority: Major
>
> Summary Metrics for Completed Tasks uses  executorRunTime, and Duration in 
> Task details uses executorRunTime, but Duration in Completed Stages uses
> {code:java}
> stageData.completionTime - stageData.firstTaskLaunchedTime{code}
> , which can yield a different value than the task-level durations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29585) Duration in stagePage does not match Duration in Summary Metrics for Completed Tasks

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29585:
--
Affects Version/s: (was: 2.4.4)
   (was: 2.3.4)
   3.0.0

> Duration in stagePage does not match Duration in Summary Metrics for 
> Completed Tasks
> 
>
> Key: SPARK-29585
> URL: https://issues.apache.org/jira/browse/SPARK-29585
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 3.0.0
>Reporter: teeyog
>Priority: Major
>
> Summary Metrics for Completed Tasks uses  executorRunTime, and Duration in 
> Task details uses executorRunTime, but Duration in Completed Stages uses
> {code:java}
> stageData.completionTime - stageData.firstTaskLaunchedTime{code}
> , which can yield a different value than the task-level durations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29490) Reset 'WritableColumnVector' in 'RowToColumnarExec'

2019-10-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959084#comment-16959084
 ] 

Dongjoon Hyun commented on SPARK-29490:
---

Hi, [~rongma]. Thank you for reporting.
Does this only exist in 3.0.0?

> Reset 'WritableColumnVector' in 'RowToColumnarExec'
> ---
>
> Key: SPARK-29490
> URL: https://issues.apache.org/jira/browse/SPARK-29490
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Rong Ma
>Priority: Major
>
> When converting {{Iterator[InternalRow]}} to {{Iterator[ColumnarBatch]}}, the 
> vectors used to create a new {{ColumnarBatch}} should be reset in the 
> iterator's "next()" method.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29593) Enhance Cluster Managers to be Pluggable

2019-10-24 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959078#comment-16959078
 ] 

Marcelo Masiero Vanzin commented on SPARK-29593:


bq. Are there any plans to make it more public?

Not really. The hope is that someone who really needs that will do it.

> Enhance Cluster Managers to be Pluggable
> 
>
> Key: SPARK-29593
> URL: https://issues.apache.org/jira/browse/SPARK-29593
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler
>Affects Versions: 2.4.4
>Reporter: Kevin Doyle
>Priority: Major
>
> Today Cluster Managers are bundled with Spark and it is hard to add new ones. 
> Kubernetes forked the code to build it and then bring it into Spark. Lots of 
> work is still going on with the Kubernetes cluster manager. It should be able 
> to ship more often if Spark had a pluggable way to bring in Cluster Managers. 
> This will also benefit enterprise companies that have their own cluster 
> managers that aren't open source, so can't be part of Spark itself.
> High level idea to be discussed for additional options:
>  1. Make the cluster manager pluggable.
>  2. Have the Spark Standalone cluster manager ship with Spark by default and 
> be the base cluster manager others can inherit from. Others can be shipped or 
> not shipped at same time.
>  3. Each Cluster Manager can ship additional jars that can be placed inside 
> Spark, then with a configuration file define the cluster manager Spark runs 
> with. 
>  4. The configuration file can define which classes to use for the various 
> parts. Can reuse files from Spark Standalone Cluster Manager or say to use a 
> different one.
>  5. Based on the classes that are allowed to be switched out in the Spark 
> code we can use code like the following to load a different class.
> val clazz = Class.forName("* from configuration file*")
>  val cons = clazz.getConstructor(classOf[SparkContext])
>  cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29594) Create a Dataset from a Sequence of Case class

2019-10-24 Thread Pedro Correia Luis (Jira)
Pedro Correia Luis created SPARK-29594:
--

 Summary: Create a Dataset from a Sequence of Case class
 Key: SPARK-29594
 URL: https://issues.apache.org/jira/browse/SPARK-29594
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.4
Reporter: Pedro Correia Luis


The Dataset code generation logic fails to handle certain field names in case 
classes (e.g. "1_something"). Scala's backquote escaping allows otherwise-illegal 
identifiers, such as names starting with a digit or reserved keywords, to be used 
as names in programs, as in the example below:

 

case class Foo(`1_something`: String)

 

val test = Seq(Foo("HelloWorld!")).toDS()


But this case class trips up the Dataset code generator. The following error 
message is displayed when Datasets containing instances of such case classes 
are processed.



java.lang.RuntimeException: Error while encoding: 
java.util.concurrent.ExecutionException: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
316, Column 15: failed to compile: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
316, Column 15: Expression "funcResult_2 = value_19" is not a type
staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
fromString, unwrapoption(ObjectType(class java.lang.String), 
assertnotnull(assertnotnull(input[0, Foo, true])).1_something), true, false) AS 
1_something#40


 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29593) Enhance Cluster Managers to be Pluggable

2019-10-24 Thread Kevin Doyle (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959071#comment-16959071
 ] 

Kevin Doyle commented on SPARK-29593:
-

Thanks Marcelo. Let me take a look at that. Are there any plans to make it 
more public?

> Enhance Cluster Managers to be Pluggable
> 
>
> Key: SPARK-29593
> URL: https://issues.apache.org/jira/browse/SPARK-29593
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler
>Affects Versions: 2.4.4
>Reporter: Kevin Doyle
>Priority: Major
>
> Today Cluster Managers are bundled with Spark and it is hard to add new ones. 
> Kubernetes forked the code to build it and then bring it into Spark. Lots of 
> work is still going on with the Kubernetes cluster manager. It should be able 
> to ship more often if Spark had a pluggable way to bring in Cluster Managers. 
> This will also benefit enterprise companies that have their own cluster 
> managers that aren't open source, so can't be part of Spark itself.
> High level idea to be discussed for additional options:
>  1. Make the cluster manager pluggable.
>  2. Have the Spark Standalone cluster manager ship with Spark by default and 
> be the base cluster manager others can inherit from. Others can be shipped or 
> not shipped at same time.
>  3. Each Cluster Manager can ship additional jars that can be placed inside 
> Spark, then with a configuration file define the cluster manager Spark runs 
> with. 
>  4. The configuration file can define which classes to use for the various 
> parts. Can reuse files from Spark Standalone Cluster Manager or say to use a 
> different one.
>  5. Based on the classes that are allowed to be switched out in the Spark 
> code we can use code like the following to load a different class.
> val clazz = Class.forName("* from configuration file*")
>  val cons = clazz.getConstructor(classOf[SparkContext])
>  cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29593) Enhance Cluster Managers to be Pluggable

2019-10-24 Thread Kevin Doyle (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Doyle updated SPARK-29593:

Description: 
Today Cluster Managers are bundled with Spark and it is hard to add new ones. 
Kubernetes forked the code to build it and then bring it into Spark. Lots of 
work is still going on with the Kubernetes cluster manager. It should be able 
to ship more often if Spark had a pluggable way to bring in Cluster Managers. 
This will also benefit enterprise companies that have their own cluster 
managers that aren't open source, so can't be part of Spark itself.

High level idea to be discussed for additional options:
 1. Make the cluster manager pluggable.
 2. Have the Spark Standalone cluster manager ship with Spark by default and be 
the base cluster manager others can inherit from. Others can be shipped or not 
shipped at same time.
 3. Each Cluster Manager can ship additional jars that can be placed inside 
Spark, then with a configuration file define the cluster manager Spark runs 
with. 
 4. The configuration file can define which classes to use for the various 
parts. Can reuse files from Spark Standalone Cluster Manager or say to use a 
different one.
 5. Based on the classes that are allowed to be switched out in the Spark code 
we can use code like the following to load a different class.

–+val+ +clazz+ = Class.forName("*https://databricks.com/session_eu19/refactoring-apache-spark-to-allow-additional-cluster-managers]


> Enhance Cluster Managers to be Pluggable
> 
>
> Key: SPARK-29593
> URL: https://issues.apache.org/jira/browse/SPARK-29593
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler
>Affects Versions: 2.4.4
>Reporter: Kevin Doyle
>Priority: Major
>
> Today Cluster Managers are bundled with Spark and it is hard to add new ones. 
> Kubernetes forked the code to build it and then bring it into Spark. Lots of 
> work is still going on with the Kubernetes cluster manager. It should be able 
> to ship more often if Spark had a pluggable way to bring in Cluster Managers. 
> This will also benefit enterprise companies that have their own cluster 
> managers that aren't open source, so can't be part of Spark itself.
> High level idea to be discussed for additional options:
>  1. Make the cluster manager pluggable.
>  2. Have the Spark Standalone cluster manager ship with Spark by default and 
> be the base cluster manager others can inherit from. Others can be shipped or 
> not shipped at same time.
>  3. Each Cluster Manager can ship additional jars that can be placed inside 
> Spark, then with a configuration file define the cluster manager Spark runs 
> with. 
>  4. The configuration file can define which classes to use for the various 
> parts. Can reuse files from Spark Standalone Cluster Manager or say to use a 
> different one.
>  5. Based on the classes that are allowed to be switched out in the Spark 
> code we can use code like the following to load a different class.
> val clazz = Class.forName("* from configuration file*")
>  val cons = clazz.getConstructor(classOf[SparkContext])
>  cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29593) Enhance Cluster Managers to be Pluggable

2019-10-24 Thread Marcelo Masiero Vanzin (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959063#comment-16959063
 ] 

Marcelo Masiero Vanzin commented on SPARK-29593:


This already exists: {{org.apache.spark.scheduler.ExternalClusterManager}}.

It's just not a proper public API; you can still use it by having your 
implementation be in the {{org.apache.spark}} namespace (other classes can be 
elsewhere).
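
For reference, a minimal sketch of such an implementation, assuming the trait shape in 
current Spark; it would be registered through Java's ServiceLoader via a 
META-INF/services/org.apache.spark.scheduler.ExternalClusterManager file. The master URL 
scheme and class names below are hypothetical:

{code}
// Must live under org.apache.spark because the trait is private[spark].
package org.apache.spark.scheduler.custom

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend, TaskScheduler, TaskSchedulerImpl}
import org.apache.spark.scheduler.local.LocalSchedulerBackend

class MyClusterManager extends ExternalClusterManager {

  // Claim master URLs of the form "mycluster://..."
  override def canCreate(masterURL: String): Boolean =
    masterURL.startsWith("mycluster://")

  override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler =
    new TaskSchedulerImpl(sc)

  // A real cluster manager would supply its own backend; the local backend
  // merely keeps this sketch self-contained.
  override def createSchedulerBackend(
      sc: SparkContext,
      masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend =
    new LocalSchedulerBackend(sc.getConf, scheduler.asInstanceOf[TaskSchedulerImpl], 1)

  override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
    scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
}
{code}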

> Enhance Cluster Managers to be Pluggable
> 
>
> Key: SPARK-29593
> URL: https://issues.apache.org/jira/browse/SPARK-29593
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler
>Affects Versions: 2.4.4
>Reporter: Kevin Doyle
>Priority: Major
>
> Today Cluster Managers are bundled with Spark and it is hard to add new ones. 
> Kubernetes forked the code to build it and then bring it into Spark. Lots of 
> work is still going on with the Kubernetes cluster manager. It should be able 
> to ship more often if Spark had a pluggable way to bring in Cluster Managers. 
> This will also benefit enterprise companies that have their own cluster 
> managers that aren't open source, so can't be part of Spark itself.
> High level idea to be discussed for additional options:
>  1. Make the cluster manager pluggable.
>  2. Have the Spark Standalone cluster manager ship with Spark by default and 
> be the base cluster manager others can inherit from. Others can be shipped or 
> not shipped at same time.
>  3. Each Cluster Manager can ship additional jars that can be placed inside 
> Spark, then with a configuration file define the cluster manager Spark runs 
> with. 
>  4. The configuration file can define which classes to use for the various 
> parts. Can reuse files from Spark Standalone Cluster Manager or say to use a 
> different one.
>  5. Based on the classes that are allowed to be switched out in the Spark 
> code we can use code like the following to load a different class.
> val clazz = Class.forName("* from configuration file*")
>  val cons = clazz.getConstructor(classOf[SparkContext])
>  cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> Proposal discussed at Spark + AI Summit Europe 2019: 
> [https://databricks.com/session_eu19/refactoring-apache-spark-to-allow-additional-cluster-managers]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29593) Enhance Cluster Managers to be Pluggable

2019-10-24 Thread Kevin Doyle (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Doyle updated SPARK-29593:

Description: 
Today Cluster Managers are bundled with Spark and it is hard to add new ones. 
Kubernetes forked the code to build it and then bring it into Spark. Lots of 
work is still going on with the Kubernetes cluster manager. It should be able 
to ship more often if Spark had a pluggable way to bring in Cluster Managers. 
This will also benefit enterprise companies that have their own cluster 
managers that aren't open source, so can't be part of Spark itself.

High level idea to be discussed for additional options:
 1. Make the cluster manager pluggable.
 2. Have the Spark Standalone cluster manager ship with Spark by default and be 
the base cluster manager others can inherit from. Others can be shipped or not 
shipped at same time.
 3. Each Cluster Manager can ship additional jars that can be placed inside 
Spark, then with a configuration file define the cluster manager Spark runs 
with. 
 4. The configuration file can define which classes to use for the various 
parts. Can reuse files from Spark Standalone Cluster Manager or say to use a 
different one.
 5. Based on the classes that are allowed to be switched out in the Spark code 
we can use code like the following to load a different class.

–+val+ +clazz+ = Class.forName("*https://databricks.com/session_eu19/refactoring-apache-spark-to-allow-additional-cluster-managers]

  was:
Today Cluster Managers are bundled with Spark and it is hard to add new ones. 
Kubernetes forked the code to build it and then bring it into Spark. Lots of 
work is still going on with the Kubernetes cluster manager. It should be able 
to ship more often if Spark had a pluggable way to bring in Cluster Managers. 
This will also benefit enterprise companies that have their own cluster 
managers that aren't open source, so can't be part of Spark itself.

High level idea to be discussed for additional options:
1. Make the cluster manager pluggable.
2. Have the Spark Standalone cluster manager ship with Spark by default and be 
the base cluster manager others can inherit from. Others can be shipped or not 
shipped at same time.
3. Each Cluster Manager can ship additional jars that can be placed inside 
Spark, then with a configuration file define the cluster manager Spark runs 
with. 
4. The configuration file can define which classes to use for the various 
parts. Can reuse files from Spark Standalone Cluster Manager or say to use a 
different one.
5. Based on the classes that are allowed to be switched out in the Spark code 
we can use code like the following to load a different class.



–+val+ +clazz+ = Class.forName("* Enhance Cluster Managers to be Pluggable
> 
>
> Key: SPARK-29593
> URL: https://issues.apache.org/jira/browse/SPARK-29593
> Project: Spark
>  Issue Type: New Feature
>  Components: Scheduler
>Affects Versions: 2.4.4
>Reporter: Kevin Doyle
>Priority: Major
>
> Today Cluster Managers are bundled with Spark and it is hard to add new ones. 
> Kubernetes forked the code to build it and then bring it into Spark. Lots of 
> work is still going on with the Kubernetes cluster manager. It should be able 
> to ship more often if Spark had a pluggable way to bring in Cluster Managers. 
> This will also benefit enterprise companies that have their own cluster 
> managers that aren't open source, so can't be part of Spark itself.
> High level idea to be discussed for additional options:
>  1. Make the cluster manager pluggable.
>  2. Have the Spark Standalone cluster manager ship with Spark by default and 
> be the base cluster manager others can inherit from. Others can be shipped or 
> not shipped at same time.
>  3. Each Cluster Manager can ship additional jars that can be placed inside 
> Spark, then with a configuration file define the cluster manager Spark runs 
> with. 
>  4. The configuration file can define which classes to use for the various 
> parts. Can reuse files from Spark Standalone Cluster Manager or say to use a 
> different one.
>  5. Based on the classes that are allowed to be switched out in the Spark 
> code we can use code like the following to load a different class.
> val clazz = Class.forName("* from configuration file*")
>  val cons = clazz.getConstructor(classOf[SparkContext])
>  cons.newInstance(sc).asInstanceOf[TaskSchedulerImpl]
> Proposal discussed at Spark + AI Summit Europe 2019: 
> [https://databricks.com/session_eu19/refactoring-apache-spark-to-allow-additional-cluster-managers]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: 

[jira] [Created] (SPARK-29593) Enhance Cluster Managers to be Pluggable

2019-10-24 Thread Kevin Doyle (Jira)
Kevin Doyle created SPARK-29593:
---

 Summary: Enhance Cluster Managers to be Pluggable
 Key: SPARK-29593
 URL: https://issues.apache.org/jira/browse/SPARK-29593
 Project: Spark
  Issue Type: New Feature
  Components: Scheduler
Affects Versions: 2.4.4
Reporter: Kevin Doyle


Today Cluster Managers are bundled with Spark and it is hard to add new ones. 
Kubernetes forked the code to build it and then bring it into Spark. Lots of 
work is still going on with the Kubernetes cluster manager. It should be able 
to ship more often if Spark had a pluggable way to bring in Cluster Managers. 
This will also benefit enterprise companies that have their own cluster 
managers that aren't open source, so can't be part of Spark itself.

High level idea to be discussed for additional options:
1. Make the cluster manager pluggable.
2. Have the Spark Standalone cluster manager ship with Spark by default and be 
the base cluster manager others can inherit from. Others can be shipped or not 
shipped at same time.
3. Each Cluster Manager can ship additional jars that can be placed inside 
Spark, then with a configuration file define the cluster manager Spark runs 
with. 
4. The configuration file can define which classes to use for the various 
parts. Can reuse files from Spark Standalone Cluster Manager or say to use a 
different one.
5. Based on the classes that are allowed to be switched out in the Spark code 
we can use code like the following to load a different class.



–+val+ +clazz+ = Class.forName("*

[jira] [Comment Edited] (SPARK-29580) KafkaDelegationTokenSuite fails to create new KafkaAdminClient

2019-10-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959018#comment-16959018
 ] 

Dongjoon Hyun edited comment on SPARK-29580 at 10/24/19 4:36 PM:
-

Thank you. I don't know yet, but it's difficult to track when the failure 
happens at the Suite Selector level, 
`_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_`.


was (Author: dongjoon):
I don't know yet, but it's difficult to track when the failure happens at the 
Suite Selector level, `_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_`.

> KafkaDelegationTokenSuite fails to create new KafkaAdminClient
> --
>
> Key: SPARK-29580
> URL: https://issues.apache.org/jira/browse/SPARK-29580
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
> {code}
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Server not found in Kerberos 
> database (7) - Server not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382)
>   ... 16 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Server not found in Kerberos database (7) - Server not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60)
>   at 
> org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103)
>  

[jira] [Commented] (SPARK-29580) KafkaDelegationTokenSuite fails to create new KafkaAdminClient

2019-10-24 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959018#comment-16959018
 ] 

Dongjoon Hyun commented on SPARK-29580:
---

I don't know yet, but it's difficult to track when the failure happens at the 
Suite Selector level, `_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_`.

> KafkaDelegationTokenSuite fails to create new KafkaAdminClient
> --
>
> Key: SPARK-29580
> URL: https://issues.apache.org/jira/browse/SPARK-29580
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
> {code}
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Server not found in Kerberos 
> database (7) - Server not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382)
>   ... 16 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Server not found in Kerberos database (7) - Server not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60)
>   at 
> org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.(LoginManager.java:61)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104)
>   at 
> 

[jira] [Created] (SPARK-29592) ALTER TABLE (set partition location) should look up catalog/table like v2 commands

2019-10-24 Thread Terry Kim (Jira)
Terry Kim created SPARK-29592:
-

 Summary: ALTER TABLE (set partition location) should look up 
catalog/table like v2 commands
 Key: SPARK-29592
 URL: https://issues.apache.org/jira/browse/SPARK-29592
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Terry Kim


ALTER TABLE (set partition location) should look up catalog/table like v2 
commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29592) ALTER TABLE (set partition location) should look up catalog/table like v2 commands

2019-10-24 Thread Terry Kim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959010#comment-16959010
 ] 

Terry Kim commented on SPARK-29592:
---

working on this.

> ALTER TABLE (set partition location) should look up catalog/table like v2 
> commands
> --
>
> Key: SPARK-29592
> URL: https://issues.apache.org/jira/browse/SPARK-29592
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Priority: Major
>
> ALTER TABLE (set partition location) should look up catalog/table like v2 
> commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29532) simplify interval string parsing

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-29532.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26190
[https://github.com/apache/spark/pull/26190]

> simplify interval string parsing
> 
>
> Key: SPARK-29532
> URL: https://issues.apache.org/jira/browse/SPARK-29532
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21287) Cannot use Int.MIN_VALUE as Spark SQL fetchsize

2019-10-24 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-21287:
--
Affects Version/s: 3.0.0
   2.2.3
   2.3.4
   2.4.4

> Cannot use Int.MIN_VALUE as Spark SQL fetchsize
> ---
>
> Key: SPARK-21287
> URL: https://issues.apache.org/jira/browse/SPARK-21287
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.1, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: Maciej Bryński
>Priority: Major
>
> MySQL JDBC driver gives possibility to not store ResultSet in memory.
> We can do this by setting fetchSize to Int.MIN_VALUE.
> Unfortunately this configuration isn't correct in Spark.
> {code}
> java.lang.IllegalArgumentException: requirement failed: Invalid value 
> `-2147483648` for parameter `fetchsize`. The minimum value is 0. When the 
> value is 0, the JDBC driver ignores the value and does the estimates.
>   at scala.Predef$.require(Predef.scala:224)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.(JDBCOptions.scala:105)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.(JDBCOptions.scala:34)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:32)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:330)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:166)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:206)
>   at sun.reflect.GeneratedMethodAccessor46.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>   at py4j.Gateway.invoke(Gateway.java:280)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:214)
>   at java.lang.Thread.run(Thread.java:748)
> {code}
> https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29580) KafkaDelegationTokenSuite fails to create new KafkaAdminClient

2019-10-24 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958963#comment-16958963
 ] 

Gabor Somogyi commented on SPARK-29580:
---

[~dongjoon] Started to look into it...
Is it an intermittent problem, or does it occur consistently under some circumstances?

> KafkaDelegationTokenSuite fails to create new KafkaAdminClient
> --
>
> Key: SPARK-29580
> URL: https://issues.apache.org/jira/browse/SPARK-29580
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112562/testReport/org.apache.spark.sql.kafka010/KafkaDelegationTokenSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/
> {code}
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:407)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:55)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:227)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:249)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:56)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Server not found in Kerberos 
> database (7) - Server not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:160)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:146)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:67)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:99)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:382)
>   ... 16 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Server not found in Kerberos database (7) - Server not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60)
>   at 
> org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.(LoginManager.java:61)
>   at 
> org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:104)
>   at 
> 

[jira] [Commented] (SPARK-29591) Support data insertion in a different order if you wish or even omit some columns in spark sql also like postgresql

2019-10-24 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958910#comment-16958910
 ] 

Ankit Raj Boudh commented on SPARK-29591:
-

I will analyse this issue.

> Support data insertion in a different order if you wish or even omit some 
> columns in spark sql also like postgresql
> ---
>
> Key: SPARK-29591
> URL: https://issues.apache.org/jira/browse/SPARK-29591
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: jobit mathew
>Priority: Major
>
> Support data insertion in a different order if you wish or even omit some 
> columns in spark sql also like postgre sql.
> *In postgre sql*
> CREATE TABLE weather (
>  city varchar(80),
>  temp_lo int, – low temperature
>  temp_hi int, – high temperature
>  prcp real, – precipitation
>  date date
>  );
> *You can list the columns in a different order if you wish or even omit some 
> columns,*
> INSERT INTO weather (date, city, temp_hi, temp_lo)
>  VALUES ('1994-11-29', 'Hayward', 54, 37);
> *Spark SQL*
> But Spark SQL does not allow inserting data in a different order or omitting 
> any column. It would be better to support this, as it can save time when we 
> cannot predict a specific column value or when some value is always fixed.
> create table jobit(id int,name string);
> > insert into jobit values(1,"Ankit");
>  Time taken: 0.548 seconds
>  spark-sql> *insert into jobit (id) values(1);*
>  *Error in query:*
>  mismatched input 'id' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
> 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
> == SQL ==
>  insert into jobit (id) values(1)
>  ---^^^
> spark-sql> *insert into jobit (name,id) values("Ankit",1);*
>  *Error in query:*
>  mismatched input 'name' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 
> 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
> == SQL ==
>  insert into jobit (name,id) values("Ankit",1)
>  ---^^^
> spark-sql>
>  
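
Until a column list is supported, a workaround sketch (reusing the jobit table from the report; the typed NULL is only illustrative) is to keep the values in table order and fill omitted columns explicitly:

{code:scala}
// values must follow the table's declared column order (id, name)
spark.sql("insert into jobit select 1, 'Ankit'")
// "omitting" name means supplying a NULL of the matching type
spark.sql("insert into jobit select 1, cast(null as string)")
{code}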



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29591) Support data insertion in a different order if you wish or even omit some columns in spark sql also like postgresql

2019-10-24 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-29591:
-
Description: 
Support data insertion in a different order if you wish or even omit some 
columns in spark sql also like postgre sql.

*In postgre sql*

CREATE TABLE weather (
 city varchar(80),
 temp_lo int, – low temperature
 temp_hi int, – high temperature
 prcp real, – precipitation
 date date
 );

*You can list the columns in a different order if you wish or even omit some 
columns,*

INSERT INTO weather (date, city, temp_hi, temp_lo)
 VALUES ('1994-11-29', 'Hayward', 54, 37);

*Spark SQL*

But Spark SQL does not allow inserting data in a different order or omitting any 
column. It would be better to support this, as it can save time when we cannot 
predict a specific column value or when some value is always fixed.

create table jobit(id int,name string);

> insert into jobit values(1,"Ankit");
 Time taken: 0.548 seconds
 spark-sql> *insert into jobit (id) values(1);*
 *Error in query:*
 mismatched input 'id' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)

== SQL ==
 insert into jobit (id) values(1)
 ---^^^

spark-sql> *insert into jobit (name,id) values("Ankit",1);*
 *Error in query:*
 mismatched input 'name' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)

== SQL ==
 insert into jobit (name,id) values("Ankit",1)
 ---^^^

spark-sql>

 

  was:
Support data insertion in a different order if you wish or even omit some 
columns in spark sql also like postgre sql.

*In postgre sql*

CREATE TABLE weather (
 city varchar(80),
 temp_lo int, – low temperature
 temp_hi int, – high temperature
 prcp real, – precipitation
 date date
 );

*You can list the columns in a different order if you wish or even omit some 
columns,*

INSERT INTO weather (date, city, temp_hi, temp_lo)
 VALUES ('1994-11-29', 'Hayward', 54, 37);

*Spark SQL*

But in spark sql is not allowing to insert data in different order or omit any 
column.Better to support this as it can save time if we can not predict any 
specific column value or if some value is fixed always.

create table jobit(id int,name string);

> insert into jobit values(1,"Ankit");
 Time taken: 0.548 seconds
 spark-sql> *insert into jobit (id) values(1);*
 *Error in query:*
 mismatched input 'id' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)

== SQL ==
 insert into jobit (id) values(1)
 ---^^^

spark-sql> insert into jobit (name,id) values("Ankit",1);
 *Error in query:*
 mismatched input 'name' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)

== SQL ==
 insert into jobit (name,id) values("Ankit",1)
 ---^^^

spark-sql>

 


> Support data insertion in a different order if you wish or even omit some 
> columns in spark sql also like postgresql
> ---
>
> Key: SPARK-29591
> URL: https://issues.apache.org/jira/browse/SPARK-29591
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: jobit mathew
>Priority: Major
>
> Support data insertion in a different order if you wish or even omit some 
> columns in spark sql also like postgre sql.
> *In postgre sql*
> CREATE TABLE weather (
>  city varchar(80),
>  temp_lo int, – low temperature
>  temp_hi int, – high temperature
>  prcp real, – precipitation
>  date date
>  );
> *You can list the columns in a different order if you wish or even omit some 
> columns,*
> INSERT INTO weather (date, city, temp_hi, temp_lo)
>  VALUES ('1994-11-29', 'Hayward', 54, 37);
> *Spark SQL*
> But Spark SQL does not allow inserting data in a different order or omitting 
> any column. It would be better to support this, as it can save time when we 
> cannot predict a specific column value or when some value is always fixed.
> create table jobit(id int,name string);
> > insert into jobit values(1,"Ankit");
>  Time taken: 0.548 seconds
>  spark-sql> *insert into jobit (id) values(1);*
>  *Error in query:*
>  mismatched input 'id' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
> 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
> == SQL ==
>  insert into jobit (id) values(1)
>  ---^^^
> spark-sql> *insert into jobit (name,id) values("Ankit",1);*
>  *Error in query:*
>  mismatched input 'name' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 
> 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
> == SQL ==
>  insert into jobit (name,id) values("Ankit",1)
>  ---^^^
> spark-sql>
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (SPARK-29591) Support data insertion in a different order if you wish or even omit some columns in spark sql also like postgresql

2019-10-24 Thread jobit mathew (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jobit mathew updated SPARK-29591:
-
Description: 
Support data insertion in a different order if you wish or even omit some 
columns in spark sql also like postgre sql.

*In postgre sql*

CREATE TABLE weather (
 city varchar(80),
 temp_lo int, – low temperature
 temp_hi int, – high temperature
 prcp real, – precipitation
 date date
 );

*You can list the columns in a different order if you wish or even omit some 
columns,*

INSERT INTO weather (date, city, temp_hi, temp_lo)
 VALUES ('1994-11-29', 'Hayward', 54, 37);

*Spark SQL*

But Spark SQL does not allow inserting data in a different order or omitting any 
column. It would be better to support this, as it can save time when we cannot 
predict a specific column value or when some value is always fixed.

create table jobit(id int,name string);

> insert into jobit values(1,"Ankit");
 Time taken: 0.548 seconds
 spark-sql> *insert into jobit (id) values(1);*
 *Error in query:*
 mismatched input 'id' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)

== SQL ==
 insert into jobit (id) values(1)
 ---^^^

spark-sql> insert into jobit (name,id) values("Ankit",1);
 *Error in query:*
 mismatched input 'name' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)

== SQL ==
 insert into jobit (name,id) values("Ankit",1)
 ---^^^

spark-sql>

 

  was:
Support data insertion in a different order if you wish or even omit some 
columns in spark sql also like postgre sql.

*In postgre sql*

CREATE TABLE weather (
 city varchar(80),
 temp_lo int, -- low temperature
 temp_hi int, -- high temperature
 prcp real, -- precipitation
 date date
);

*You can list the columns in a different order if you wish or even omit some 
columns,*

INSERT INTO weather (date, city, temp_hi, temp_lo)
 VALUES ('1994-11-29', 'Hayward', 54, 37);

*But in spark sql i*s not allowing to insert data in different order or omit 
any column.Better to support this as it can save time if we can not predict any 
specific column value or if some value is fixed always.

create table jobit(id int,name string);

> insert into jobit values(1,"Ankit");
Time taken: 0.548 seconds
spark-sql> *insert into jobit (id) values(1);*
*Error in query:*
mismatched input 'id' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)

== SQL ==
insert into jobit (id) values(1)
---^^^

spark-sql> insert into jobit (name,id) values("Ankit",1);
Error in query:
mismatched input 'name' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)

== SQL ==
insert into jobit (name,id) values("Ankit",1)
---^^^

spark-sql>

 


> Support data insertion in a different order if you wish or even omit some 
> columns in spark sql also like postgresql
> ---
>
> Key: SPARK-29591
> URL: https://issues.apache.org/jira/browse/SPARK-29591
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: jobit mathew
>Priority: Major
>
> Support data insertion in a different order if you wish or even omit some 
> columns in spark sql also like postgre sql.
> *In postgre sql*
> CREATE TABLE weather (
>  city varchar(80),
>  temp_lo int, – low temperature
>  temp_hi int, – high temperature
>  prcp real, – precipitation
>  date date
>  );
> *You can list the columns in a different order if you wish or even omit some 
> columns,*
> INSERT INTO weather (date, city, temp_hi, temp_lo)
>  VALUES ('1994-11-29', 'Hayward', 54, 37);
> *Spark SQL*
> But Spark SQL does not allow inserting data in a different order or omitting 
> any column. It would be better to support this, as it can save time when we 
> cannot predict a specific column value or when some value is always fixed.
> create table jobit(id int,name string);
> > insert into jobit values(1,"Ankit");
>  Time taken: 0.548 seconds
>  spark-sql> *insert into jobit (id) values(1);*
>  *Error in query:*
>  mismatched input 'id' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
> 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
> == SQL ==
>  insert into jobit (id) values(1)
>  ---^^^
> spark-sql> insert into jobit (name,id) values("Ankit",1);
>  *Error in query:*
>  mismatched input 'name' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 
> 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
> == SQL ==
>  insert into jobit (name,id) values("Ankit",1)
>  ---^^^
> spark-sql>
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (SPARK-29591) Support data insertion in a different order if you wish or even omit some columns in spark sql also like postgresql

2019-10-24 Thread jobit mathew (Jira)
jobit mathew created SPARK-29591:


 Summary: Support data insertion in a different order if you wish 
or even omit some columns in spark sql also like postgresql
 Key: SPARK-29591
 URL: https://issues.apache.org/jira/browse/SPARK-29591
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.4.4
Reporter: jobit mathew


Support data insertion in a different order if you wish or even omit some 
columns in spark sql also like postgre sql.

*In postgre sql*

CREATE TABLE weather (
 city varchar(80),
 temp_lo int, -- low temperature
 temp_hi int, -- high temperature
 prcp real, -- precipitation
 date date
);

*You can list the columns in a different order if you wish or even omit some 
columns,*

INSERT INTO weather (date, city, temp_hi, temp_lo)
 VALUES ('1994-11-29', 'Hayward', 54, 37);

But Spark SQL does not allow inserting data in a different order or omitting 
any column. It would be better to support this, as it can save time when we cannot 
predict a specific column value or when some value is always fixed.

create table jobit(id int,name string);

> insert into jobit values(1,"Ankit");
Time taken: 0.548 seconds
spark-sql> *insert into jobit (id) values(1);*
*Error in query:*
mismatched input 'id' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)

== SQL ==
insert into jobit (id) values(1)
---^^^

spark-sql> insert into jobit (name,id) values("Ankit",1);
Error in query:
mismatched input 'name' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 
'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)

== SQL ==
insert into jobit (name,id) values("Ankit",1)
---^^^

spark-sql>

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29590) Support hiding table in JDBC/ODBC server page in WebUI

2019-10-24 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958909#comment-16958909
 ] 

shahid commented on SPARK-29590:


I will raise a PR

> Support hiding table in JDBC/ODBC server page in WebUI
> --
>
> Key: SPARK-29590
> URL: https://issues.apache.org/jira/browse/SPARK-29590
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: shahid
>Priority: Minor
>
> Support hiding table in JDBC/ODBC server page in WebUI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29589) Support pagination for sql session stats table in JDBC/ODBC server page

2019-10-24 Thread shahid (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958907#comment-16958907
 ] 

shahid commented on SPARK-29589:


I will raise a PR

> Support pagination for sql session stats table in JDBC/ODBC server page
> ---
>
> Key: SPARK-29589
> URL: https://issues.apache.org/jira/browse/SPARK-29589
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: shahid
>Priority: Minor
>
> Support pagination for sql session stats table in JDBC/ODBC server page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29590) Support hiding table in JDBC/ODBC server page in WebUI

2019-10-24 Thread shahid (Jira)
shahid created SPARK-29590:
--

 Summary: Support hiding table in JDBC/ODBC server page in WebUI
 Key: SPARK-29590
 URL: https://issues.apache.org/jira/browse/SPARK-29590
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 2.4.4, 3.0.0
Reporter: shahid


Support hiding table in JDBC/ODBC server page in WebUI



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29588) Improvements in WebUI JDBC/ODBC server page

2019-10-24 Thread shahid (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-29588:
---
Summary: Improvements in WebUI JDBC/ODBC server page  (was: Improvements in 
JDBC/ODBC server page)

> Improvements in WebUI JDBC/ODBC server page
> ---
>
> Key: SPARK-29588
> URL: https://issues.apache.org/jira/browse/SPARK-29588
> Project: Spark
>  Issue Type: Umbrella
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: shahid
>Priority: Minor
>
> Improvements in JDBC/ODBC server page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29589) Support pagination for sql session stats table in JDBC/ODBC server page

2019-10-24 Thread shahid (Jira)
shahid created SPARK-29589:
--

 Summary: Support pagination for sql session stats table in 
JDBC/ODBC server page
 Key: SPARK-29589
 URL: https://issues.apache.org/jira/browse/SPARK-29589
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 2.4.4, 3.0.0
Reporter: shahid


Support pagination for sql session stats table in JDBC/ODBC server page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29588) Improvements in JDBC/ODBC server page

2019-10-24 Thread shahid (Jira)
shahid created SPARK-29588:
--

 Summary: Improvements in JDBC/ODBC server page
 Key: SPARK-29588
 URL: https://issues.apache.org/jira/browse/SPARK-29588
 Project: Spark
  Issue Type: Umbrella
  Components: Web UI
Affects Versions: 2.4.4, 3.0.0
Reporter: shahid


Improvements in JDBC/ODBC server page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29578) JDK 1.8.0_232 timezone updates cause "Kwajalein" test failures again

2019-10-24 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-29578.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26236
[https://github.com/apache/spark/pull/26236]

> JDK 1.8.0_232 timezone updates cause "Kwajalein" test failures again
> 
>
> Key: SPARK-29578
> URL: https://issues.apache.org/jira/browse/SPARK-29578
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
> Fix For: 3.0.0
>
>
> I have a report that tests fail in JDK 1.8.0_232 because of timezone changes 
> in (I believe) tzdata2018i or later, per 
> https://www.oracle.com/technetwork/java/javase/tzdata-versions-138805.html:
> {{*** FAILED *** with 8634 did not equal 8633 Round trip of 8633 did not work 
> in tz}}
> See also https://issues.apache.org/jira/browse/SPARK-24950
> I say "I've heard" because I can't get this version easily on my Mac. However 
> the fix is so inconsequential that I think we can just make it, to allow this 
> additional variation just as before.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29559) Support pagination for JDBC/ODBC UI page

2019-10-24 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-29559:
-
Priority: Minor  (was: Major)

> Support pagination for JDBC/ODBC UI page
> 
>
> Key: SPARK-29559
> URL: https://issues.apache.org/jira/browse/SPARK-29559
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: shahid
>Assignee: shahid
>Priority: Minor
> Fix For: 3.0.0
>
>
> Support pagination for JDBC/ODBC UI page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29559) Support pagination for JDBC/ODBC UI page

2019-10-24 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-29559:


Assignee: shahid

> Support pagination for JDBC/ODBC UI page
> 
>
> Key: SPARK-29559
> URL: https://issues.apache.org/jira/browse/SPARK-29559
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: shahid
>Assignee: shahid
>Priority: Major
>
> Support pagination for JDBC/ODBC UI page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29559) Support pagination for JDBC/ODBC UI page

2019-10-24 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-29559.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26215
[https://github.com/apache/spark/pull/26215]

> Support pagination for JDBC/ODBC UI page
> 
>
> Key: SPARK-29559
> URL: https://issues.apache.org/jira/browse/SPARK-29559
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 2.4.4, 3.0.0
>Reporter: shahid
>Assignee: shahid
>Priority: Major
> Fix For: 3.0.0
>
>
> Support pagination for JDBC/ODBC UI page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28791) Document ALTER TABLE statement in SQL Reference.

2019-10-24 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-28791:


Assignee: pavithra ramachandran

> Document ALTER TABLE statement in SQL Reference.
> 
>
> Key: SPARK-28791
> URL: https://issues.apache.org/jira/browse/SPARK-28791
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Assignee: pavithra ramachandran
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28791) Document ALTER TABLE statement in SQL Reference.

2019-10-24 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-28791:
-
Priority: Minor  (was: Major)

> Document ALTER TABLE statement in SQL Reference.
> 
>
> Key: SPARK-28791
> URL: https://issues.apache.org/jira/browse/SPARK-28791
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Assignee: pavithra ramachandran
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28791) Document ALTER TABLE statement in SQL Reference.

2019-10-24 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-28791.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 25590
[https://github.com/apache/spark/pull/25590]

> Document ALTER TABLE statement in SQL Reference.
> 
>
> Key: SPARK-28791
> URL: https://issues.apache.org/jira/browse/SPARK-28791
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Assignee: pavithra ramachandran
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29145) Spark SQL cannot handle "NOT IN" condition when using "JOIN"

2019-10-24 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-29145.
--
Fix Version/s: 3.0.0
 Assignee: angerszhu
   Resolution: Fixed

Resolved by pull request 25854
[https://github.com/apache/spark/pull/25854]

> Spark SQL cannot handle "NOT IN" condition when using "JOIN"
> 
>
> Key: SPARK-29145
> URL: https://issues.apache.org/jira/browse/SPARK-29145
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.4
>Reporter: Dezhi Cai
>Assignee: angerszhu
>Priority: Minor
> Fix For: 3.0.0
>
>
> sample sql: 
> {code}
> spark.range(10).createOrReplaceTempView("A")
> spark.range(10).createOrReplaceTempView("B")
> spark.range(10).createOrReplaceTempView("C")
> sql("""select * from A inner join B on A.id=B.id and A.id NOT IN (select id 
> from C)""")
> {code}
>  
> {code}
> org.apache.spark.sql.AnalysisException: Table or view not found: C; line 1 
> pos 74;
> 'Project [*]
> +- 'Join Inner, ((id#0L = id#2L) AND NOT id#0L IN (list#6 []))
>:  +- 'Project ['id]
>: +- 'UnresolvedRelation [C]
>:- SubqueryAlias `a`
>:  +- Range (0, 10, step=1, splits=Some(12))
>+- SubqueryAlias `b`
>   +- Range (0, 10, step=1, splits=Some(12))
>   at 
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:94)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:89)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:155)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:154)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:154)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:154)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:89)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:86)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:120)
> ...
> {code}
>  
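
For reference, a sketch of the usual rewrite that avoids the resolution failure, using the same temp views as the report: move the NOT IN subquery out of the join condition and into a WHERE clause.

{code:scala}
spark.range(10).createOrReplaceTempView("A")
spark.range(10).createOrReplaceTempView("B")
spark.range(10).createOrReplaceTempView("C")

// the subquery resolves correctly when it is a filter rather than part of the ON clause
sql("""select * from A inner join B on A.id = B.id
       where A.id not in (select id from C)""").show()
{code}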



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29587) Real data type is not supported in Spark SQL which is supporting in postgresql

2019-10-24 Thread Ankit Raj Boudh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958840#comment-16958840
 ] 

Ankit Raj Boudh commented on SPARK-29587:
-

I will analyse this issue

> Real data type is not supported in Spark SQL which is supporting in postgresql
> --
>
> Key: SPARK-29587
> URL: https://issues.apache.org/jira/browse/SPARK-29587
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: jobit mathew
>Priority: Minor
>
> The real data type is not supported in Spark SQL, although it is supported in 
> PostgreSQL.
> +*In postgresql query success*+
> CREATE TABLE weather2(prcp real);
> insert into weather2 values(2.5);
> select * from weather2;
>  
> ||  ||prcp||
> |1|2,5|
> +*In spark sql getting error*+
> spark-sql> CREATE TABLE weather2(prcp real);
> Error in query:
> DataType real is not supported.(line 1, pos 27)
> == SQL ==
> CREATE TABLE weather2(prcp real)
> ---
> It would be better to add support for the "real" datatype in Spark SQL as well.
>  
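
A possible interim sketch until real is accepted: Spark SQL's FLOAT is a 4-byte single-precision type, which is what PostgreSQL's REAL corresponds to (table name reused from the report):

{code:scala}
// FLOAT is single precision, matching PostgreSQL's REAL
spark.sql("CREATE TABLE weather2(prcp FLOAT)")
spark.sql("INSERT INTO weather2 VALUES (2.5)")
spark.sql("SELECT * FROM weather2").show()
{code}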



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12823) Cannot create UDF with StructType input

2019-10-24 Thread David Szakallas (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958841#comment-16958841
 ] 

David Szakallas commented on SPARK-12823:
-

Since the helper library [~gbarna] created is very slim, why not integrate it 
into Spark itself? I think it would be pretty natural to have a typed udf 
interface.

> Cannot create UDF with StructType input
> ---
>
> Key: SPARK-12823
> URL: https://issues.apache.org/jira/browse/SPARK-12823
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2
>Reporter: Frank Rosner
>Priority: Major
>  Labels: bulk-closed
>
> h5. Problem
> It is not possible to apply a UDF to a column that has a struct data type. 
> Two previous requests to the mailing list remained unanswered.
> h5. How-To-Reproduce
> {code}
> val sql = new org.apache.spark.sql.SQLContext(sc)
> import sql.implicits._
> case class KV(key: Long, value: String)
> case class Row(kv: KV)
> val df = sc.parallelize(List(Row(KV(1L, "a")), Row(KV(5L, "b".toDF
> val udf1 = org.apache.spark.sql.functions.udf((kv: KV) => kv.value)
> df.select(udf1(df("kv"))).show
> // java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast 
> to $line78.$read$$iwC$$iwC$KV
> val udf2 = org.apache.spark.sql.functions.udf((kv: (Long, String)) => kv._2)
> df.select(udf2(df("kv"))).show
> // org.apache.spark.sql.AnalysisException: cannot resolve 'UDF(kv)' due to 
> data type mismatch: argument 1 requires struct<_1:bigint,_2:string> type, 
> however, 'kv' is of struct type.;
> {code}
> h5. Mailing List Entries
> - 
> https://mail-archives.apache.org/mod_mbox/spark-user/201511.mbox/%3CCACUahd8M=ipCbFCYDyein_=vqyoantn-tpxe6sq395nh10g...@mail.gmail.com%3E
> - https://www.mail-archive.com/user@spark.apache.org/msg43092.html
> h5. Possible Workaround
> If you create a {{UserDefinedFunction}} manually, not using the {{udf}} 
> helper functions, it works. See https://github.com/FRosner/struct-udf, which 
> exposes the {{UserDefinedFunction}} constructor (public from package 
> private). However, then you have to work with a {{Row}}, because it does not 
> automatically convert the row to a case class / tuple.
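
A sketch of another common way to side-step the limitation, assuming the UDF only needs individual struct fields: pass the fields to the UDF separately instead of the whole struct.

{code:scala}
import org.apache.spark.sql.functions.{col, udf}

// the UDF sees plain Long/String columns, so no struct conversion is involved
val concatKv = udf((key: Long, value: String) => s"$key:$value")
df.select(concatKv(col("kv.key"), col("kv.value"))).show()
{code}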



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29587) Real data type is not supported in Spark SQL which is supporting in postgresql

2019-10-24 Thread jobit mathew (Jira)
jobit mathew created SPARK-29587:


 Summary: Real data type is not supported in Spark SQL which is 
supporting in postgresql
 Key: SPARK-29587
 URL: https://issues.apache.org/jira/browse/SPARK-29587
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.4.4
Reporter: jobit mathew


The real data type is not supported in Spark SQL, although it is supported in PostgreSQL.

+*In postgresql query success*+

CREATE TABLE weather2(prcp real);

insert into weather2 values(2.5);
select * from weather2;
 
||  ||prcp||
|1|2,5|

+*In spark sql getting error*+

spark-sql> CREATE TABLE weather2(prcp real);
Error in query:
DataType real is not supported.(line 1, pos 27)

== SQL ==
CREATE TABLE weather2(prcp real)
---

It would be better to add support for the "real" datatype in Spark SQL as well.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29504) Tooltip not display for Job Description even it shows ellipsed

2019-10-24 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-29504:
--

Assignee: pavithra ramachandran

> Tooltip  not display for Job Description even it shows ellipsed
> ---
>
> Key: SPARK-29504
> URL: https://issues.apache.org/jira/browse/SPARK-29504
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: pavithra ramachandran
>Priority: Major
> Attachments: ToolTip JIRA.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29504) Tooltip not display for Job Description even it shows ellipsed

2019-10-24 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-29504.

Resolution: Fixed

This issue is resolved in https://github.com/apache/spark/pull/26222

> Tooltip  not display for Job Description even it shows ellipsed
> ---
>
> Key: SPARK-29504
> URL: https://issues.apache.org/jira/browse/SPARK-29504
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: pavithra ramachandran
>Priority: Major
> Attachments: ToolTip JIRA.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong

2019-10-24 Thread Hu Fuwang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hu Fuwang updated SPARK-29586:
--
Comment: was deleted

(was: I am working on this.)

> spark jdbc method param lowerBound and upperBound DataType wrong
> 
>
> Key: SPARK-29586
> URL: https://issues.apache.org/jira/browse/SPARK-29586
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: daile
>Priority: Major
>
>  
> {code:java}
> private def toBoundValueInWhereClause(
> value: Long,
> columnType: DataType,
> timeZoneId: String): String = {
>   def dateTimeToString(): String = {
> val dateTimeStr = columnType match {
>   case DateType => DateFormatter().format(value.toInt)
>   case TimestampType =>
> val timestampFormatter = TimestampFormatter.getFractionFormatter(
>   DateTimeUtils.getZoneId(timeZoneId))
> DateTimeUtils.timestampToString(timestampFormatter, value)
> }
> s"'$dateTimeStr'"
>   }
>   columnType match {
> case _: NumericType => value.toString
> case DateType | TimestampType => dateTimeToString()
>   }
> }{code}
> partitionColumn supports NumericType, DateType and TimestampType, but the jdbc 
> method only accepts Long bounds
>  
> {code:java}
> test("jdbc Suite2") {
>   val df = spark
> .read
> .option("partitionColumn", "B")
> .option("lowerBound", "2017-01-01 10:00:00")
> .option("upperBound", "2019-01-01 10:00:00")
> .option("numPartitions", 5)
> .jdbc(urlWithUserAndPass, "TEST.TIMETYPES",  new Properties())
>   df.printSchema()
>   df.show()
> }
> {code}
> it's OK
>  
> {code:java}
> test("jdbc Suite") { val df = spark.read.jdbc(urlWithUserAndPass, 
> "TEST.TIMETYPES", "B", 1571899768024L, 1571899768024L, 5, new Properties()) 
> df.printSchema() df.show() }
> {code}
>  
> {code:java}
> java.lang.IllegalArgumentException: Cannot parse the bound value 
> 1571899768024 as datejava.lang.IllegalArgumentException: Cannot parse the 
> bound value 1571899768024 as date at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184)
>  at scala.Option.getOrElse(Option.scala:189) at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
>  at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:240) 
> at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:229)
>  at scala.Option.getOrElse(Option.scala:189) at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:229) at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:179) at 
> org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:255) at 
> org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:297) at 
> org.apache.spark.sql.jdbc.JDBCSuite.$anonfun$new$186(JDBCSuite.scala:1664) at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at 
> org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at 
> org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at 
> org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at 
> org.scalatest.Transformer.apply(Transformer.scala:22) at 
> org.scalatest.Transformer.apply(Transformer.scala:20) at 
> org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149) at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at 
> org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at 
> org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) at 
> org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at 
> org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
>  at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at 
> org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at 
> org.apache.spark.sql.jdbc.JDBCSuite.org$scalatest$BeforeAndAfter$$super$runTest(JDBCSuite.scala:43)
>  at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203) at 
> org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192) at 
> 
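
For reference, a sketch of the option-based form that the report contrasts with the Long-only jdbc(...) overload; the bounds are parsed according to the partition column's type, and urlWithUserAndPass / TEST.TIMETYPES are the suite's own test fixtures:

{code:scala}
val df = spark.read
  .format("jdbc")
  .option("url", urlWithUserAndPass)          // JDBCSuite test fixture
  .option("dbtable", "TEST.TIMETYPES")
  .option("partitionColumn", "B")
  .option("lowerBound", "2017-01-01 10:00:00")
  .option("upperBound", "2019-01-01 10:00:00")
  .option("numPartitions", 5)
  .load()
{code}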

[jira] [Commented] (SPARK-29106) Add jenkins arm test for spark

2019-10-24 Thread zhao bo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958651#comment-16958651
 ] 

zhao bo commented on SPARK-29106:
-

Hi [~shaneknapp],

Thanks very much for sharing so many things with us.

> For pyspark conversation:

First I want to share the details of what we have done in the OpenLab test env.

[https://github.com/theopenlab/spark/pull/32/files#diff-ff133db31a4c2f724e9edfb0f70d243dR33]

Anaconda package management is very good for the Python and R ecosystems, but I 
think it mainly targets the most popular architectures (x86 and other well-established 
ones), even though it is claimed to be "cross platform". We hit the same issue: we 
have to compile, install and test the dependency libraries on ARM ourselves, which is 
why we want to improve the ARM ecosystem. 

1) If we cannot use Anaconda, how about managing the packages via Ansible as well, 
just for ARM for now? For example, for py27 we would track which packages need to be 
installed from pip (or elsewhere) and which have to be installed manually. For manually 
installed packages we could, if possible, do something like what was done for leveldbjni 
on Maven and provide a public/official way to download and install the ARM packages. 
For now I personally think it is very difficult to use Anaconda, as there are not many 
package management platforms for ARM, even if we got Anaconda running on ARM. 
If we went that route we would need to close all the gaps, which is a huge project.

2) For multiple Python versions (py27, py34, py36 and pypy), venv is the right 
choice now. But how about supporting only a subset of them as a first step? For example, 
only 1 or 2 Python versions for now, since we have already passed py27 and py36 
testing. Keep in mind that the ARM ecosystem is still very limited. ;)

3) Since the follow-up integration work is on your side, we cannot know the 
details of the problems you hit. So please feel free to tell us how we can 
help; we look forward to working with you. ;)

 

> For sparkR conversation:

I will also share the details of what we did in the OpenLab test env.

[https://github.com/theopenlab/spark/pull/28/files#diff-ff133db31a4c2f724e9edfb0f70d243dR4]

To test SparkR more quickly, I installed R manually on the ARM Jenkins worker, 
because the R installation also takes a long time, including installing the deb 
libraries and R itself. I noticed that the amplab Jenkins job also manages the R 
installation before the real Spark test execution; does that happen in each 
build?

 

> For more jenkins jobs conversation:

I think the current Maven UT test could run once per day, and pyspark/sparkR once 
per day. Even if they would otherwise run simultaneously, we can make the two jobs 
trigger in different time periods, for example the Maven UT test from 0:00 to 12:00 
and pyspark/sparkR from 13:00 to 22:00.

> Add jenkins arm test for spark
> --
>
> Key: SPARK-29106
> URL: https://issues.apache.org/jira/browse/SPARK-29106
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: huangtianhua
>Priority: Minor
> Attachments: R-ansible.yml, R-libs.txt
>
>
> Add arm test jobs to amplab jenkins for spark.
> Till now we made two arm test periodic jobs for spark in OpenLab, one is 
> based on master with hadoop 2.7(similar with QA test of amplab jenkins), 
> other one is based on a new branch which we made on date 09-09, see  
> [http://status.openlabtesting.org/builds/job/spark-master-unit-test-hadoop-2.7-arm64]
>   and 
> [http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64.|http://status.openlabtesting.org/builds/job/spark-unchanged-branch-unit-test-hadoop-2.7-arm64]
>  We only have to care about the first one when integrate arm test with amplab 
> jenkins.
> About the k8s test on arm, we have took test it, see 
> [https://github.com/theopenlab/spark/pull/17], maybe we can integrate it 
> later. 
> And we plan test on other stable branches too, and we can integrate them to 
> amplab when they are ready.
> We have offered an arm instance and sent the infos to shane knapp, thanks 
> shane to add the first arm job to amplab jenkins :) 
> The other important thing is about the leveldbjni 
> [https://github.com/fusesource/leveldbjni,|https://github.com/fusesource/leveldbjni/issues/80]
>  spark depends on leveldbjni-all-1.8 
> [https://mvnrepository.com/artifact/org.fusesource.leveldbjni/leveldbjni-all/1.8],
>  we can see there is no arm64 supporting. So we build an arm64 supporting 
> release of leveldbjni see 
> [https://mvnrepository.com/artifact/org.openlabtesting.leveldbjni/leveldbjni-all/1.8],
>  but we can't modified the spark pom.xml directly with something like 
> 'property'/'profile' to choose correct jar package on arm or x86 platform, 
> because spark depends on some hadoop packages like hadoop-hdfs, the packages 
> depend on leveldbjni-all-1.8 too, 

[jira] [Resolved] (SPARK-29522) CACHE TABLE should look up catalog/table like v2 commands

2019-10-24 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-29522.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26179
[https://github.com/apache/spark/pull/26179]

> CACHE TABLE should look up catalog/table like v2 commands
> -
>
> Key: SPARK-29522
> URL: https://issues.apache.org/jira/browse/SPARK-29522
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.0.0
>
>
> CACHE TABLE should look up catalog/table like v2 commands



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong

2019-10-24 Thread daile (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daile updated SPARK-29586:
--
Description: 
 
{code:java}
private def toBoundValueInWhereClause(
value: Long,
columnType: DataType,
timeZoneId: String): String = {
  def dateTimeToString(): String = {
val dateTimeStr = columnType match {
  case DateType => DateFormatter().format(value.toInt)
  case TimestampType =>
val timestampFormatter = TimestampFormatter.getFractionFormatter(
  DateTimeUtils.getZoneId(timeZoneId))
DateTimeUtils.timestampToString(timestampFormatter, value)
}
s"'$dateTimeStr'"
  }
  columnType match {
case _: NumericType => value.toString
case DateType | TimestampType => dateTimeToString()
  }
}{code}
partitionColumn supports NumericType, DateType and TimestampType, but the jdbc 
method only accepts Long bounds

 
{code:java}
test("jdbc Suite2") {
  val df = spark
.read
.option("partitionColumn", "B")
.option("lowerBound", "2017-01-01 10:00:00")
.option("upperBound", "2019-01-01 10:00:00")
.option("numPartitions", 5)
.jdbc(urlWithUserAndPass, "TEST.TIMETYPES",  new Properties())
  df.printSchema()
  df.show()
}
{code}
it's OK

 
{code:java}
test("jdbc Suite") { val df = spark.read.jdbc(urlWithUserAndPass, 
"TEST.TIMETYPES", "B", 1571899768024L, 1571899768024L, 5, new Properties()) 
df.printSchema() df.show() }
{code}
 
{code:java}
java.lang.IllegalArgumentException: Cannot parse the bound value 1571899768024 
as datejava.lang.IllegalArgumentException: Cannot parse the bound value 
1571899768024 as date at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184)
 at scala.Option.getOrElse(Option.scala:189) at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88)
 at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
 at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
 at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:240) at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:229) 
at scala.Option.getOrElse(Option.scala:189) at 
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:229) at 
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:179) at 
org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:255) at 
org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:297) at 
org.apache.spark.sql.jdbc.JDBCSuite.$anonfun$new$186(JDBCSuite.scala:1664) at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) at 
org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) at 
org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) at 
org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at 
org.scalatest.Transformer.apply(Transformer.scala:22) at 
org.scalatest.Transformer.apply(Transformer.scala:20) at 
org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186) at 
org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149) at 
org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184) at 
org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196) at 
org.scalatest.SuperEngine.runTestImpl(Engine.scala:289) at 
org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196) at 
org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178) at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
 at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221) at 
org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214) at 
org.apache.spark.sql.jdbc.JDBCSuite.org$scalatest$BeforeAndAfter$$super$runTest(JDBCSuite.scala:43)
 at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203) at 
org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192) at 
org.apache.spark.sql.jdbc.JDBCSuite.runTest(JDBCSuite.scala:43) at 
org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229) at 
org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:396) at 
scala.collection.immutable.List.foreach(List.scala:392) at 
org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384) at 
org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:379) at 
org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461) at 
org.scalatest.FunSuiteLike.runTests(FunSuiteLike.scala:229) at 
org.scalatest.FunSuiteLike.runTests$(FunSuiteLike.scala:228) at 
org.scalatest.FunSuite.runTests(FunSuite.scala:1560) at 

[jira] [Commented] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong

2019-10-24 Thread Hu Fuwang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16958610#comment-16958610
 ] 

Hu Fuwang commented on SPARK-29586:
---

I am working on this.

> spark jdbc method param lowerBound and upperBound DataType wrong
> 
>
> Key: SPARK-29586
> URL: https://issues.apache.org/jira/browse/SPARK-29586
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: daile
>Priority: Major
>
>  
> {code:java}
> private def toBoundValueInWhereClause(
> value: Long,
> columnType: DataType,
> timeZoneId: String): String = {
>   def dateTimeToString(): String = {
> val dateTimeStr = columnType match {
>   case DateType => DateFormatter().format(value.toInt)
>   case TimestampType =>
> val timestampFormatter = TimestampFormatter.getFractionFormatter(
>   DateTimeUtils.getZoneId(timeZoneId))
> DateTimeUtils.timestampToString(timestampFormatter, value)
> }
> s"'$dateTimeStr'"
>   }
>   columnType match {
> case _: NumericType => value.toString
> case DateType | TimestampType => dateTimeToString()
>   }
> }{code}
> partitionColumn supports NumericType, DateType and TimestampType, but the jdbc 
> method only accepts Long bounds
> test("jdbc Suite2") {
>   val df = spark
>     .read
>     .option("partitionColumn", "B")
>     .option("lowerBound", "2017-01-01 10:00:00")
>     .option("upperBound", "2019-01-01 10:00:00")
>     .option("numPartitions", 5)
>     .jdbc(urlWithUserAndPass, "TEST.TIMETYPES",  new Properties())
>   df.printSchema()
>   df.show()
> }
> test("jdbc Suite2") {
>   val df = spark
>     .read
>     .option("partitionColumn", "B")
>     .option("lowerBound", "2017-01-01 10:00:00")
>     .option("upperBound", "2019-01-01 10:00:00")
>     .option("numPartitions", 5)
>     .jdbc(urlWithUserAndPass, "TEST.TIMETYPES",  new Properties())
>   df.printSchema()
>   df.show()
> }
> test("jdbc Suite") {
>   val df = spark.read.jdbc(urlWithUserAndPass, "TEST.TIMETYPES", "B", 
> 1571899768024L, 1571899768024L, 5, new Properties())
>   df.printSchema()
>   df.show()
> }
> java.lang.IllegalArgumentException: Cannot parse the bound value 
> 1571899768024 as date
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184)
>   at scala.Option.getOrElse(Option.scala:189)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
>   at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:240)
>   at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:229)
>   at scala.Option.getOrElse(Option.scala:189)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:229)
>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:179)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:255)
>   at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:297)
>   at 
> org.apache.spark.sql.jdbc.JDBCSuite.$anonfun$new$186(JDBCSuite.scala:1664)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
>   at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
>   at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
>   at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
>   at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
>   at 
> org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
>   at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
>   at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
>   at 
> 

[jira] [Updated] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong

2019-10-24 Thread daile (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

daile updated SPARK-29586:
--
Description: 
 
{code:java}
private def toBoundValueInWhereClause(
value: Long,
columnType: DataType,
timeZoneId: String): String = {
  def dateTimeToString(): String = {
val dateTimeStr = columnType match {
  case DateType => DateFormatter().format(value.toInt)
  case TimestampType =>
val timestampFormatter = TimestampFormatter.getFractionFormatter(
  DateTimeUtils.getZoneId(timeZoneId))
DateTimeUtils.timestampToString(timestampFormatter, value)
}
s"'$dateTimeStr'"
  }
  columnType match {
case _: NumericType => value.toString
case DateType | TimestampType => dateTimeToString()
  }
}{code}
partitionColumn supports NumericType, DateType, and TimestampType, but the jdbc 
method only accepts Long bounds
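
The snippet above renders an already-parsed Long bound back into a WHERE-clause 
literal; the failure reported in the stack trace below happens one step earlier, in 
toInternalBoundValue, when the supplied bound is parsed against the partition 
column's type. A rough, self-contained sketch of that idea (made-up names, not 
Spark's internal code):
{code:java}
// Simplified sketch only: mimics how a string bound is interpreted per column
// type. ParseBoundSketch, ColType, and parseBound are made-up names.
import java.sql.{Date, Timestamp}

object ParseBoundSketch {
  sealed trait ColType
  case object NumericCol extends ColType
  case object DateCol extends ColType
  case object TimestampCol extends ColType

  def parseBound(value: String, colType: ColType): Long = colType match {
    case NumericCol   => value.toLong
    case DateCol      => Date.valueOf(value).toLocalDate.toEpochDay   // epoch days
    case TimestampCol => Timestamp.valueOf(value).getTime * 1000L     // epoch micros
  }

  def main(args: Array[String]): Unit = {
    println(parseBound("2017-01-01 10:00:00", TimestampCol)) // parses fine
    println(parseBound("1571899768024", DateCol))            // throws IllegalArgumentException
  }
}
{code}
Running it shows that a timestamp string parses fine while the stringified Long 
1571899768024 cannot be interpreted as a date, which is exactly the error in the 
trace below.
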
test("jdbc Suite2") {
  val df = spark
    .read
    .option("partitionColumn", "B")
    .option("lowerBound", "2017-01-01 10:00:00")
    .option("upperBound", "2019-01-01 10:00:00")
    .option("numPartitions", 5)
    .jdbc(urlWithUserAndPass, "TEST.TIMETYPES",  new Properties())
  df.printSchema()
  df.show()
}
test("jdbc Suite2") {
  val df = spark
    .read
    .option("partitionColumn", "B")
    .option("lowerBound", "2017-01-01 10:00:00")
    .option("upperBound", "2019-01-01 10:00:00")
    .option("numPartitions", 5)
    .jdbc(urlWithUserAndPass, "TEST.TIMETYPES",  new Properties())
  df.printSchema()
  df.show()
}
test("jdbc Suite") {
  val df = spark.read.jdbc(urlWithUserAndPass, "TEST.TIMETYPES", "B", 
1571899768024L, 1571899768024L, 5, new Properties())
  df.printSchema()
  df.show()
}
java.lang.IllegalArgumentException: Cannot parse the bound value 1571899768024 
as date
  at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184)
  at scala.Option.getOrElse(Option.scala:189)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88)
  at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
  at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
  at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:240)
  at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:229)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:229)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:179)
  at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:255)
  at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:297)
  at org.apache.spark.sql.jdbc.JDBCSuite.$anonfun$new$186(JDBCSuite.scala:1664)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
  at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
  at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
  at org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
  at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
  at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
  at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
  at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
  at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
  at org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
  at org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
  at 
org.apache.spark.sql.jdbc.JDBCSuite.org$scalatest$BeforeAndAfter$$super$runTest(JDBCSuite.scala:43)
  at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203)
  at org.scalatest.BeforeAndAfter.runTest$(BeforeAndAfter.scala:192)
  at org.apache.spark.sql.jdbc.JDBCSuite.runTest(JDBCSuite.scala:43)
  at org.scalatest.FunSuiteLike.$anonfun$runTests$1(FunSuiteLike.scala:229)
  at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:396)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:384)
  at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:379)
  at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:461)
  at 

[jira] [Created] (SPARK-29586) spark jdbc method param lowerBound and upperBound DataType wrong

2019-10-24 Thread daile (Jira)
daile created SPARK-29586:
-

 Summary: spark jdbc method param lowerBound and upperBound 
DataType wrong
 Key: SPARK-29586
 URL: https://issues.apache.org/jira/browse/SPARK-29586
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4, 3.0.0
Reporter: daile




```
private def toBoundValueInWhereClause(
value: Long,
columnType: DataType,
timeZoneId: String): String = {
  def dateTimeToString(): String = {
val dateTimeStr = columnType match {
  case DateType => DateFormatter().format(value.toInt)
  case TimestampType =>
val timestampFormatter = TimestampFormatter.getFractionFormatter(
  DateTimeUtils.getZoneId(timeZoneId))
DateTimeUtils.timestampToString(timestampFormatter, value)
}
s"'$dateTimeStr'"
  }
  columnType match {
case _: NumericType => value.toString
case DateType | TimestampType => dateTimeToString()
  }
}
```

partitionColumn supports NumericType, DateType, and TimestampType, but the jdbc 
method only accepts Long bounds

```
test("jdbc Suite2") {
  val df = spark
.read
.option("partitionColumn", "B")
.option("lowerBound", "2017-01-01 10:00:00")
.option("upperBound", "2019-01-01 10:00:00")
.option("numPartitions", 5)
.jdbc(urlWithUserAndPass, "TEST.TIMETYPES",  new Properties())
  df.printSchema()
  df.show()
}
```

This option-based call works as expected.
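
Until the Long-only overload is improved, one workaround is to wrap the 
option-based path shown above in a small helper. This is only a sketch; the 
object, method, and parameter names below are hypothetical and not part of 
Spark's API:

```
// Hypothetical helper, not part of Spark's API: passes date/timestamp bounds as
// strings through reader options instead of the Long-only jdbc(...) overload.
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcPartitionedRead {
  def byTimestamp(
      spark: SparkSession,
      url: String,
      table: String,
      column: String,
      lower: String,   // e.g. "2017-01-01 10:00:00"
      upper: String,   // e.g. "2019-01-01 10:00:00"
      numPartitions: Int,
      props: Properties): DataFrame = {
    spark.read
      .option("partitionColumn", column)
      .option("lowerBound", lower)
      .option("upperBound", upper)
      .option("numPartitions", numPartitions.toString)
      .jdbc(url, table, props)
  }
}
```

It simply forwards string bounds through the reader options, the same pattern as 
the passing test above.
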





```
test("jdbc Suite") {
  val df = spark.read.jdbc(urlWithUserAndPass, "TEST.TIMETYPES", "B", 
1571899768024L, 1571899768024L, 5, new Properties())
  df.printSchema()
  df.show()
}
```

```
java.lang.IllegalArgumentException: Cannot parse the bound value 1571899768024 
as date
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$toInternalBoundValue$1(JDBCRelation.scala:184)
at scala.Option.getOrElse(Option.scala:189)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.parse$1(JDBCRelation.scala:183)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.toInternalBoundValue(JDBCRelation.scala:189)
at 
org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.columnPartition(JDBCRelation.scala:88)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:36)
at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:339)
at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:240)
at 
org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:229)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:229)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:179)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:255)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:297)
at 
org.apache.spark.sql.jdbc.JDBCSuite.$anonfun$new$186(JDBCSuite.scala:1664)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
at 
org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
at org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
at org.scalatest.FunSuiteLike.runTest(FunSuiteLike.scala:196)
at org.scalatest.FunSuiteLike.runTest$(FunSuiteLike.scala:178)
at 
org.apache.spark.SparkFunSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(SparkFunSuite.scala:56)
at 
org.scalatest.BeforeAndAfterEach.runTest(BeforeAndAfterEach.scala:221)
at 
org.scalatest.BeforeAndAfterEach.runTest$(BeforeAndAfterEach.scala:214)
at 
org.apache.spark.sql.jdbc.JDBCSuite.org$scalatest$BeforeAndAfter$$super$runTest(JDBCSuite.scala:43)
at org.scalatest.BeforeAndAfter.runTest(BeforeAndAfter.scala:203)
at 

[jira] [Assigned] (SPARK-29571) Fix UT in AllExecutionsPageSuite class

2019-10-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-29571:


Assignee: Ankit Raj Boudh

> Fix UT in  AllExecutionsPageSuite class
> ---
>
> Key: SPARK-29571
> URL: https://issues.apache.org/jira/browse/SPARK-29571
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Ankit Raj Boudh
>Assignee: Ankit Raj Boudh
>Priority: Minor
>
> The UT "sorting should be successful" in the AllExecutionsPageSuite class is failing 
> due to an invalid assert condition. This needs to be handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29571) Fix UT in AllExecutionsPageSuite class

2019-10-24 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-29571.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26234
[https://github.com/apache/spark/pull/26234]

> Fix UT in  AllExecutionsPageSuite class
> ---
>
> Key: SPARK-29571
> URL: https://issues.apache.org/jira/browse/SPARK-29571
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Ankit Raj Boudh
>Assignee: Ankit Raj Boudh
>Priority: Minor
> Fix For: 3.0.0
>
>
> The UT "sorting should be successful" in the AllExecutionsPageSuite class is failing 
> due to an invalid assert condition. This needs to be handled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29585) Duration in stagePage does not match Duration in Summary Metrics for Completed Tasks

2019-10-24 Thread teeyog (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

teeyog updated SPARK-29585:
---
Description: 
Summary Metrics for Completed Tasks and Duration in the task details both use 
executorRunTime, but Duration in Completed Stages uses
{code:java}
stageData.completionTime - stageData.firstTaskLaunchedTime{code}
, so the two durations do not match.

  was:Summary Metrics for Completed Tasks uses  executorRunTime, and Duration 
in Task details uses executorRunTime, but Duration in Completed Stages uses 
stageData.completionTime - stageData.firstTaskLaunchedTime, which results in 
different results.


> Duration in stagePage does not match Duration in Summary Metrics for 
> Completed Tasks
> 
>
> Key: SPARK-29585
> URL: https://issues.apache.org/jira/browse/SPARK-29585
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 2.3.4, 2.4.4
>Reporter: teeyog
>Priority: Major
>
> Summary Metrics for Completed Tasks and Duration in the task details both use 
> executorRunTime, but Duration in Completed Stages uses
> {code:java}
> stageData.completionTime - stageData.firstTaskLaunchedTime{code}
> , so the two durations do not match.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29585) Duration in stagePage does not match Duration in Summary Metrics for Completed Tasks

2019-10-24 Thread teeyog (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

teeyog updated SPARK-29585:
---
Description: Summary Metrics for Completed Tasks and Duration in the task details 
both use executorRunTime, but Duration in Completed Stages uses 
stageData.completionTime - stageData.firstTaskLaunchedTime, so the two durations 
do not match.  (was: Summary Metrics for Completed Tasks uses 
```executorRunTime```, and Duration in Task details uses ```executorRunTime```, 
but Duration in Completed Stages uses ```stageData.completionTime - 
stageData.firstTaskLaunchedTime```, which results in different results.)

> Duration in stagePage does not match Duration in Summary Metrics for 
> Completed Tasks
> 
>
> Key: SPARK-29585
> URL: https://issues.apache.org/jira/browse/SPARK-29585
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 2.3.4, 2.4.4
>Reporter: teeyog
>Priority: Major
>
> Summary Metrics for Completed Tasks and Duration in the task details both use 
> executorRunTime, but Duration in Completed Stages uses 
> stageData.completionTime - stageData.firstTaskLaunchedTime, so the two durations 
> do not match.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29585) Duration in stagePage does not match Duration in Summary Metrics for Completed Tasks

2019-10-24 Thread teeyog (Jira)
teeyog created SPARK-29585:
--

 Summary: Duration in stagePage does not match Duration in Summary 
Metrics for Completed Tasks
 Key: SPARK-29585
 URL: https://issues.apache.org/jira/browse/SPARK-29585
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core, Web UI
Affects Versions: 2.4.4, 2.3.4
Reporter: teeyog


Summary Metrics for Completed Tasks and Duration in the task details both use 
```executorRunTime```, but Duration in Completed Stages uses 
```stageData.completionTime - stageData.firstTaskLaunchedTime```, so the two 
durations do not match.
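
For illustration, here is a toy sketch with made-up numbers (not Spark code) of 
why the two values can disagree: the stage-level Duration is the wall-clock span 
from the first task launch to stage completion, while the task-level metrics are 
built from per-task executorRunTime.
{code:java}
// Toy numbers only, for illustration; DurationMismatch is a made-up name.
object DurationMismatch {
  // (launchTime, executorRunTime) per task, in milliseconds
  val tasks = Seq((1000L, 400L), (1000L, 450L), (1600L, 300L))
  val stageCompletionTime = 2500L
  val firstTaskLaunchedTime = tasks.map(_._1).min

  def main(args: Array[String]): Unit = {
    // What the Completed Stages table shows: wall-clock span of the stage
    println(stageCompletionTime - firstTaskLaunchedTime) // 1500 ms, includes scheduling gaps
    // What Summary Metrics / task details are built from: executorRunTime per task
    println(tasks.map(_._2).max)                         // 450 ms (slowest task)
    println(tasks.map(_._2).sum)                         // 1150 ms (total run time)
  }
}
{code}
With these numbers the stage page would show 1500 ms even though no single task 
ran longer than 450 ms, matching the mismatch described above.
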



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org