[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2020-01-10 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012562#comment-17012562
 ] 

Jungtaek Lim commented on SPARK-28594:
--

I'm enumerating the items that are "good to do"; it might be better to file 
JIRA issues for them once we decide we should do them, or once all required 
functionality is done and we have the resources to deal with them.

For now, the items I have are below (a configuration sketch follows the list):
 * Retain a specific number of jobs / executions, so the compacted file can keep 
some finished jobs / executions
 ** [https://github.com/apache/spark/pull/27085#discussion_r363428336]
 * Separate compaction from cleaning, to allow leaving some old event log files 
in place after compaction
 ** [https://github.com/apache/spark/pull/27085#issuecomment-572792067]
 * Cache the compactor's state to avoid replaying event log files that were 
already loaded
 ** [https://github.com/apache/spark/pull/26416#discussion_r358260674]
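
For context, a minimal sketch of how an application would opt in to rolled-over event logs once this work is in place. The `spark.eventLog.rolling.*` names below are taken from the linked PRs targeting the Spark 3.0 line; treat them as assumptions if you are on a different build.

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only: enable event logging and ask Spark to roll the event log file
// over once it reaches a maximum size, instead of growing a single file forever.
val conf = new SparkConf()
  .set("spark.eventLog.enabled", "true")
  .set("spark.eventLog.rolling.enabled", "true")     // write rolled event log files
  .set("spark.eventLog.rolling.maxFileSize", "128m") // roll over at roughly this size

val spark = SparkSession.builder().config(conf).getOrCreate()
{code}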

 

> Allow event logs for running streaming apps to be rolled over.
> --
>
> Key: SPARK-28594
> URL: https://issues.apache.org/jira/browse/SPARK-28594
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: This has been reported on 2.0.2.22 but affects all 
> currently available versions.
>Reporter: Stephen Levett
>Priority: Major
>
> In all current Spark releases, when event logging is enabled for Spark Streaming 
> the event logs grow massively. The files continue to grow until the application 
> is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617
> addresses .inprogress files, but not event log files of applications that are still running.
> Could we identify a mechanism to set a "max file" size so that the file is rolled over 
> when it reaches this size?
>  
>  






[jira] [Commented] (SPARK-29988) Adjust Jenkins jobs for `hive-1.2/2.3` combination

2020-01-10 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012575#comment-17012575
 ] 

Dongjoon Hyun commented on SPARK-29988:
---

Oops. [~shaneknapp]. I forgot that we need the following two.
- `spark-master-test-maven-hadoop-2.7-hive-2.3`
- `spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11`

I guess we don't need to add an SBT build 
(`spark-master-test-sbt-hadoop-2.7-hive-1.2`).

cc [~smilegator], [~yumwang], [~srowen].

> Adjust Jenkins jobs for `hive-1.2/2.3` combination
> --
>
> Key: SPARK-29988
> URL: https://issues.apache.org/jira/browse/SPARK-29988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: Screen Shot 2020-01-09 at 1.59.25 PM.png
>
>
> We need to rename the following Jenkins jobs first.
> spark-master-test-sbt-hadoop-2.7 -> spark-master-test-sbt-hadoop-2.7-hive-1.2
> spark-master-test-sbt-hadoop-3.2 -> spark-master-test-sbt-hadoop-3.2-hive-2.3
> spark-master-test-maven-hadoop-2.7 -> 
> spark-master-test-maven-hadoop-2.7-hive-1.2
> spark-master-test-maven-hadoop-3.2 -> 
> spark-master-test-maven-hadoop-3.2-hive-2.3
> Also, we need to add `-Phive-1.2` for the existing `hadoop-2.7` jobs.
> {code}
> -Phive \
> +-Phive-1.2 \
> {code}
> And, we need to add `-Phive-2.3` for the existing `hadoop-3.2` jobs.
> {code}
> -Phive \
> +-Phive-2.3 \
> {code}
> For now, I added the above `-Phive-1.2` and `-Phive-2.3` to the Jenkins jobs 
> manually. (This should be added to the SCM of the AmpLab Jenkins.)
> After SPARK-29981, we need to create two new jobs.
> - spark-master-test-sbt-hadoop-2.7-hive-2.3
> - spark-master-test-maven-hadoop-2.7-hive-2.3
> This is for preparation for Apache Spark 3.0.0.
> We may drop all `*-hive-1.2` jobs at Apache Spark 3.1.0.






[jira] [Comment Edited] (SPARK-29988) Adjust Jenkins jobs for `hive-1.2/2.3` combination

2020-01-10 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012575#comment-17012575
 ] 

Dongjoon Hyun edited comment on SPARK-29988 at 1/10/20 8:43 AM:


Oops. [~shaneknapp]. I forgot that we need the following two.
- `spark-master-test-maven-hadoop-2.7-hive-2.3`
- `spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11`

As I described in the JIRA description, SPARK-29981 is resolved, so we need 
the above.
Since we already have too many jobs, I guess we don't need to add an SBT build 
(`spark-master-test-sbt-hadoop-2.7-hive-1.2`) in addition.

cc [~smilegator], [~yumwang], [~srowen].


was (Author: dongjoon):
Oops. [~shaneknapp]. I forgot that we need the following two.
- `spark-master-test-maven-hadoop-2.7-hive-2.3`
- `spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11`

I guess we don't need to add an SBT build 
(`spark-master-test-sbt-hadoop-2.7-hive-1.2`).

cc [~smilegator], [~yumwang], [~srowen].

> Adjust Jenkins jobs for `hive-1.2/2.3` combination
> --
>
> Key: SPARK-29988
> URL: https://issues.apache.org/jira/browse/SPARK-29988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: Screen Shot 2020-01-09 at 1.59.25 PM.png
>
>
> We need to rename the following Jenkins jobs first.
> spark-master-test-sbt-hadoop-2.7 -> spark-master-test-sbt-hadoop-2.7-hive-1.2
> spark-master-test-sbt-hadoop-3.2 -> spark-master-test-sbt-hadoop-3.2-hive-2.3
> spark-master-test-maven-hadoop-2.7 -> 
> spark-master-test-maven-hadoop-2.7-hive-1.2
> spark-master-test-maven-hadoop-3.2 -> 
> spark-master-test-maven-hadoop-3.2-hive-2.3
> Also, we need to add `-Phive-1.2` for the existing `hadoop-2.7` jobs.
> {code}
> -Phive \
> +-Phive-1.2 \
> {code}
> And, we need to add `-Phive-2.3` for the existing `hadoop-3.2` jobs.
> {code}
> -Phive \
> +-Phive-2.3 \
> {code}
> For now, I added the above `-Phive-1.2` and `-Phive-2.3` to the Jenkins jobs 
> manually. (This should be added to the SCM of the AmpLab Jenkins.)
> After SPARK-29981, we need to create two new jobs.
> - spark-master-test-sbt-hadoop-2.7-hive-2.3
> - spark-master-test-maven-hadoop-2.7-hive-2.3
> This is for preparation for Apache Spark 3.0.0.
> We may drop all `*-hive-1.2` jobs at Apache Spark 3.1.0.






[jira] [Comment Edited] (SPARK-29988) Adjust Jenkins jobs for `hive-1.2/2.3` combination

2020-01-10 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012575#comment-17012575
 ] 

Dongjoon Hyun edited comment on SPARK-29988 at 1/10/20 8:44 AM:


Oops. [~shaneknapp]. I forgot that we need the following two.
- `spark-master-test-maven-hadoop-2.7-hive-2.3`
- `spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11`

Since we already have too many jobs, I guess we don't need to add an SBT build 
(`spark-master-test-sbt-hadoop-2.7-hive-1.2`) in addition.

cc [~smilegator], [~yumwang], [~srowen].


was (Author: dongjoon):
Oops. [~shaneknapp]. I forgot that we need the following two.
- `spark-master-test-maven-hadoop-2.7-hive-2.3`
- `spark-master-test-maven-hadoop-2.7-hive-2.3-jdk-11`

As I described in the JIRA description, SPARK-29981 is resolved, so we need 
the above.
Since we already have too many jobs, I guess we don't need to add an SBT build 
(`spark-master-test-sbt-hadoop-2.7-hive-1.2`) in addition.

cc [~smilegator], [~yumwang], [~srowen].

> Adjust Jenkins jobs for `hive-1.2/2.3` combination
> --
>
> Key: SPARK-29988
> URL: https://issues.apache.org/jira/browse/SPARK-29988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: Screen Shot 2020-01-09 at 1.59.25 PM.png
>
>
> We need to rename the following Jenkins jobs first.
> spark-master-test-sbt-hadoop-2.7 -> spark-master-test-sbt-hadoop-2.7-hive-1.2
> spark-master-test-sbt-hadoop-3.2 -> spark-master-test-sbt-hadoop-3.2-hive-2.3
> spark-master-test-maven-hadoop-2.7 -> 
> spark-master-test-maven-hadoop-2.7-hive-1.2
> spark-master-test-maven-hadoop-3.2 -> 
> spark-master-test-maven-hadoop-3.2-hive-2.3
> Also, we need to add `-Phive-1.2` for the existing `hadoop-2.7` jobs.
> {code}
> -Phive \
> +-Phive-1.2 \
> {code}
> And, we need to add `-Phive-2.3` for the existing `hadoop-3.2` jobs.
> {code}
> -Phive \
> +-Phive-2.3 \
> {code}
> For now, I added the above `-Phive-1.2` and `-Phive-2.3` to the Jenkins jobs 
> manually. (This should be added to the SCM of the AmpLab Jenkins.)
> After SPARK-29981, we need to create two new jobs.
> - spark-master-test-sbt-hadoop-2.7-hive-2.3
> - spark-master-test-maven-hadoop-2.7-hive-2.3
> This is for preparation for Apache Spark 3.0.0.
> We may drop all `*-hive-1.2` jobs at Apache Spark 3.1.0.






[jira] [Created] (SPARK-30482) Add sub-class of AppenderSkeleton reusable in tests

2020-01-10 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30482:
--

 Summary: Add sub-class of AppenderSkeleton reusable in tests
 Key: SPARK-30482
 URL: https://issues.apache.org/jira/browse/SPARK-30482
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Some tests define similar sub-classes of AppenderSkeleton. The code duplication 
can be eliminated by defining a common class in 
[SparkFunSuite.scala|https://github.com/apache/spark/compare/master...MaxGekk:dedup-appender-skeleton?expand=1#diff-d521001af1af1a2aace870feb25ae0b0]
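
For illustration, a minimal sketch of the kind of shared test appender the ticket has in mind, assuming log4j 1.x's AppenderSkeleton. The class and member names here are illustrative, not necessarily what ends up in SparkFunSuite.scala:

{code:scala}
import scala.collection.mutable.ArrayBuffer

import org.apache.log4j.AppenderSkeleton
import org.apache.log4j.spi.LoggingEvent

// Collects log events in memory so a test can assert on what was logged.
class TestLogAppender extends AppenderSkeleton {
  val loggingEvents = new ArrayBuffer[LoggingEvent]()

  override def append(event: LoggingEvent): Unit = loggingEvents.append(event)
  override def close(): Unit = {}
  override def requiresLayout(): Boolean = false
}
{code}

A test would attach the appender with org.apache.log4j.Logger.getRootLogger.addAppender(...), run the code under test, and then inspect loggingEvents, which is roughly what the duplicated per-suite classes do today.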






[jira] [Updated] (SPARK-30482) Add sub-class of AppenderSkeleton reusable in tests

2020-01-10 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30482:
---
Component/s: (was: SQL)

> Add sub-class of AppenderSkeleton reusable in tests
> ---
>
> Key: SPARK-30482
> URL: https://issues.apache.org/jira/browse/SPARK-30482
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.4
>Reporter: Maxim Gekk
>Priority: Minor
>
> Some tests define similar sub-classes of AppenderSkeleton. The code duplication 
> can be eliminated by defining a common class in 
> [SparkFunSuite.scala|https://github.com/apache/spark/compare/master...MaxGekk:dedup-appender-skeleton?expand=1#diff-d521001af1af1a2aace870feb25ae0b0]






[jira] [Resolved] (SPARK-30018) Support ALTER DATABASE SET OWNER syntax

2020-01-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30018.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26775
[https://github.com/apache/spark/pull/26775]
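
For reference, a quick usage sketch of the syntax quoted below through the SQL API; the database and user names are made up:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport() // this sketch assumes a Hive-backed catalog
  .getOrCreate()

// Hypothetical database/user names; the statement follows the grammar in the description.
spark.sql("ALTER DATABASE sales SET OWNER USER alice")
{code}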

> Support ALTER DATABASE SET OWNER syntax
> ---
>
> Key: SPARK-30018
> URL: https://issues.apache.org/jira/browse/SPARK-30018
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.0.0
>
>
> {code:sql}
> ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   
> -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
> {code}






[jira] [Assigned] (SPARK-30018) Support ALTER DATABASE SET OWNER syntax

2020-01-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30018:
---

Assignee: Kent Yao

> Support ALTER DATABASE SET OWNER syntax
> ---
>
> Key: SPARK-30018
> URL: https://issues.apache.org/jira/browse/SPARK-30018
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>
> {code:sql}
> ALTER (DATABASE|SCHEMA) database_name SET OWNER [USER|ROLE] user_or_role;   
> -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0)
> {code}






[jira] [Commented] (SPARK-27148) Support CURRENT_TIME and LOCALTIME when ANSI mode enabled

2020-01-10 Thread pavithra ramachandran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012620#comment-17012620
 ] 

pavithra ramachandran commented on SPARK-27148:
---

[~maropu] I would like to work on this.

> Support CURRENT_TIME and LOCALTIME when ANSI mode enabled
> -
>
> Key: SPARK-27148
> URL: https://issues.apache.org/jira/browse/SPARK-27148
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> CURRENT_TIME and LOCALTIME are part of the ANSI standard and should be supported;
> {code:java}
> postgres=# select CURRENT_TIME;
>        timetz       
> 
> 16:45:43.398109+09
> (1 row)
> postgres=# select LOCALTIME;
>       time      
> 
> 16:45:48.60969
> (1 row){code}
> Before this, we need to support TIME types (java.sql.Time).






[jira] [Created] (SPARK-30483) Job History does not show pool properties table

2020-01-10 Thread ABHISHEK KUMAR GUPTA (Jira)
ABHISHEK KUMAR GUPTA created SPARK-30483:


 Summary: Job History does not show pool properties table
 Key: SPARK-30483
 URL: https://issues.apache.org/jira/browse/SPARK-30483
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.0.0
Reporter: ABHISHEK KUMAR GUPTA


The stage page shows the Pool Name column, but when the user clicks the pool-name hyperlink it does not redirect to the Pool Properties table.
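
A sketch of one way to reproduce this, assuming FAIR scheduling so that a named pool shows up on the Stages page; the application and pool names are illustrative:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("pool-link-repro")
  .config("spark.scheduler.mode", "FAIR")   // enables named scheduler pools
  .config("spark.eventLog.enabled", "true") // so the run shows up in the history server
  .getOrCreate()

// Tag subsequent jobs with a pool; its name becomes the Pool Name link on the Stages page.
spark.sparkContext.setLocalProperty("spark.scheduler.pool", "production")
spark.range(0L, 1000000L).count()
{code}

Per the report, clicking that pool-name link on the Job History (history server) Stages page does not open the Pool Properties table.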






[jira] [Commented] (SPARK-30483) Job History does not show pool properties table

2020-01-10 Thread pavithra ramachandran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012654#comment-17012654
 ] 

pavithra ramachandran commented on SPARK-30483:
---

I shall work on this.

> Job History does not show pool properties table
> ---
>
> Key: SPARK-30483
> URL: https://issues.apache.org/jira/browse/SPARK-30483
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>
> The stage page shows the Pool Name column, but when the user clicks the pool-name hyperlink it does not redirect to the Pool Properties table.






[jira] [Created] (SPARK-30484) Job History Storage Tab does not display RDD Table

2020-01-10 Thread ABHISHEK KUMAR GUPTA (Jira)
ABHISHEK KUMAR GUPTA created SPARK-30484:


 Summary: Job History Storage Tab does not display RDD Table
 Key: SPARK-30484
 URL: https://issues.apache.org/jira/browse/SPARK-30484
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.0.0
Reporter: ABHISHEK KUMAR GUPTA


scala> import org.apache.spark.storage.StorageLevel._
import org.apache.spark.storage.StorageLevel._

scala> val rdd = sc.range(0, 100, 1, 5).setName("rdd")
rdd: org.apache.spark.rdd.RDD[Long] = rdd MapPartitionsRDD[1] at range at <console>:27

scala> rdd.persist(MEMORY_ONLY_SER)
res0: rdd.type = rdd MapPartitionsRDD[1] at range at <console>:27

scala> rdd.count
res1: Long = 100

scala> val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", "name")
df: org.apache.spark.sql.DataFrame = [count: int, name: string]

scala> df.persist(DISK_ONLY)
res2: df.type = [count: int, name: string]

scala> df.count
res3: Long = 3

Open the Storage tab under Incomplete Jobs in the Job History page.
The UI does not display the RDD table.







[jira] [Commented] (SPARK-30484) Job History Storage Tab does not display RDD Table

2020-01-10 Thread pavithra ramachandran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012656#comment-17012656
 ] 

pavithra ramachandran commented on SPARK-30484:
---

I shall work on this.

> Job History Storage Tab does not display RDD Table
> --
>
> Key: SPARK-30484
> URL: https://issues.apache.org/jira/browse/SPARK-30484
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>
> scala> import org.apache.spark.storage.StorageLevel._
> import org.apache.spark.storage.StorageLevel._
> scala> val rdd = sc.range(0, 100, 1, 5).setName("rdd")
> rdd: org.apache.spark.rdd.RDD[Long] = rdd MapPartitionsRDD[1] at range at <console>:27
> scala> rdd.persist(MEMORY_ONLY_SER)
> res0: rdd.type = rdd MapPartitionsRDD[1] at range at <console>:27
> scala> rdd.count
> res1: Long = 100  
>   
> scala> val df = Seq((1, "andy"), (2, "bob"), (2, "andy")).toDF("count", 
> "name")
> df: org.apache.spark.sql.DataFrame = [count: int, name: string]
> scala> df.persist(DISK_ONLY)
> res2: df.type = [count: int, name: string]
> scala> df.count
> res3: Long = 3
> Open the Storage tab under Incomplete Jobs in the Job History page.
> The UI does not display the RDD table.






[jira] [Created] (SPARK-30485) Remove SQL configs deprecated before v2.4

2020-01-10 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30485:
--

 Summary: Remove SQL configs deprecated before v2.4
 Key: SPARK-30485
 URL: https://issues.apache.org/jira/browse/SPARK-30485
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Remove the following SQL configs:
* spark.sql.variable.substitute.depth
* spark.sql.execution.pandas.respectSessionTimeZone
* spark.sql.parquet.int64AsTimestampMillis
* Maybe spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName, which 
was deprecated in v2.4

Recently, all deprecated SQL configs were gathered into the deprecatedSQLConfigs 
map:
https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189






[jira] [Commented] (SPARK-30485) Remove SQL configs deprecated before v2.4

2020-01-10 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012678#comment-17012678
 ] 

Maxim Gekk commented on SPARK-30485:


[~dongjoon] [~srowen] [~cloud_fan] [~hyukjin.kwon] WDYT about removing these?

> Remove SQL configs deprecated before v2.4
> -
>
> Key: SPARK-30485
> URL: https://issues.apache.org/jira/browse/SPARK-30485
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Remove the following SQL configs:
> * spark.sql.variable.substitute.depth
> * spark.sql.execution.pandas.respectSessionTimeZone
> * spark.sql.parquet.int64AsTimestampMillis
> * Maybe spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName, 
> which was deprecated in v2.4
> Recently, all deprecated SQL configs were gathered into the deprecatedSQLConfigs 
> map:
> https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189






[jira] [Commented] (SPARK-30460) Spark checkpoint failing after some run with S3 path

2020-01-10 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012717#comment-17012717
 ] 

Gabor Somogyi commented on SPARK-30460:
---

[~Sachin] Do I understand correctly that you're using S3 as the checkpoint 
location? If so, then all I can say is that it doesn't work because of S3's 
read-after-write consistency model.
In Spark 3.0 there is a new output committer that is expected to work, but it 
has not been deeply tested yet...

> Spark checkpoint failing after some run with S3 path 
> -
>
> Key: SPARK-30460
> URL: https://issues.apache.org/jira/browse/SPARK-30460
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.4.4
>Reporter: Sachin Pasalkar
>Priority: Major
>
> We are using EMR with SQS as the stream source. However, it fails after 4-6 
> hours of running with the exception below. The application shows it is running 
> but stops processing messages.
> {code:java}
> 2020-01-06 13:04:10,548 WARN [BatchedWriteAheadLog Writer] 
> org.apache.spark.streaming.util.BatchedWriteAheadLog:BatchedWriteAheadLog 
> Writer failed to write ArrayBuffer(Record(java.nio.HeapByteBuffer[pos=0 
> lim=1226 cap=1226],1578315850302,Future()))
> java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295)
>   at 
> org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream$lzycompute(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.<init>(FileBasedWriteAheadLogWriter.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.getLogWriter(FileBasedWriteAheadLog.scala:229)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:94)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:50)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.org$apache$spark$streaming$util$BatchedWriteAheadLog$$flushRecords(BatchedWriteAheadLog.scala:175)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog$$anon$1.run(BatchedWriteAheadLog.scala:142)
>   at java.lang.Thread.run(Thread.java:748)
> 2020-01-06 13:04:10,554 WARN [wal-batching-thread-pool-0] 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker:Exception thrown 
> while writing record: 
> BlockAdditionEvent(ReceivedBlockInfo(0,Some(3),None,WriteAheadLogBasedStoreResult(input-0-1578315849800,Some(3),FileBasedWriteAheadLogSegment(s3://mss-prod-us-east-1-ueba-bucket/streaming/checkpoint/receivedData/0/log-1578315850001-1578315910001,0,5175
>  to the WriteAheadLog.
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:84)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:242)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.addBlock(ReceivedBlockTracker.scala:89)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker.org$apache$spark$streaming$scheduler$ReceiverTracker$$addBlock(ReceiverTracker.scala:347)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1$$anonfun$run$1.apply$mcV$sp(ReceiverTracker.scala:522)
>   at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1.run(ReceiverTracker.scala:520)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295)
>   at 
> org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:

[jira] [Commented] (SPARK-30460) Spark checkpoint failing after some run with S3 path

2020-01-10 Thread Sachin Pasalkar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012728#comment-17012728
 ] 

Sachin Pasalkar commented on SPARK-30460:
-

[~gsomogyi] Yes, I am using S3 for checkpointing, and as we know S3 does not 
support appending to objects. However, if you look at the exception stack trace, 
it is trying to append to the object, which causes the failure. If you follow the 
stack trace, `FileBasedWriteAheadLogWriter` gets its output stream via HdfsUtils, 
but HdfsUtils only handles the HDFS case, not other filesystems that do not 
support append.

I don't see this as an issue with the consistency model but as a bug in the code.
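
To make the failure mode concrete, below is a minimal sketch, not Spark's actual HdfsUtils code, of the append-vs-create pattern the stack trace points at. S3-backed FileSystem implementations generally throw UnsupportedOperationException from append(), which matches the exception above:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FSDataOutputStream, Path}

// Sketch only: appending to an existing log file works on HDFS, but object stores
// such as S3 typically do not implement FileSystem.append and throw instead.
def openForWrite(pathStr: String, conf: Configuration): FSDataOutputStream = {
  val path = new Path(pathStr)
  val fs = path.getFileSystem(conf)
  if (fs.exists(path)) fs.append(path) // fails on filesystems without append support
  else fs.create(path)
}
{code}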

> Spark checkpoint failing after some run with S3 path 
> -
>
> Key: SPARK-30460
> URL: https://issues.apache.org/jira/browse/SPARK-30460
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.4.4
>Reporter: Sachin Pasalkar
>Priority: Major
>
> We are using EMR with SQS as the stream source. However, it fails after 4-6 
> hours of running with the exception below. The application shows it is running 
> but stops processing messages.
> {code:java}
> 2020-01-06 13:04:10,548 WARN [BatchedWriteAheadLog Writer] 
> org.apache.spark.streaming.util.BatchedWriteAheadLog:BatchedWriteAheadLog 
> Writer failed to write ArrayBuffer(Record(java.nio.HeapByteBuffer[pos=0 
> lim=1226 cap=1226],1578315850302,Future()))
> java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295)
>   at 
> org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream$lzycompute(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.<init>(FileBasedWriteAheadLogWriter.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.getLogWriter(FileBasedWriteAheadLog.scala:229)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:94)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:50)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.org$apache$spark$streaming$util$BatchedWriteAheadLog$$flushRecords(BatchedWriteAheadLog.scala:175)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog$$anon$1.run(BatchedWriteAheadLog.scala:142)
>   at java.lang.Thread.run(Thread.java:748)
> 2020-01-06 13:04:10,554 WARN [wal-batching-thread-pool-0] 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker:Exception thrown 
> while writing record: 
> BlockAdditionEvent(ReceivedBlockInfo(0,Some(3),None,WriteAheadLogBasedStoreResult(input-0-1578315849800,Some(3),FileBasedWriteAheadLogSegment(s3://mss-prod-us-east-1-ueba-bucket/streaming/checkpoint/receivedData/0/log-1578315850001-1578315910001,0,5175
>  to the WriteAheadLog.
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:84)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:242)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.addBlock(ReceivedBlockTracker.scala:89)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker.org$apache$spark$streaming$scheduler$ReceiverTracker$$addBlock(ReceiverTracker.scala:347)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1$$anonfun$run$1.apply$mcV$sp(ReceiverTracker.scala:522)
>   at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1.run(ReceiverTracker.scala:520)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:118

[jira] [Commented] (SPARK-30460) Spark checkpoint failing after some run with S3 path

2020-01-10 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012756#comment-17012756
 ] 

Gabor Somogyi commented on SPARK-30460:
---

[~Sachin] Even if somebody hunts down this specific issue, S3 checkpointing kills 
streaming jobs in many other different ways.


> Spark checkpoint failing after some run with S3 path 
> -
>
> Key: SPARK-30460
> URL: https://issues.apache.org/jira/browse/SPARK-30460
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.4.4
>Reporter: Sachin Pasalkar
>Priority: Major
>
> We are using EMR with SQS as the stream source. However, it fails after 4-6 
> hours of running with the exception below. The application shows it is running 
> but stops processing messages.
> {code:java}
> 2020-01-06 13:04:10,548 WARN [BatchedWriteAheadLog Writer] 
> org.apache.spark.streaming.util.BatchedWriteAheadLog:BatchedWriteAheadLog 
> Writer failed to write ArrayBuffer(Record(java.nio.HeapByteBuffer[pos=0 
> lim=1226 cap=1226],1578315850302,Future()))
> java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295)
>   at 
> org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream$lzycompute(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.<init>(FileBasedWriteAheadLogWriter.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.getLogWriter(FileBasedWriteAheadLog.scala:229)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:94)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:50)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.org$apache$spark$streaming$util$BatchedWriteAheadLog$$flushRecords(BatchedWriteAheadLog.scala:175)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog$$anon$1.run(BatchedWriteAheadLog.scala:142)
>   at java.lang.Thread.run(Thread.java:748)
> 2020-01-06 13:04:10,554 WARN [wal-batching-thread-pool-0] 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker:Exception thrown 
> while writing record: 
> BlockAdditionEvent(ReceivedBlockInfo(0,Some(3),None,WriteAheadLogBasedStoreResult(input-0-1578315849800,Some(3),FileBasedWriteAheadLogSegment(s3://mss-prod-us-east-1-ueba-bucket/streaming/checkpoint/receivedData/0/log-1578315850001-1578315910001,0,5175
>  to the WriteAheadLog.
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:84)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:242)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.addBlock(ReceivedBlockTracker.scala:89)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker.org$apache$spark$streaming$scheduler$ReceiverTracker$$addBlock(ReceiverTracker.scala:347)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1$$anonfun$run$1.apply$mcV$sp(ReceiverTracker.scala:522)
>   at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1.run(ReceiverTracker.scala:520)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295)
>   at 
> org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream$lzycompute(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.

[jira] [Comment Edited] (SPARK-30460) Spark checkpoint failing after some run with S3 path

2020-01-10 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012756#comment-17012756
 ] 

Gabor Somogyi edited comment on SPARK-30460 at 1/10/20 11:40 AM:
-

[~Sachin] Even if somebody hunts down this specific issue, S3 checkpointing kills 
streaming jobs in many other ways.



was (Author: gsomogyi):
[~Sachin] Even if somebody hunts down this specific issue, S3 checkpointing kills 
streaming jobs in many other different ways.


> Spark checkpoint failing after some run with S3 path 
> -
>
> Key: SPARK-30460
> URL: https://issues.apache.org/jira/browse/SPARK-30460
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.4.4
>Reporter: Sachin Pasalkar
>Priority: Major
>
> We are using EMR with SQS as the stream source. However, it fails after 4-6 
> hours of running with the exception below. The application shows it is running 
> but stops processing messages.
> {code:java}
> 2020-01-06 13:04:10,548 WARN [BatchedWriteAheadLog Writer] 
> org.apache.spark.streaming.util.BatchedWriteAheadLog:BatchedWriteAheadLog 
> Writer failed to write ArrayBuffer(Record(java.nio.HeapByteBuffer[pos=0 
> lim=1226 cap=1226],1578315850302,Future()))
> java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295)
>   at 
> org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream$lzycompute(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.<init>(FileBasedWriteAheadLogWriter.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.getLogWriter(FileBasedWriteAheadLog.scala:229)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:94)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:50)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.org$apache$spark$streaming$util$BatchedWriteAheadLog$$flushRecords(BatchedWriteAheadLog.scala:175)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog$$anon$1.run(BatchedWriteAheadLog.scala:142)
>   at java.lang.Thread.run(Thread.java:748)
> 2020-01-06 13:04:10,554 WARN [wal-batching-thread-pool-0] 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker:Exception thrown 
> while writing record: 
> BlockAdditionEvent(ReceivedBlockInfo(0,Some(3),None,WriteAheadLogBasedStoreResult(input-0-1578315849800,Some(3),FileBasedWriteAheadLogSegment(s3://mss-prod-us-east-1-ueba-bucket/streaming/checkpoint/receivedData/0/log-1578315850001-1578315910001,0,5175
>  to the WriteAheadLog.
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:84)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:242)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.addBlock(ReceivedBlockTracker.scala:89)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker.org$apache$spark$streaming$scheduler$ReceiverTracker$$addBlock(ReceiverTracker.scala:347)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1$$anonfun$run$1.apply$mcV$sp(ReceiverTracker.scala:522)
>   at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1.run(ReceiverTracker.scala:520)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295)
>   at 
> org.apache.spark.streaming.util.HdfsUtils$.getOut

[jira] [Commented] (SPARK-27148) Support CURRENT_TIME and LOCALTIME when ANSI mode enabled

2020-01-10 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012762#comment-17012762
 ] 

Takeshi Yamamuro commented on SPARK-27148:
--

Yea, that's ok.

> Support CURRENT_TIME and LOCALTIME when ANSI mode enabled
> -
>
> Key: SPARK-27148
> URL: https://issues.apache.org/jira/browse/SPARK-27148
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> CURRENT_TIME and LOCALTIME are part of the ANSI standard and should be supported;
> {code:java}
> postgres=# select CURRENT_TIME;
>        timetz       
> 
> 16:45:43.398109+09
> (1 row)
> postgres=# select LOCALTIME;
>       time      
> 
> 16:45:48.60969
> (1 row){code}
> Before this, we need to support TIME types (java.sql.Time).






[jira] [Commented] (SPARK-30460) Spark checkpoint failing after some run with S3 path

2020-01-10 Thread Sachin Pasalkar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012784#comment-17012784
 ] 

Sachin Pasalkar commented on SPARK-30460:
-

Yes, maybe, or maybe not.

I was able to run this in production for 4-6 hours, 4-5 times, without any other 
issues; it always failed with this issue. If this fixes some part of the problem, 
we should fix it.

I understand Spark 3.0 has a new committer, but as you said it is not deeply 
tested. Soon I am going to run my production with this fix in place, and I will 
update the ticket around the end of next week on whether the system ran smoothly or not.

> Spark checkpoint failing after some run with S3 path 
> -
>
> Key: SPARK-30460
> URL: https://issues.apache.org/jira/browse/SPARK-30460
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.4.4
>Reporter: Sachin Pasalkar
>Priority: Major
>
> We are using EMR with SQS as the stream source. However, it fails after 4-6 
> hours of running with the exception below. The application shows it is running 
> but stops processing messages.
> {code:java}
> 2020-01-06 13:04:10,548 WARN [BatchedWriteAheadLog Writer] 
> org.apache.spark.streaming.util.BatchedWriteAheadLog:BatchedWriteAheadLog 
> Writer failed to write ArrayBuffer(Record(java.nio.HeapByteBuffer[pos=0 
> lim=1226 cap=1226],1578315850302,Future()))
> java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295)
>   at 
> org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream$lzycompute(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.<init>(FileBasedWriteAheadLogWriter.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.getLogWriter(FileBasedWriteAheadLog.scala:229)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:94)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:50)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.org$apache$spark$streaming$util$BatchedWriteAheadLog$$flushRecords(BatchedWriteAheadLog.scala:175)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog$$anon$1.run(BatchedWriteAheadLog.scala:142)
>   at java.lang.Thread.run(Thread.java:748)
> 2020-01-06 13:04:10,554 WARN [wal-batching-thread-pool-0] 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker:Exception thrown 
> while writing record: 
> BlockAdditionEvent(ReceivedBlockInfo(0,Some(3),None,WriteAheadLogBasedStoreResult(input-0-1578315849800,Some(3),FileBasedWriteAheadLogSegment(s3://mss-prod-us-east-1-ueba-bucket/streaming/checkpoint/receivedData/0/log-1578315850001-1578315910001,0,5175
>  to the WriteAheadLog.
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:84)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:242)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.addBlock(ReceivedBlockTracker.scala:89)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker.org$apache$spark$streaming$scheduler$ReceiverTracker$$addBlock(ReceiverTracker.scala:347)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1$$anonfun$run$1.apply$mcV$sp(ReceiverTracker.scala:522)
>   at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1.run(ReceiverTracker.scala:520)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazo

[jira] [Comment Edited] (SPARK-30460) Spark checkpoint failing after some run with S3 path

2020-01-10 Thread Sachin Pasalkar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012784#comment-17012784
 ] 

Sachin Pasalkar edited comment on SPARK-30460 at 1/10/20 12:19 PM:
---

[~gsomogyi] Yes, maybe, or maybe not.

I was able to run this in production for 4-6 hours, 4-5 times, without any other 
issues; it always failed with this issue. If this fixes some part of the problem, 
we should fix it.

I understand Spark 3.0 has a new committer, but as you said it is not deeply 
tested. Soon I am going to run my production with this fix in place, and I will 
update the ticket around the end of next week on whether the system ran smoothly or not.


was (Author: sachin):
Yes, maybe, or maybe not.

I was able to run this in production for 4-6 hours, 4-5 times, without any other 
issues; it always failed with this issue. If this fixes some part of the problem, 
we should fix it.

I understand Spark 3.0 has a new committer, but as you said it is not deeply 
tested. Soon I am going to run my production with this fix in place, and I will 
update the ticket around the end of next week on whether the system ran smoothly or not.

> Spark checkpoint failing after some run with S3 path 
> -
>
> Key: SPARK-30460
> URL: https://issues.apache.org/jira/browse/SPARK-30460
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.4.4
>Reporter: Sachin Pasalkar
>Priority: Major
>
> We are using EMR with SQS as the stream source. However, it fails after 4-6 
> hours of running with the exception below. The application shows it is running 
> but stops processing messages.
> {code:java}
> 2020-01-06 13:04:10,548 WARN [BatchedWriteAheadLog Writer] 
> org.apache.spark.streaming.util.BatchedWriteAheadLog:BatchedWriteAheadLog 
> Writer failed to write ArrayBuffer(Record(java.nio.HeapByteBuffer[pos=0 
> lim=1226 cap=1226],1578315850302,Future()))
> java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295)
>   at 
> org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream$lzycompute(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.<init>(FileBasedWriteAheadLogWriter.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.getLogWriter(FileBasedWriteAheadLog.scala:229)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:94)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:50)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.org$apache$spark$streaming$util$BatchedWriteAheadLog$$flushRecords(BatchedWriteAheadLog.scala:175)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog$$anon$1.run(BatchedWriteAheadLog.scala:142)
>   at java.lang.Thread.run(Thread.java:748)
> 2020-01-06 13:04:10,554 WARN [wal-batching-thread-pool-0] 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker:Exception thrown 
> while writing record: 
> BlockAdditionEvent(ReceivedBlockInfo(0,Some(3),None,WriteAheadLogBasedStoreResult(input-0-1578315849800,Some(3),FileBasedWriteAheadLogSegment(s3://mss-prod-us-east-1-ueba-bucket/streaming/checkpoint/receivedData/0/log-1578315850001-1578315910001,0,5175
>  to the WriteAheadLog.
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:84)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:242)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.addBlock(ReceivedBlockTracker.scala:89)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker.org$apache$spark$streaming$scheduler$ReceiverTracker$$addBlock(ReceiverTracker.scala:347)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1$$anonfun$run$1.apply$mcV$sp(ReceiverTracker.scala:522)
>   at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$

[jira] [Resolved] (SPARK-30447) Constant propagation nullability issue

2020-01-10 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-30447.
--
Fix Version/s: 3.0.0
 Assignee: Peter Toth
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/27119
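
For context, a quick repro sketch of the bug described below, using a one-row temporary view with a nullable int column (the table and column names follow the description):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// One row with c = NULL. Without the rewrite, both conjuncts are NULL, so
// NOT(NULL AND NULL) is NULL and the row should be filtered out.
spark.sql("CREATE OR REPLACE TEMPORARY VIEW t AS SELECT CAST(NULL AS INT) AS c")

// Before the fix, constant propagation rewrote c + 1 = 1 to 1 + 1 = 1 (false),
// turning the predicate into NOT(c = 1 AND false) = true and returning the NULL row.
spark.sql("SELECT * FROM t WHERE NOT(c = 1 AND c + 1 = 1)").show()
{code}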

> Constant propagation nullability issue
> --
>
> Key: SPARK-30447
> URL: https://issues.apache.org/jira/browse/SPARK-30447
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.0.0
>
>
> There is a bug in constant propagation due to null handling:
> SELECT * FROM t WHERE NOT(c = 1 AND c + 1 = 1) returns the rows where c is null, 
> because constant propagation rewrites c + 1 = 1 into 1 + 1 = 1 (false), which makes 
> the predicate NOT(c = 1 AND false) = NOT(false) = true. It shouldn't: without the 
> rewrite both conjuncts are NULL for a null c, so NOT(NULL) is NULL and the row is 
> filtered out.






[jira] [Updated] (SPARK-30476) NullPointerException when Insert data to hive mongo external table by spark-sql

2020-01-10 Thread XiongCheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

XiongCheng updated SPARK-30476:
---
Summary: NullPointerException when Insert data to hive mongo external table 
by spark-sql  (was: NullPointException when Insert data to hive mongo external 
table by spark-sql)

> NullPointerException when Insert data to hive mongo external table by 
> spark-sql
> ---
>
> Key: SPARK-30476
> URL: https://issues.apache.org/jira/browse/SPARK-30476
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.3
> Environment: mongo-hadoop: 2.0.2
> spark-version: 2.4.3
> scala-version: 2.11
> hive-version: 1.2.1
> hadoop-version: 2.6.0
>Reporter: XiongCheng
>Priority: Major
>
> I executed the SQL below, but got an NPE.
> result_data_mongo is a MongoDB Hive external table.
> {code:java}
> insert into result_data_mongo 
> values("15","15","15",15,"15",15,15,15,15,15,15,15,15,15,15);
> {code}
> NPE detail:
> {code:java}
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
>   at 
> org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:123)
>   at 
> org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:103)
>   at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:120)
>   at 
> org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:108)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:236)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:170)
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:169)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>   at 
> com.mongodb.hadoop.output.MongoOutputCommitter.getTaskAttemptPath(MongoOutputCommitter.java:264)
>   at 
> com.mongodb.hadoop.output.MongoRecordWriter.<init>(MongoRecordWriter.java:59)
>   at 
> com.mongodb.hadoop.hive.output.HiveMongoOutputFormat$HiveMongoRecordWriter.<init>(HiveMongoOutputFormat.java:80)
>   at 
> com.mongodb.hadoop.hive.output.HiveMongoOutputFormat.getHiveRecordWriter(HiveMongoOutputFormat.java:52)
>   at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
>   at 
> org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
>   ... 15 more
> {code}
> I know mongo-hadoop uses an incorrect key to get the TaskAttemptID, so I modified 
> the source code of mongo-hadoop to read the correct properties 
> ('mapreduce.task.id' and 'mapreduce.task.attempt.id'), but I still can't get 
> the values. I found that these parameters are stored in Spark's 
> TaskAttemptContext, but the TaskAttemptContext is not passed into 
> HiveOutputWriter. Is this a design flaw?
> Here are the two key points:
> mongo-hadoop: 
> [https://github.com/mongodb/mongo-hadoop/blob/cdcd0f15503f2d1c5a1a2e3941711d850d1e427b/hive/src/main/java/com/mongodb/hadoop/hive/output/HiveMongoOutputFormat.java#L80]
> spark-hive:[https://github.com/apache/spark/blob/7c7d7f6a878b02ece881266ee538f3e1443aa8c1/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala#L103]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30234) ADD FILE can not add folder from Spark-sql

2020-01-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-30234.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26863
[https://github.com/apache/spark/pull/26863]

> ADD FILE can not add folder from Spark-sql
> --
>
> Key: SPARK-30234
> URL: https://issues.apache.org/jira/browse/SPARK-30234
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rakesh Raushan
>Assignee: Rakesh Raushan
>Priority: Minor
> Fix For: 3.0.0
>
>
> We cannot add directories using the spark-sql CLI, even though SPARK-4687 added 
> support for adding directories.
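
A minimal, hedged sketch of the programmatic path that already handles directories 
(added in SPARK-4687); the directory path here is made up, and the gap described above 
is specific to the spark-sql CLI's ADD FILE command:
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("add-dir-sketch").getOrCreate()

# SparkContext.addFile can distribute a whole directory when recursive=True,
# which is what the SQL-level ADD FILE cannot do yet according to this issue.
spark.sparkContext.addFile("/tmp/my_config_dir", recursive=True)
{code}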



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30234) ADD FILE can not add folder from Spark-sql

2020-01-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-30234:


Assignee: Rakesh Raushan

> ADD FILE can not add folder from Spark-sql
> --
>
> Key: SPARK-30234
> URL: https://issues.apache.org/jira/browse/SPARK-30234
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rakesh Raushan
>Assignee: Rakesh Raushan
>Priority: Minor
>
> We cannot add directories using the spark-sql CLI, even though SPARK-4687 added 
> support for adding directories.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30485) Remove SQL configs deprecated before v2.4

2020-01-10 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012872#comment-17012872
 ] 

Sean R. Owen commented on SPARK-30485:
--

We had previously removed methods and APIs that were deprecated in 2.3 or 
earlier, so I think this would be consistent.

> Remove SQL configs deprecated before v2.4
> -
>
> Key: SPARK-30485
> URL: https://issues.apache.org/jira/browse/SPARK-30485
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Remove the following SQL configs:
> * spark.sql.variable.substitute.depth
> * spark.sql.execution.pandas.respectSessionTimeZone
> * spark.sql.parquet.int64AsTimestampMillis
> * Maybe spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName 
> which was deprecated in v2.4
> Recently all deprecated SQL configs were gathered into the deprecatedSQLConfigs 
> map:
> https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30480) Pyspark test "test_memory_limit" fails consistently

2020-01-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-30480:
-
Fix Version/s: (was: 3.0.0)

> Pyspark test "test_memory_limit" fails consistently
> ---
>
> Key: SPARK-30480
> URL: https://issues.apache.org/jira/browse/SPARK-30480
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> I'm seeing consistent pyspark test failures on multiple PRs 
> ([#26955|https://github.com/apache/spark/pull/26955], 
> [#26201|https://github.com/apache/spark/pull/26201], 
> [#27064|https://github.com/apache/spark/pull/27064]), and they all failed 
> from "test_memory_limit".
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116422/testReport]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116438/testReport]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116429/testReport]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116366/testReport]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-30480) Pyspark test "test_memory_limit" fails consistently

2020-01-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-30480:
--

Reverted at 
[https://github.com/apache/spark/commit/d0983af38ffb123fa440bc5fcf3912db7658dd28]

> Pyspark test "test_memory_limit" fails consistently
> ---
>
> Key: SPARK-30480
> URL: https://issues.apache.org/jira/browse/SPARK-30480
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> I'm seeing consistent pyspark test failures on multiple PRs 
> ([#26955|https://github.com/apache/spark/pull/26955], 
> [#26201|https://github.com/apache/spark/pull/26201], 
> [#27064|https://github.com/apache/spark/pull/27064]), and they all failed 
> from "test_memory_limit".
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116422/testReport]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116438/testReport]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116429/testReport]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116366/testReport]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30448) accelerator aware scheduling enforce cores as limiting resource

2020-01-10 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-30448.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

> accelerator aware scheduling enforce cores as limiting resource
> ---
>
> Key: SPARK-30448
> URL: https://issues.apache.org/jira/browse/SPARK-30448
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
> Fix For: 3.0.0
>
>
> For the first version of accelerator-aware scheduling (SPARK-27495), the SPIP 
> had a condition that we can support dynamic allocation because we were going 
> to have a strict requirement that we don't waste any resources. This means 
> that the number of slots each executor has could be calculated from 
> the number of cores and task cpus just as is done today.
> Somewhere along the line of development we relaxed that and only warn when we 
> are wasting resources. This breaks the dynamic allocation logic if the 
> limiting resource is no longer the cores. This means we will request fewer 
> executors than we really need to run everything.
> We have to enforce that cores is always the limiting resource, so we should 
> throw if it's not.
> I guess we could only make this a requirement with dynamic allocation on, but 
> to make the behavior consistent I would say we just require it across the 
> board.
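
A minimal sketch of the slot arithmetic behind this issue (all values and the GPU resource 
names are illustrative, following Spark 3.0-style resource configuration):
{code:python}
# Hypothetical executor configuration:
executor_cores = 4   # spark.executor.cores
task_cpus = 1        # spark.task.cpus
executor_gpus = 2    # spark.executor.resource.gpu.amount
task_gpus = 1        # spark.task.resource.gpu.amount

cpu_slots = executor_cores // task_cpus   # 4 tasks per executor if cores limit
gpu_slots = executor_gpus // task_gpus    # 2 tasks per executor if GPUs limit

# Dynamic allocation sizes its executor requests from the cpu-based slot count,
# so when gpu_slots < cpu_slots it requests fewer executors than are actually
# needed, which is why this issue enforces cores as the limiting resource.
print(cpu_slots, gpu_slots)
{code}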



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30448) accelerator aware scheduling enforce cores as limiting resource

2020-01-10 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-30448:
-

Assignee: Thomas Graves

> accelerator aware scheduling enforce cores as limiting resource
> ---
>
> Key: SPARK-30448
> URL: https://issues.apache.org/jira/browse/SPARK-30448
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
>
> For the first version of accelerator-aware scheduling (SPARK-27495), the SPIP 
> had a condition that we can support dynamic allocation because we were going 
> to have a strict requirement that we don't waste any resources. This means 
> that the number of slots each executor has could be calculated from 
> the number of cores and task cpus just as is done today.
> Somewhere along the line of development we relaxed that and only warn when we 
> are wasting resources. This breaks the dynamic allocation logic if the 
> limiting resource is no longer the cores. This means we will request fewer 
> executors than we really need to run everything.
> We have to enforce that cores is always the limiting resource, so we should 
> throw if it's not.
> I guess we could only make this a requirement with dynamic allocation on, but 
> to make the behavior consistent I would say we just require it across the 
> board.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30343) Skip unnecessary checks in RewriteDistinctAggregates

2020-01-10 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-30343.
--
Fix Version/s: 3.0.0
 Assignee: Takeshi Yamamuro
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/26997

> Skip unnecessary checks in RewriteDistinctAggregates
> 
>
> Key: SPARK-30343
> URL: https://issues.apache.org/jira/browse/SPARK-30343
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30460) Spark checkpoint failing after some run with S3 path

2020-01-10 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012928#comment-17012928
 ] 

Gabor Somogyi commented on SPARK-30460:
---

[~Sachin] OK, good luck then :)

> Spark checkpoint failing after some run with S3 path 
> -
>
> Key: SPARK-30460
> URL: https://issues.apache.org/jira/browse/SPARK-30460
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.4.4
>Reporter: Sachin Pasalkar
>Priority: Major
>
> We are using EMR with SQS as the source of the stream. However, it fails 
> after 4-6 hours of running with the exception below. The application shows it is 
> running but stops processing messages.
> {code:java}
> 2020-01-06 13:04:10,548 WARN [BatchedWriteAheadLog Writer] 
> org.apache.spark.streaming.util.BatchedWriteAheadLog:BatchedWriteAheadLog 
> Writer failed to write ArrayBuffer(Record(java.nio.HeapByteBuffer[pos=0 
> lim=1226 cap=1226],1578315850302,Future()))
> java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295)
>   at 
> org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream$lzycompute(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.(FileBasedWriteAheadLogWriter.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.getLogWriter(FileBasedWriteAheadLog.scala:229)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:94)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLog.write(FileBasedWriteAheadLog.scala:50)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.org$apache$spark$streaming$util$BatchedWriteAheadLog$$flushRecords(BatchedWriteAheadLog.scala:175)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog$$anon$1.run(BatchedWriteAheadLog.scala:142)
>   at java.lang.Thread.run(Thread.java:748)
> 2020-01-06 13:04:10,554 WARN [wal-batching-thread-pool-0] 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker:Exception thrown 
> while writing record: 
> BlockAdditionEvent(ReceivedBlockInfo(0,Some(3),None,WriteAheadLogBasedStoreResult(input-0-1578315849800,Some(3),FileBasedWriteAheadLogSegment(s3://mss-prod-us-east-1-ueba-bucket/streaming/checkpoint/receivedData/0/log-1578315850001-1578315910001,0,5175
>  to the WriteAheadLog.
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
>   at 
> org.apache.spark.streaming.util.BatchedWriteAheadLog.write(BatchedWriteAheadLog.scala:84)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.writeToLog(ReceivedBlockTracker.scala:242)
>   at 
> org.apache.spark.streaming.scheduler.ReceivedBlockTracker.addBlock(ReceivedBlockTracker.scala:89)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker.org$apache$spark$streaming$scheduler$ReceiverTracker$$addBlock(ReceiverTracker.scala:347)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1$$anonfun$run$1.apply$mcV$sp(ReceiverTracker.scala:522)
>   at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
>   at 
> org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$receiveAndReply$1$$anon$1.run(ReceiverTracker.scala:520)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.UnsupportedOperationException
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150)
>   at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295)
>   at 
> org.apache.spark.streaming.util.HdfsUtils$.getOutputStream(HdfsUtils.scala:35)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream$lzycompute(FileBasedWriteAheadLogWriter.scala:32)
>   at 
> org.apache.spark.streaming.util.FileBasedWriteAheadLogWriter.stream(FileBasedWriteAheadLogWriter.scala:32)
>   

[jira] [Resolved] (SPARK-30196) Bump lz4-java version to 1.7.0

2020-01-10 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-30196.
--
Resolution: Fixed

> Bump lz4-java version to 1.7.0
> --
>
> Key: SPARK-30196
> URL: https://issues.apache.org/jira/browse/SPARK-30196
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Spark Core
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30196) Bump lz4-java version to 1.7.0

2020-01-10 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012938#comment-17012938
 ] 

Takeshi Yamamuro commented on SPARK-30196:
--

v1.7.1 will be released at the end of next week: 
https://github.com/lz4/lz4-java/issues/156#issuecomment-573063299
I'll close this and file a new JIRA for that.

> Bump lz4-java version to 1.7.0
> --
>
> Key: SPARK-30196
> URL: https://issues.apache.org/jira/browse/SPARK-30196
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Spark Core
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30486) Bump lz4-java version to 1.7.1

2020-01-10 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-30486:


 Summary: Bump lz4-java version to 1.7.1
 Key: SPARK-30486
 URL: https://issues.apache.org/jira/browse/SPARK-30486
 Project: Spark
  Issue Type: Improvement
  Components: Build, Spark Core
Affects Versions: 3.0.0
Reporter: Takeshi Yamamuro


lz4-java v1.7.0 has an issue on older macOS (e.g., v10.12 and v10.13). Since 
v1.7.1 will be released at the end of next week, we need to upgrade: 
https://github.com/lz4/lz4-java/issues/156#issuecomment-573063299



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30196) Bump lz4-java version to 1.7.0

2020-01-10 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012939#comment-17012939
 ] 

Takeshi Yamamuro commented on SPARK-30196:
--

https://issues.apache.org/jira/browse/SPARK-30486

> Bump lz4-java version to 1.7.0
> --
>
> Key: SPARK-30196
> URL: https://issues.apache.org/jira/browse/SPARK-30196
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Spark Core
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30487) Hive MetaException

2020-01-10 Thread Rakesh yadav (Jira)
Rakesh yadav created SPARK-30487:


 Summary:  Hive MetaException
 Key: SPARK-30487
 URL: https://issues.apache.org/jira/browse/SPARK-30487
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 2.4.4
Reporter: Rakesh yadav
 Fix For: 2.3.5


Hi,

I am getting the error below:

{noformat}
INFO TransactionTableCreation: Exception Occurred - 
[Ljava.lang.StackTraceElement;@4fd7c296
20/01/10 14:09:07 INFO TransactionTableCreation: Exception Occurred - Caught 
Hive MetaException attempting to get partition metadata by filter from Hive. 
You can set the Spark configuration setting 
spark.sql.hive.manageFilesourcePartitions to false to work around this problem, 
however this will result in degraded performance.
{noformat}
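
A minimal sketch of the workaround suggested by the error message itself (only the config 
name comes from the log; the rest is illustrative, and the message warns this degrades 
partition-pruning performance):
{code:python}
from pyspark.sql import SparkSession

# Disable Hive-managed file source partition handling, as the MetaException
# message suggests, trading partition pruning for avoiding the metastore filter call.
spark = (SparkSession.builder
         .appName("metastore-filter-workaround")
         .config("spark.sql.hive.manageFilesourcePartitions", "false")
         .enableHiveSupport()
         .getOrCreate())
{code}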



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30196) Bump lz4-java version to 1.7.0

2020-01-10 Thread Lars Francke (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013039#comment-17013039
 ] 

Lars Francke commented on SPARK-30196:
--

Excellent, thank you!

> Bump lz4-java version to 1.7.0
> --
>
> Key: SPARK-30196
> URL: https://issues.apache.org/jira/browse/SPARK-30196
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Spark Core
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30468) Use multiple lines to display data columns for show create table command

2020-01-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-30468:


Assignee: Zhenhua Wang

> Use multiple lines to display data columns for show create table command
> 
>
> Key: SPARK-30468
> URL: https://issues.apache.org/jira/browse/SPARK-30468
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
>Priority: Minor
>
> Currently, data columns are displayed on one line for the show create table 
> command. When the table has many columns (and, to make things worse, columns 
> may have long names or comments), the displayed result is really hard to read.
> To improve readability, we could print each column on a separate line. Note 
> that other systems like Hive/MySQL also display them this way.
> Also, for data columns, table properties and options, we'd better put the 
> right parenthesis at the end of the last column/property/option, instead of 
> having it occupy a separate line.
> As a result, before the change:
> {noformat}
> spark-sql> show create table test_table;
> CREATE TABLE `test_table` (`col1` INT COMMENT 'This is comment for column 1', 
> `col2` STRING COMMENT 'This is comment for column 2', `col3` DOUBLE COMMENT 
> 'This is comment for column 3')
> USING parquet
> OPTIONS (
>   `bar` '2',
>   `foo` '1'
> )
> TBLPROPERTIES (
>   'a' = 'x',
>   'b' = 'y'
> )
> {noformat}
> after the change:
> {noformat}
> spark-sql> show create table test_table;
> CREATE TABLE `test_table` (
>   `col1` INT COMMENT 'This is comment for column 1',
>   `col2` STRING COMMENT 'This is comment for column 2',
>   `col3` DOUBLE COMMENT 'This is comment for column 3')
> USING parquet
> OPTIONS (
>   `bar` '2',
>   `foo` '1')
> TBLPROPERTIES (
>   'a' = 'x',
>   'b' = 'y')
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30468) Use multiple lines to display data columns for show create table command

2020-01-10 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-30468.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27147
[https://github.com/apache/spark/pull/27147]

> Use multiple lines to display data columns for show create table command
> 
>
> Key: SPARK-30468
> URL: https://issues.apache.org/jira/browse/SPARK-30468
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
>Priority: Minor
> Fix For: 3.0.0
>
>
> Currently, data columns are displayed on one line for the show create table 
> command. When the table has many columns (and, to make things worse, columns 
> may have long names or comments), the displayed result is really hard to read.
> To improve readability, we could print each column on a separate line. Note 
> that other systems like Hive/MySQL also display them this way.
> Also, for data columns, table properties and options, we'd better put the 
> right parenthesis at the end of the last column/property/option, instead of 
> having it occupy a separate line.
> As a result, before the change:
> {noformat}
> spark-sql> show create table test_table;
> CREATE TABLE `test_table` (`col1` INT COMMENT 'This is comment for column 1', 
> `col2` STRING COMMENT 'This is comment for column 2', `col3` DOUBLE COMMENT 
> 'This is comment for column 3')
> USING parquet
> OPTIONS (
>   `bar` '2',
>   `foo` '1'
> )
> TBLPROPERTIES (
>   'a' = 'x',
>   'b' = 'y'
> )
> {noformat}
> after the change:
> {noformat}
> spark-sql> show create table test_table;
> CREATE TABLE `test_table` (
>   `col1` INT COMMENT 'This is comment for column 1',
>   `col2` STRING COMMENT 'This is comment for column 2',
>   `col3` DOUBLE COMMENT 'This is comment for column 3')
> USING parquet
> OPTIONS (
>   `bar` '2',
>   `foo` '1')
> TBLPROPERTIES (
>   'a' = 'x',
>   'b' = 'y')
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26494) 【spark sql】Use spark to read oracle TIMESTAMP(6) WITH LOCAL TIME ZONE type can't be found,

2020-01-10 Thread Jeff Evans (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013093#comment-17013093
 ] 

Jeff Evans commented on SPARK-26494:


To be clear, this type represents an instant in time.  From [the 
docs|https://docs.oracle.com/database/121/SUTIL/GUID-CB5D2124-D9AE-4C71-A83D-DFE071FE3542.htm]:

{quote}The TIMESTAMP WITH LOCAL TIME ZONE data type is another variant of 
TIMESTAMP that includes a time zone offset in its value. Data stored in the 
database is normalized to the database time zone, and time zone displacement is 
not stored as part of the column data. When the data is retrieved, it is 
returned in the user's local session time zone. It is specified as 
follows:{quote}

So it's really almost the same as a {{TIMESTAMP}}, just that it does some kind 
of automatic TZ conversion (converting from the offset given by the client to 
the DB server's offset automatically).  But that conversion is orthogonal to 
Spark entirely; it should just be treated like a {{TIMESTAMP}}.
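
A hypothetical workaround sketch until the type is mapped natively: cast the column to a 
plain TIMESTAMP on the Oracle side through a JDBC subquery (table, column, and connection 
details below are made up):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-tsltz-sketch").getOrCreate()

# Wrapping the table in a subquery makes Oracle return a plain TIMESTAMP,
# which Spark's JDBC reader already knows how to map to TimestampType.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")
      .option("dbtable", "(SELECT id, CAST(event_ts AS TIMESTAMP) AS event_ts FROM events) t")
      .option("user", "scott")
      .option("password", "tiger")
      .load())
{code}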

> 【spark sql】Use spark to read oracle TIMESTAMP(6) WITH LOCAL TIME ZONE type 
> can't be found,
> --
>
> Key: SPARK-26494
> URL: https://issues.apache.org/jira/browse/SPARK-26494
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: kun'qin 
>Priority: Minor
>
> When using Spark to read the Oracle TIMESTAMP(6) WITH LOCAL TIME ZONE type, no 
> matching Catalyst type can be found.
> When the data type is TIMESTAMP(6) WITH LOCAL TIME ZONE, the sqlType value 
> passed to the getCatalystType function in the JdbcUtils class is -102.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29779) Compact old event log files and clean up

2020-01-10 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin reassigned SPARK-29779:
--

Assignee: Jungtaek Lim

> Compact old event log files and clean up
> 
>
> Key: SPARK-29779
> URL: https://issues.apache.org/jira/browse/SPARK-29779
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> This issue is to track the effort of compacting old event logs (and cleaning 
> up after compaction) without breaking the compatibility guarantee.
> Please note that this issue leaves the functionalities below for future JIRA 
> issues, as the patch for SPARK-29779 was too large and we decided to break it down:
>  * apply filter in SQL events
>  * integrate compaction into FsHistoryProvider
>  * documentation about new configuration



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29779) Compact old event log files and clean up

2020-01-10 Thread Marcelo Masiero Vanzin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Masiero Vanzin resolved SPARK-29779.

Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27085
[https://github.com/apache/spark/pull/27085]

> Compact old event log files and clean up
> 
>
> Key: SPARK-29779
> URL: https://issues.apache.org/jira/browse/SPARK-29779
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue is to track the effort of compacting old event logs (and cleaning 
> up after compaction) without breaking the compatibility guarantee.
> Please note that this issue leaves the functionalities below for future JIRA 
> issues, as the patch for SPARK-29779 was too large and we decided to break it down:
>  * apply filter in SQL events
>  * integrate compaction into FsHistoryProvider
>  * documentation about new configuration



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29988) Adjust Jenkins jobs for `hive-1.2/2.3` combination

2020-01-10 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013143#comment-17013143
 ] 

Shane Knapp commented on SPARK-29988:
-

Got it, I'll get those sorted later today.

> Adjust Jenkins jobs for `hive-1.2/2.3` combination
> --
>
> Key: SPARK-29988
> URL: https://issues.apache.org/jira/browse/SPARK-29988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: Screen Shot 2020-01-09 at 1.59.25 PM.png
>
>
> We need to rename the following Jenkins jobs first.
> spark-master-test-sbt-hadoop-2.7 -> spark-master-test-sbt-hadoop-2.7-hive-1.2
> spark-master-test-sbt-hadoop-3.2 -> spark-master-test-sbt-hadoop-3.2-hive-2.3
> spark-master-test-maven-hadoop-2.7 -> 
> spark-master-test-maven-hadoop-2.7-hive-1.2
> spark-master-test-maven-hadoop-3.2 -> 
> spark-master-test-maven-hadoop-3.2-hive-2.3
> Also, we need to add `-Phive-1.2` for the existing `hadoop-2.7` jobs.
> {code}
> -Phive \
> +-Phive-1.2 \
> {code}
> And, we need to add `-Phive-2.3` for the existing `hadoop-3.2` jobs.
> {code}
> -Phive \
> +-Phive-2.3 \
> {code}
> For now, I added the above `-Phive-1.2` and `-Phive-2.3` to the Jenkins 
> manually. (This should be added to SCM of AmpLab Jenkins.)
> After SPARK-29981, we need to create two new jobs.
> - spark-master-test-sbt-hadoop-2.7-hive-2.3
> - spark-master-test-maven-hadoop-2.7-hive-2.3
> This is for preparation for Apache Spark 3.0.0.
> We may drop all `*-hive-1.2` jobs at Apache Spark 3.1.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29988) Adjust Jenkins jobs for `hive-1.2/2.3` combination

2020-01-10 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013145#comment-17013145
 ] 

Dongjoon Hyun commented on SPARK-29988:
---

Thank you!

> Adjust Jenkins jobs for `hive-1.2/2.3` combination
> --
>
> Key: SPARK-29988
> URL: https://issues.apache.org/jira/browse/SPARK-29988
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Shane Knapp
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: Screen Shot 2020-01-09 at 1.59.25 PM.png
>
>
> We need to rename the following Jenkins jobs first.
> spark-master-test-sbt-hadoop-2.7 -> spark-master-test-sbt-hadoop-2.7-hive-1.2
> spark-master-test-sbt-hadoop-3.2 -> spark-master-test-sbt-hadoop-3.2-hive-2.3
> spark-master-test-maven-hadoop-2.7 -> 
> spark-master-test-maven-hadoop-2.7-hive-1.2
> spark-master-test-maven-hadoop-3.2 -> 
> spark-master-test-maven-hadoop-3.2-hive-2.3
> Also, we need to add `-Phive-1.2` for the existing `hadoop-2.7` jobs.
> {code}
> -Phive \
> +-Phive-1.2 \
> {code}
> And, we need to add `-Phive-2.3` for the existing `hadoop-3.2` jobs.
> {code}
> -Phive \
> +-Phive-2.3 \
> {code}
> For now, I added the above `-Phive-1.2` and `-Phive-2.3` to the Jenkins 
> manually. (This should be added to SCM of AmpLab Jenkins.)
> After SPARK-29981, we need to create two new jobs.
> - spark-master-test-sbt-hadoop-2.7-hive-2.3
> - spark-master-test-maven-hadoop-2.7-hive-2.3
> This is for preparation for Apache Spark 3.0.0.
> We may drop all `*-hive-1.2` jobs at Apache Spark 3.1.0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30447) Constant propagation nullability issue

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30447:
--
Affects Version/s: 2.4.4

> Constant propagation nullability issue
> --
>
> Key: SPARK-30447
> URL: https://issues.apache.org/jira/browse/SPARK-30447
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.0.0
>
>
> There is a bug in constant propagation due to null handling:
> SELECT * FROM t WHERE NOT(c = 1 AND c + 1 = 1) returns the rows where c is 
> null, because c = 1 is propagated into the second conjunct and rewrites it to 
> 1 + 1 = 1, but it shouldn't return those rows.
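
A minimal repro sketch of the reported behavior (the table and column names simply follow 
the query in the description):
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.appName("constant-propagation-null").getOrCreate()

# One NULL row and one non-NULL row. For c = NULL the predicate evaluates to
# NULL rather than true, so the NULL row must not be returned.
schema = StructType([StructField("c", IntegerType(), True)])
spark.createDataFrame([(None,), (2,)], schema).createOrReplaceTempView("t")
spark.sql("SELECT * FROM t WHERE NOT(c = 1 AND c + 1 = 1)").show()
{code}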



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30447) Constant propagation nullability issue

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30447:
--
Fix Version/s: 2.4.5

> Constant propagation nullability issue
> --
>
> Key: SPARK-30447
> URL: https://issues.apache.org/jira/browse/SPARK-30447
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> There is a bug in constant propagation due to null handling:
> SELECT * FROM t WHERE NOT(c = 1 AND c + 1 = 1) returns the rows where c is 
> null, because c = 1 is propagated into the second conjunct and rewrites it to 
> 1 + 1 = 1, but it shouldn't return those rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30447) Constant propagation nullability issue

2020-01-10 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013148#comment-17013148
 ] 

Dongjoon Hyun commented on SPARK-30447:
---

Hi, [~petertoth].
Could you check the old Spark versions (2.3.4/2.2.3) and update `Affected 
Versions` please?

> Constant propagation nullability issue
> --
>
> Key: SPARK-30447
> URL: https://issues.apache.org/jira/browse/SPARK-30447
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 2.4.5, 3.0.0
>
>
> There is a bug in constant propagation due to null handling:
> SELECT * FROM t WHERE NOT(c = 1 AND c + 1 = 1) returns the rows where c is 
> null, because c = 1 is propagated into the second conjunct and rewrites it to 
> 1 + 1 = 1, but it shouldn't return those rows.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30312) Preserve path permission when truncate table

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30312:
--
Affects Version/s: 2.0.2
   2.1.3
   2.2.3
   2.3.4
   2.4.4

> Preserve path permission when truncate table
> 
>
> Key: SPARK-30312
> URL: https://issues.apache.org/jira/browse/SPARK-30312
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> When Spark SQL truncates a table, it deletes the paths of the table/partitions and 
> then re-creates new ones. If custom permissions/ACLs are set on the paths, that 
> metadata will be deleted.
> We should preserve the original permissions/ACLs if possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30312) Preserve path permission when truncate table

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30312:
--
Issue Type: Bug  (was: Improvement)

> Preserve path permission when truncate table
> 
>
> Key: SPARK-30312
> URL: https://issues.apache.org/jira/browse/SPARK-30312
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> When Spark SQL truncates a table, it deletes the paths of the table/partitions and 
> then re-creates new ones. If custom permissions/ACLs are set on the paths, that 
> metadata will be deleted.
> We should preserve the original permissions/ACLs if possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30312) Preserve path permission when truncate table

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30312.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/26956

> Preserve path permission when truncate table
> 
>
> Key: SPARK-30312
> URL: https://issues.apache.org/jira/browse/SPARK-30312
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.0.0
>
>
> When Spark SQL truncates a table, it deletes the paths of the table/partitions and 
> then re-creates new ones. If custom permissions/ACLs are set on the paths, that 
> metadata will be deleted.
> We should preserve the original permissions/ACLs if possible.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29174) LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-29174:
--
Issue Type: Improvement  (was: Bug)

> LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source
> ---
>
> Key: SPARK-29174
> URL: https://issues.apache.org/jira/browse/SPARK-29174
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Major
>
> *USING does not work for INSERT OVERWRITE LOCAL DIRECTORY, but works for INSERT 
> OVERWRITE DIRECTORY into HDFS*
> {code}
> 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite directory 
> '/user/trash2/' using parquet select * from trash1 a where a.country='PAK';
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.448 seconds)
> 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite local directory 
> '/opt/trash2/' using parquet select * from trash1 a where a.country='PAK';
> Error: org.apache.spark.sql.catalyst.parser.ParseException:
> LOCAL is not supported in INSERT OVERWRITE DIRECTORY to data source(line 1, 
> pos 0)
>  
> == SQL ==
> insert overwrite local directory '/opt/trash2/' using parquet select * from 
> trash1 a where a.country='PAK'
> ^^^ (state=,code=0)
> 0: jdbc:hive2://10.18.18.214:23040/default> insert overwrite local directory 
> '/opt/trash2/' stored as parquet select * from trash1 a where a.country='PAK';
> +-+--+
> | Result  |
> +-+--+
> | | |
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26494) Support Oracle TIMESTAMP WITH LOCAL TIME ZONE type

2020-01-10 Thread Jeff Evans (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Evans updated SPARK-26494:
---
Summary: Support Oracle TIMESTAMP WITH LOCAL TIME ZONE type  (was: 【spark 
sql】Use spark to read oracle TIMESTAMP(6) WITH LOCAL TIME ZONE type can't be 
found,)

> Support Oracle TIMESTAMP WITH LOCAL TIME ZONE type
> --
>
> Key: SPARK-26494
> URL: https://issues.apache.org/jira/browse/SPARK-26494
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: kun'qin 
>Priority: Minor
>
> When using Spark to read the Oracle TIMESTAMP(6) WITH LOCAL TIME ZONE type, no 
> matching Catalyst type can be found.
> When the data type is TIMESTAMP(6) WITH LOCAL TIME ZONE, the sqlType value 
> passed to the getCatalystType function in the JdbcUtils class is -102.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30488) Deadlock between block-manager-slave-async-thread-pool and spark context cleaner

2020-01-10 Thread Rohit Agrawal (Jira)
Rohit Agrawal created SPARK-30488:
-

 Summary: Deadlock between block-manager-slave-async-thread-pool 
and spark context cleaner
 Key: SPARK-30488
 URL: https://issues.apache.org/jira/browse/SPARK-30488
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.3
Reporter: Rohit Agrawal


A deadlock happens while cleaning up the Spark context. Here is the relevant portion of 
the thread dump:

{noformat}
 at 
org.apache.hadoop.util.ShutdownHookManager.executeShutdown(ShutdownHookManager.java:121)
 
 at 
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:95) 
"Spark Context Cleaner": 
 at java.lang.ClassLoader.checkCerts(ClassLoader.java:887) 
 - waiting to lock <0xca33e4c8> (a 
sbt.internal.ManagedClassLoader$ZombieClassLoader) 
 at java.lang.ClassLoader.preDefineClass(ClassLoader.java:668) 
 at java.lang.ClassLoader.defineClass(ClassLoader.java:761) 
 at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) 
 at java.net.URLClassLoader.defineClass(URLClassLoader.java:468) 
 at java.net.URLClassLoader.access$100(URLClassLoader.java:74) 
 at java.net.URLClassLoader$1.run(URLClassLoader.java:369) 
 at java.net.URLClassLoader$1.run(URLClassLoader.java:363) 
 at java.security.AccessController.doPrivileged(Native Method) 
 at java.net.URLClassLoader.findClass(URLClassLoader.java:362) 
 at 
sbt.internal.ManagedClassLoader$ZombieClassLoader.lookupClass(LayeredClassLoaders.scala:336)
 
 at sbt.internal.ManagedClassLoader.findClass(LayeredClassLoaders.scala:375) 
 at java.lang.ClassLoader.loadClass(ClassLoader.java:424) 
 - locked <0xc1f359f0> (a sbt.internal.LayeredClassLoader) 
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
 at 
org.apache.spark.storage.BlockManagerMaster.removeShuffle(BlockManagerMaster.scala:138)
 
 at org.apache.spark.ContextCleaner.doCleanupShuffle(ContextCleaner.scala:226) 
 at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:192)
 
 at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:185)
 
 at scala.Option.foreach(Option.scala:257) 
 at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:185)
 
 - locked <0xc3d74cd0> (a org.apache.spark.ContextCleaner) 
 at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1302) 
 at 
org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178)
 
 at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73) 
"block-manager-slave-async-thread-pool-81": 
 at java.lang.ClassLoader.loadClass(ClassLoader.java:404) 
 - waiting to lock <0xc1f359f0> (a sbt.internal.LayeredClassLoader) 
 at java.lang.ClassLoader.loadClass(ClassLoader.java:411) 
 - locked <0xca33e4c8> (a 
sbt.internal.ManagedClassLoader$ZombieClassLoader) 
 at java.lang.ClassLoader.loadClass(ClassLoader.java:357) 
 at 
org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply$mcZ$sp(BlockManagerSlaveEndpoint.scala:58)
 
 at 
org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(BlockManagerSlaveEndpoint.scala:57)
 
 at 
org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$receiveAndReply$1$$anonfun$applyOrElse$3.apply(BlockManagerSlaveEndpoint.scala:57)
 
 at 
org.apache.spark.storage.BlockManagerSlaveEndpoint$$anonfun$1.apply(BlockManagerSlaveEndpoint.scala:86)
 
 at 
scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
 
 at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) 
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
 at java.lang.Thread.run(Thread.java:748) 
 
Found 1 deadlock.
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29748) Remove sorting of fields in PySpark SQL Row creation

2020-01-10 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved SPARK-29748.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26496
[https://github.com/apache/spark/pull/26496]

> Remove sorting of fields in PySpark SQL Row creation
> 
>
> Key: SPARK-29748
> URL: https://issues.apache.org/jira/browse/SPARK-29748
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, when a PySpark Row is created with keyword arguments, the fields 
> are sorted alphabetically. This has created a lot of confusion with users 
> because it is not obvious (although it is stated in the pydocs) that they 
> will be sorted alphabetically, and then an error can occur later when 
> applying a schema and the field order does not match.
> The original reason for sorting fields is because kwargs in python < 3.6 are 
> not guaranteed to be in the same order that they were entered. Sorting 
> alphabetically would ensure a consistent order.  Matters are further 
> complicated with the flag {{__from_dict__}} that allows the {{Row}} fields 
> to be referenced by name when made by kwargs, but this flag is not serialized 
> with the Row and leads to inconsistent behavior.
> This JIRA proposes that any sorting of the fields be removed. Users with 
> Python 3.6+ creating Rows with kwargs can continue to do so since Python will 
> ensure the order is the same as entered. Users with Python < 3.6 will have to 
> create Rows with an OrderedDict or by using the Row class as a factory 
> (explained in the pydoc).  If kwargs are used, an error will be raised or 
> based on a conf setting it can fall back to a LegacyRow that will sort the 
> fields as before. This LegacyRow will be immediately deprecated and removed 
> once support for Python < 3.6 is dropped.
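
A minimal sketch of the behavior described above, plus the factory-style workaround that 
keeps the declared field order (the field names are made up):
{code:python}
from pyspark.sql import Row

# In Spark < 3.0, kwargs-based Rows are sorted alphabetically by field name,
# so the value order may not match a schema declared in a different order.
r = Row(zebra=1, apple=2)   # stored as Row(apple=2, zebra=1)

# Using Row as a factory fixes the field order explicitly and avoids the surprise.
Person = Row("name", "age")
p = Person("Alice", 30)     # fields stay in the declared order: name, age
{code}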



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29748) Remove sorting of fields in PySpark SQL Row creation

2020-01-10 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler reassigned SPARK-29748:


Assignee: Bryan Cutler

> Remove sorting of fields in PySpark SQL Row creation
> 
>
> Key: SPARK-29748
> URL: https://issues.apache.org/jira/browse/SPARK-29748
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> Currently, when a PySpark Row is created with keyword arguments, the fields 
> are sorted alphabetically. This has created a lot of confusion with users 
> because it is not obvious (although it is stated in the pydocs) that they 
> will be sorted alphabetically, and then an error can occur later when 
> applying a schema and the field order does not match.
> The original reason for sorting fields is because kwargs in python < 3.6 are 
> not guaranteed to be in the same order that they were entered. Sorting 
> alphabetically would ensure a consistent order.  Matters are further 
> complicated with the flag {{__from_dict__}} that allows the {{Row}} fields 
> to be referenced by name when made by kwargs, but this flag is not serialized 
> with the Row and leads to inconsistent behavior.
> This JIRA proposes that any sorting of the fields be removed. Users with 
> Python 3.6+ creating Rows with kwargs can continue to do so since Python will 
> ensure the order is the same as entered. Users with Python < 3.6 will have to 
> create Rows with an OrderedDict or by using the Row class as a factory 
> (explained in the pydoc).  If kwargs are used, an error will be raised or 
> based on a conf setting it can fall back to a LegacyRow that will sort the 
> fields as before. This LegacyRow will be immediately deprecated and removed 
> once support for Python < 3.6 is dropped.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22232) Row objects in pyspark created using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2020-01-10 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved SPARK-22232.
--
Resolution: Won't Fix

Closing in favor for fix in SPARK-29748

> Row objects in pyspark created using the `Row(**kwars)` syntax do not get 
> serialized/deserialized properly
> --
>
> Key: SPARK-22232
> URL: https://issues.apache.org/jira/browse/SPARK-22232
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.2.0
>Reporter: Bago Amirbekian
>Priority: Major
>
> The fields in a Row object created from a dict (i.e., {{Row(**kwargs)}}) should 
> be accessed by field name, not by position, because {{Row.__new__}} sorts the 
> fields alphabetically by name. It seems like this promise is not being 
> honored when these Row objects are shuffled. I've included an example to help 
> reproduce the issue.
> {code:none}
> from pyspark.sql.types import *
> from pyspark.sql import *
> def toRow(i):
>   return Row(a="a", c=3.0, b=2)
> schema = StructType([
>   # Putting fields in alphabetical order masks the issue
>   StructField("a", StringType(),  False),
>   StructField("c", FloatType(), False),
>   StructField("b", IntegerType(), False),
> ])
> rdd = sc.parallelize(range(10)).repartition(2).map(lambda i: toRow(i))
> # As long as we don't shuffle things work fine.
> print rdd.toDF(schema).take(2)
> # If we introduce a shuffle we have issues
> print rdd.repartition(3).toDF(schema).take(2)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24915) Calling SparkSession.createDataFrame with schema can throw exception

2020-01-10 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved SPARK-24915.
--
Resolution: Won't Fix

Closing in favor of fix in SPARK-29748

> Calling SparkSession.createDataFrame with schema can throw exception
> 
>
> Key: SPARK-24915
> URL: https://issues.apache.org/jira/browse/SPARK-24915
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.1
> Environment: Python 3.6.3
> PySpark 2.3.1 (installed via pip)
> OSX 10.12.6
>Reporter: Stephen Spencer
>Priority: Major
>
> There seems to be a bug in PySpark when using the PySparkSQL session to 
> create a dataframe with a pre-defined schema.
> Code to reproduce the error:
> {code:java}
> from pyspark import SparkConf, SparkContext
> from pyspark.sql import SparkSession
> from pyspark.sql.types import StructType, StructField, StringType, Row
> conf = SparkConf().setMaster("local").setAppName("repro") 
> context = SparkContext(conf=conf) 
> session = SparkSession(context)
> # Construct schema (the order of fields is important)
> schema = StructType([
> StructField('field2', StructType([StructField('sub_field', StringType(), 
> False)]), False),
> StructField('field1', StringType(), False),
> ])
> # Create data to populate data frame
> data = [
> Row(field1="Hello", field2=Row(sub_field='world'))
> ]
> # Attempt to create the data frame supplying the schema
> # this will throw a ValueError
> df = session.createDataFrame(data, schema=schema)
> df.show(){code}
> Running this throws a ValueError
> {noformat}
> Traceback (most recent call last):
> File "schema_bug.py", line 18, in 
> df = session.createDataFrame(data, schema=schema)
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/session.py",
>  line 691, in createDataFrame
> rdd, schema = self._createFromLocal(map(prepare, data), schema)
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/session.py",
>  line 423, in _createFromLocal
> data = [schema.toInternal(row) for row in data]
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/session.py",
>  line 423, in <listcomp>
> data = [schema.toInternal(row) for row in data]
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/types.py",
>  line 601, in toInternal
> for f, v, c in zip(self.fields, obj, self._needConversion))
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/types.py",
>  line 601, in <genexpr>
> for f, v, c in zip(self.fields, obj, self._needConversion))
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/types.py",
>  line 439, in toInternal
> return self.dataType.toInternal(obj)
> File 
> "/Users/stephenspencer/benevolent/ai/neat/rex/.env/lib/python3.6/site-packages/pyspark/sql/types.py",
>  line 619, in toInternal
> raise ValueError("Unexpected tuple %r with StructType" % obj)
> ValueError: Unexpected tuple 'Hello' with StructType{noformat}
> The problem seems to be here:
> https://github.com/apache/spark/blob/3d5c61e5fd24f07302e39b5d61294da79aa0c2f9/python/pyspark/sql/types.py#L603
> specifically the bit
> {code:java}
> zip(self.fields, obj, self._needConversion)
> {code}
> This zip statement assumes that obj and self.fields are ordered in the same 
> way, so that the elements of obj correspond to the right fields in the 
> schema. However, this is not true: a Row orders its elements alphabetically, 
> while the fields in the schema are in whatever order they were specified. In 
> this example, field2 is being initialised with the field1 element 'Hello'. 
> If you re-order the fields in the schema to (field1, field2), the given 
> example works without error.
> The schema in the repro is specifically designed to elicit the problem: the 
> fields are out of alphabetical order and one field is a StructType, making 
> schema._needSerializeAnyField == True. However, we encountered this in real 
> use as well.
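
For anyone hitting the same ValueError, a minimal sketch of two workarounds 
under the Spark 2.3-era behavior described above (names reused from the repro; 
the {{createDataFrame}} call is left commented out because it needs a live 
{{SparkSession}}):

{code:python}
from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([
    StructField('field2', StructType([StructField('sub_field', StringType(), False)]), False),
    StructField('field1', StringType(), False),
])

# Workaround 1: build rows positionally with a Row "factory" so the value
# order is explicit and matches the schema, instead of relying on kwargs.
Record = Row("field2", "field1")                  # fixes the field order
data = [Record(Row(sub_field="world"), "Hello")]  # values in that same order
# df = session.createDataFrame(data, schema=schema)

# Workaround 2: keep Row(**kwargs) but declare the schema fields in
# alphabetical order (field1, field2), matching the sorted kwargs layout.
{code}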



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29748) Remove sorting of fields in PySpark SQL Row creation

2020-01-10 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz updated SPARK-29748:
---
Labels: release-notes  (was: )

> Remove sorting of fields in PySpark SQL Row creation
> 
>
> Key: SPARK-29748
> URL: https://issues.apache.org/jira/browse/SPARK-29748
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> Currently, when a PySpark Row is created with keyword arguments, the fields 
> are sorted alphabetically. This has caused a lot of confusion among users 
> because it is not obvious (although it is stated in the pydocs) that the 
> fields will be sorted alphabetically, and an error can then occur later when 
> a schema is applied and the field order does not match.
> The original reason for sorting fields is that kwargs in Python < 3.6 are 
> not guaranteed to arrive in the order they were entered, so sorting 
> alphabetically ensured a consistent order. Matters are further complicated 
> by the flag {{__from_dict__}}, which allows the {{Row}} fields to be 
> referenced by name when made from kwargs, but this flag is not serialized 
> with the Row and leads to inconsistent behavior.
> This JIRA proposes that any sorting of the fields be removed. Users on 
> Python 3.6+ creating Rows with kwargs can continue to do so, since Python 
> guarantees the order is the same as entered. Users on Python < 3.6 will have 
> to create Rows with an OrderedDict or by using the Row class as a factory 
> (explained in the pydoc). If kwargs are used, an error will be raised, or, 
> based on a conf setting, the code can fall back to a LegacyRow that sorts 
> the fields as before. This LegacyRow will be deprecated immediately and 
> removed once support for Python < 3.6 is dropped.
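
A minimal sketch of the row-construction patterns mentioned above, assuming 
Spark 3.0 behavior (the field names are only illustrative):

{code:python}
from pyspark.sql import Row

# Row used as a factory: the field order is declared once, explicitly,
# and never depends on how kwargs happen to be ordered.
Person = Row("name", "age")
p1 = Person("Alice", 11)        # Row(name='Alice', age=11)

# On Python 3.6+ with Spark 3.0+, plain kwargs keep the order they were
# written in, so this also lines up with a (name, age) schema.
p2 = Row(name="Bob", age=12)

# On Spark 2.x the same call would have produced the sorted layout
# Row(age=12, name='Bob'), which is the confusion this change removes.
{code}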



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30489) Make build delete pyspark.zip file properly

2020-01-10 Thread Jeff Evans (Jira)
Jeff Evans created SPARK-30489:
--

 Summary: Make build delete pyspark.zip file properly
 Key: SPARK-30489
 URL: https://issues.apache.org/jira/browse/SPARK-30489
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.0.0
Reporter: Jeff Evans


The build uses Ant tasks to delete, then recreate, the {{pyspark.zip}} file 
within {{python/lib}}. The only problem is that the Ant task definition for the 
delete operation is incorrect (it uses {{dir}} instead of {{file}}), so the 
file doesn't actually get deleted by this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30489) Make build delete pyspark.zip file properly

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30489:
--
Affects Version/s: (was: 2.3.4)

> Make build delete pyspark.zip file properly
> ---
>
> Key: SPARK-30489
> URL: https://issues.apache.org/jira/browse/SPARK-30489
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Jeff Evans
>Priority: Trivial
>
> The build uses Ant tasks to delete, then recreate, the {{pyspark.zip}} file 
> within {{python/lib}}. The only problem is that the Ant task definition for 
> the delete operation is incorrect (it uses {{dir}} instead of {{file}}), so 
> the file doesn't actually get deleted by this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30489) Make build delete pyspark.zip file properly

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30489:
--
Issue Type: Bug  (was: Improvement)

> Make build delete pyspark.zip file properly
> ---
>
> Key: SPARK-30489
> URL: https://issues.apache.org/jira/browse/SPARK-30489
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Jeff Evans
>Priority: Trivial
>
> The build uses Ant tasks to delete, then recreate, the {{pyspark.zip}} file 
> within {{python/lib}}. The only problem is that the Ant task definition for 
> the delete operation is incorrect (it uses {{dir}} instead of {{file}}), so 
> the file doesn't actually get deleted by this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30489) Make build delete pyspark.zip file properly

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-30489.
---
Fix Version/s: 3.0.0
   2.4.5
 Assignee: Jeff Evans
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/27171

> Make build delete pyspark.zip file properly
> ---
>
> Key: SPARK-30489
> URL: https://issues.apache.org/jira/browse/SPARK-30489
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Jeff Evans
>Assignee: Jeff Evans
>Priority: Trivial
> Fix For: 2.4.5, 3.0.0
>
>
> The build uses Ant tasks to delete, then recreate, the {{pyspark.zip}} file 
> within {{python/lib}}. The only problem is that the Ant task definition for 
> the delete operation is incorrect (it uses {{dir}} instead of {{file}}), so 
> the file doesn't actually get deleted by this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30489) Make build delete pyspark.zip file properly

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30489:
--
Affects Version/s: 2.3.4
   2.4.4

> Make build delete pyspark.zip file properly
> ---
>
> Key: SPARK-30489
> URL: https://issues.apache.org/jira/browse/SPARK-30489
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.3.4, 2.4.4, 3.0.0
>Reporter: Jeff Evans
>Priority: Trivial
>
> The build uses Ant tasks to delete, then recreate, the {{pyspark.zip}} file 
> within {{python/lib}}. The only problem is that the Ant task definition for 
> the delete operation is incorrect (it uses {{dir}} instead of {{file}}), so 
> the file doesn't actually get deleted by this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30489) Make build delete pyspark.zip file properly

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30489:
--
Affects Version/s: 2.3.4

> Make build delete pyspark.zip file properly
> ---
>
> Key: SPARK-30489
> URL: https://issues.apache.org/jira/browse/SPARK-30489
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.3.4, 2.4.4, 3.0.0
>Reporter: Jeff Evans
>Assignee: Jeff Evans
>Priority: Trivial
> Fix For: 2.4.5, 3.0.0
>
>
> The build uses Ant tasks to delete, then recreate, the {{pyspark.zip}} file 
> within {{python/lib}}. The only problem is that the Ant task definition for 
> the delete operation is incorrect (it uses {{dir}} instead of {{file}}), so 
> the file doesn't actually get deleted by this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30489) Make build delete pyspark.zip file properly

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30489:
--
Affects Version/s: 2.0.2

> Make build delete pyspark.zip file properly
> ---
>
> Key: SPARK-30489
> URL: https://issues.apache.org/jira/browse/SPARK-30489
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: Jeff Evans
>Assignee: Jeff Evans
>Priority: Trivial
> Fix For: 2.4.5, 3.0.0
>
>
> The build uses Ant tasks to delete, then recreate, the {{pyspark.zip}} file 
> within {{python/lib}}. The only problem is that the Ant task definition for 
> the delete operation is incorrect (it uses {{dir}} instead of {{file}}), so 
> the file doesn't actually get deleted by this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30489) Make build delete pyspark.zip file properly

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30489:
--
Affects Version/s: 2.1.3

> Make build delete pyspark.zip file properly
> ---
>
> Key: SPARK-30489
> URL: https://issues.apache.org/jira/browse/SPARK-30489
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: Jeff Evans
>Assignee: Jeff Evans
>Priority: Trivial
> Fix For: 2.4.5, 3.0.0
>
>
> The build uses Ant tasks to delete, then recreate, the {{pyspark.zip}} file 
> within {{python/lib}}. The only problem is that the Ant task definition for 
> the delete operation is incorrect (it uses {{dir}} instead of {{file}}), so 
> the file doesn't actually get deleted by this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30489) Make build delete pyspark.zip file properly

2020-01-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-30489:
--
Affects Version/s: 2.2.3

> Make build delete pyspark.zip file properly
> ---
>
> Key: SPARK-30489
> URL: https://issues.apache.org/jira/browse/SPARK-30489
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.3, 2.3.4, 2.4.4, 3.0.0
>Reporter: Jeff Evans
>Assignee: Jeff Evans
>Priority: Trivial
> Fix For: 2.4.5, 3.0.0
>
>
> The build uses Ant tasks to delete, then recreate, the {{pyspark.zip}} file 
> within {{python/lib}}. The only problem is that the Ant task definition for 
> the delete operation is incorrect (it uses {{dir}} instead of {{file}}), so 
> the file doesn't actually get deleted by this task.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org