[jira] [Created] (SPARK-33033) Display time series view for task metrics in history server

2020-09-30 Thread Zhen Li (Jira)
Zhen Li created SPARK-33033:
---

 Summary: Display time series view for task metrics in history 
server
 Key: SPARK-33033
 URL: https://issues.apache.org/jira/browse/SPARK-33033
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.1.0
Reporter: Zhen Li


The event log contains all tasks' metrics data, which are useful for performance 
debugging. Currently the Spark UI only displays the final aggregated results, so 
much of this information is hidden. If the Spark UI could provide a time series 
view of the data, it would be more helpful for debugging performance problems. We 
would like to build an application statistics page in the history server, based on 
task metrics, to provide more straightforward insight into a Spark application.
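
For context, the per-task time series is already recoverable from an event log with 
plain DataFrame code; below is a minimal, hedged sketch (the file path is a 
placeholder, and the selected fields follow the JSON event log format):
{code:scala}
// Sketch only: read one application's event log (JSON lines, one listener event
// per line) and pull out a per-task time series of a few metrics.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TaskMetricsTimeSeries {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // Placeholder path; point it at a file under spark.eventLog.dir.
    val events = spark.read.json("/tmp/spark-events/app-20200930-0001")

    val taskSeries = events
      .filter(col("Event") === "SparkListenerTaskEnd")
      .select(
        col("Stage ID").as("stageId"),
        col("Task Info").getField("Task ID").as("taskId"),
        col("Task Info").getField("Finish Time").as("finishTime"),
        col("Task Metrics").getField("Executor Run Time").as("runTimeMs"),
        col("Task Metrics").getField("JVM GC Time").as("gcTimeMs"))
      .orderBy("finishTime")

    taskSeries.show(truncate = false)
    spark.stop()
  }
}
{code}
The proposed statistics page would essentially render this kind of series as charts 
instead of requiring ad-hoc queries.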



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19335) Spark should support doing an efficient DataFrame Upsert via JDBC

2020-09-30 Thread Simone (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204565#comment-17204565
 ] 

Simone commented on SPARK-19335:


+1. It is really strange that Delta Lake offers this capability on Parquet files 
while Spark is not able to offer the same capability on an RDBMS. The success of 
Delta Lake clearly shows the need for UPSERT capabilities in Spark.

> Spark should support doing an efficient DataFrame Upsert via JDBC
> -
>
> Key: SPARK-19335
> URL: https://issues.apache.org/jira/browse/SPARK-19335
> Project: Spark
>  Issue Type: Improvement
>Reporter: Ilya Ganelin
>Priority: Minor
>
> Doing a database update, as opposed to an insert, is useful, particularly when 
> working with streaming applications which may require revisions to previously 
> stored data. 
> Spark DataFrames/Datasets do not currently support an update feature via the 
> JDBC writer, which allows only Overwrite or Append.
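
Until such a mode exists, a common workaround is to drop to plain JDBC per partition 
and issue a dialect-specific upsert. A hedged sketch follows; the table name, column 
names, and the PostgreSQL ON CONFLICT syntax are illustrative assumptions, not a 
proposed Spark API:
{code:scala}
// Sketch only: upsert a DataFrame into an RDBMS by hand, one JDBC batch per
// partition. Assumes a table target_table(id BIGINT PRIMARY KEY, value TEXT)
// and a PostgreSQL-style ON CONFLICT clause.
import java.sql.DriverManager
import org.apache.spark.sql.DataFrame

def upsertViaJdbc(df: DataFrame, url: String): Unit = {
  df.rdd.foreachPartition { rows =>
    val conn = DriverManager.getConnection(url)
    val stmt = conn.prepareStatement(
      """INSERT INTO target_table (id, value) VALUES (?, ?)
        |ON CONFLICT (id) DO UPDATE SET value = EXCLUDED.value""".stripMargin)
    try {
      rows.foreach { row =>
        stmt.setLong(1, row.getAs[Long]("id"))
        stmt.setString(2, row.getAs[String]("value"))
        stmt.addBatch()
      }
      stmt.executeBatch()
    } finally {
      stmt.close()
      conn.close()
    }
  }
}
{code}
A native writer mode would push this into the JDBC data source so that each 
JdbcDialect could generate the appropriate MERGE or ON CONFLICT statement.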



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33033) Display time series view for task metrics in history server

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33033:


Assignee: Apache Spark

> Display time series view for task metrics in history server
> ---
>
> Key: SPARK-33033
> URL: https://issues.apache.org/jira/browse/SPARK-33033
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Minor
>
> The event log contains all tasks' metrics data, which are useful for performance 
> debugging. Currently the Spark UI only displays the final aggregated results, so 
> much of this information is hidden. If the Spark UI could provide a time series 
> view of the data, it would be more helpful for debugging performance problems. We 
> would like to build an application statistics page in the history server, based on 
> task metrics, to provide more straightforward insight into a Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33033) Display time series view for task metrics in history server

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204571#comment-17204571
 ] 

Apache Spark commented on SPARK-33033:
--

User 'zhli1142015' has created a pull request for this issue:
https://github.com/apache/spark/pull/29908

> Display time series view for task metrics in history server
> ---
>
> Key: SPARK-33033
> URL: https://issues.apache.org/jira/browse/SPARK-33033
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: Zhen Li
>Priority: Minor
>
> The event log contains all tasks' metrics data, which are useful for performance 
> debugging. Currently the Spark UI only displays the final aggregated results, so 
> much of this information is hidden. If the Spark UI could provide a time series 
> view of the data, it would be more helpful for debugging performance problems. We 
> would like to build an application statistics page in the history server, based on 
> task metrics, to provide more straightforward insight into a Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33033) Display time series view for task metrics in history server

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33033:


Assignee: (was: Apache Spark)

> Display time series view for task metrics in history server
> ---
>
> Key: SPARK-33033
> URL: https://issues.apache.org/jira/browse/SPARK-33033
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: Zhen Li
>Priority: Minor
>
> The event log contains all tasks' metrics data, which are useful for performance 
> debugging. Currently the Spark UI only displays the final aggregated results, so 
> much of this information is hidden. If the Spark UI could provide a time series 
> view of the data, it would be more helpful for debugging performance problems. We 
> would like to build an application statistics page in the history server, based on 
> task metrics, to provide more straightforward insight into a Spark application.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33023) addJar use `Utils.isWindow` to judge windows system

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204576#comment-17204576
 ] 

Apache Spark commented on SPARK-33023:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/29909

> addJar use `Utils.isWindow` to judge windows system
> ---
>
> Key: SPARK-33023
> URL: https://issues.apache.org/jira/browse/SPARK-33023
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: angerszhu
>Priority: Major
>
> addJar use `Utils.isWindow` to judge windows system



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33023) addJar use `Utils.isWindow` to judge windows system

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204577#comment-17204577
 ] 

Apache Spark commented on SPARK-33023:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/29909

> addJar use `Utils.isWindow` to judge windows system
> ---
>
> Key: SPARK-33023
> URL: https://issues.apache.org/jira/browse/SPARK-33023
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: angerszhu
>Priority: Major
>
> addJar use `Utils.isWindow` to judge windows system



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33023) addJar use `Utils.isWindow` to judge windows system

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33023:


Assignee: (was: Apache Spark)

> addJar use `Utils.isWindow` to judge windows system
> ---
>
> Key: SPARK-33023
> URL: https://issues.apache.org/jira/browse/SPARK-33023
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: angerszhu
>Priority: Major
>
> addJar use `Utils.isWindow` to judge windows system



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33023) addJar use `Utils.isWindow` to judge windows system

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33023:


Assignee: Apache Spark

> addJar use `Utils.isWindow` to judge windows system
> ---
>
> Key: SPARK-33023
> URL: https://issues.apache.org/jira/browse/SPARK-33023
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Major
>
> addJar use `Utils.isWindow` to judge windows system



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32901) UnsafeExternalSorter may cause a SparkOutOfMemoryError to be thrown while spilling

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204651#comment-17204651
 ] 

Apache Spark commented on SPARK-32901:
--

User 'tomvanbussel' has created a pull request for this issue:
https://github.com/apache/spark/pull/29910

> UnsafeExternalSorter may cause a SparkOutOfMemoryError to be thrown while 
> spilling
> --
>
> Key: SPARK-32901
> URL: https://issues.apache.org/jira/browse/SPARK-32901
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Tom van Bussel
>Assignee: Tom van Bussel
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>
> Consider the following sequence of events:
>  # {{UnsafeExternalSorter}} runs out of space in its pointer array and 
> attempts to allocate a large array to replace the current one.
>  # {{TaskMemoryManager}} tries to allocate the memory backing the large array 
> using {{MemoryManager}}, but {{MemoryManager}} is only willing to return most 
> but not all of the memory requested.
>  # {{TaskMemoryManager}} asks {{UnsafeExternalSorter}} to spill, which causes 
> {{UnsafeExternalSorter}} to spill the current run to disk, to free its record 
> pages and to reset its {{UnsafeInMemorySorter}}.
>  # {{UnsafeInMemorySorter}} frees its pointer array, and tries to allocate a 
> new small pointer array.
>  # {{TaskMemoryManager}} tries to allocate the memory backing the small array 
> using {{MemoryManager}}, but {{MemoryManager}} is unwilling to give it any 
> memory, as the {{TaskMemoryManager}} is still holding on to the memory it got 
> for the large array.
>  # {{TaskMemoryManager}} again asks {{UnsafeExternalSorter}} to spill, but 
> this time there is nothing to spill.
>  # {{UnsafeInMemorySorter}} receives less memory than it requested, and 
> causes a {{SparkOutOfMemoryError}} to be thrown, which causes the current 
> task to fail.
> A simple way to fix this is to avoid allocating a new array in 
> {{UnsafeInMemorySorter.reset()}} and to do this on-demand instead.
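
For illustration, a hedged and much simplified sketch (in Scala, not the actual Java 
internals; all names are illustrative) of the proposed on-demand allocation: reset() 
only frees the pointer array, and the next insertion allocates a fresh one, so no 
allocation happens while memory is still pinned during a spill:
{code:scala}
// Sketch only: a sorter-like structure whose reset() never allocates.
class InMemorySorterSketch(allocate: Int => Array[Long], free: Array[Long] => Unit) {
  private var pointerArray: Array[Long] = allocate(1024)
  private var pos = 0

  /** Called after a spill: release memory but do NOT re-allocate here. */
  def reset(): Unit = {
    if (pointerArray != null) free(pointerArray)
    pointerArray = null
    pos = 0
  }

  /** Allocate lazily, only when a record actually needs to be inserted. */
  def insertRecordPointer(ptr: Long): Unit = {
    if (pointerArray == null) pointerArray = allocate(1024)
    if (pos == pointerArray.length) grow()
    pointerArray(pos) = ptr
    pos += 1
  }

  private def grow(): Unit = {
    val bigger = allocate(pointerArray.length * 2)
    Array.copy(pointerArray, 0, bigger, 0, pos)
    free(pointerArray)
    pointerArray = bigger
  }
}
{code}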



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33034) Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns

2020-09-30 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33034:
--

 Summary: Support ALTER TABLE in JDBC v2 Table Catalog: add, update 
type and nullability of columns
 Key: SPARK-33034
 URL: https://issues.apache.org/jira/browse/SPARK-33034
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Override the default SQL strings for:
- ALTER TABLE ADD COLUMN
- ALTER TABLE UPDATE COLUMN TYPE
- ALTER TABLE UPDATE COLUMN NULLABILITY

in the Oracle JDBC dialect, according to the official Oracle documentation.

Write Oracle integration tests for JDBC.
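
For illustration, a hedged sketch of the Oracle-specific SQL strings the dialect 
override would need to emit (the helper object and method names are hypothetical; 
only the generated ADD/MODIFY statements follow Oracle's documented syntax):
{code:scala}
// Sketch only: Oracle uses ADD for new columns and MODIFY (not ALTER COLUMN)
// for type and nullability changes.
object OracleAlterTableSql {
  def addColumn(table: String, col: String, dataType: String): String =
    s"ALTER TABLE $table ADD $col $dataType"

  def updateColumnType(table: String, col: String, newType: String): String =
    s"ALTER TABLE $table MODIFY $col $newType"

  def updateColumnNullability(table: String, col: String, nullable: Boolean): String = {
    val constraint = if (nullable) "NULL" else "NOT NULL"
    s"ALTER TABLE $table MODIFY $col $constraint"
  }
}

// Example: OracleAlterTableSql.updateColumnType("users", "age", "NUMBER(10)")
//   returns "ALTER TABLE users MODIFY age NUMBER(10)"
{code}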



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33034) Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns

2020-09-30 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204672#comment-17204672
 ] 

Maxim Gekk commented on SPARK-33034:


I am working on this.

> Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and 
> nullability of columns
> -
>
> Key: SPARK-33034
> URL: https://issues.apache.org/jira/browse/SPARK-33034
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Override the default SQL strings for:
> - ALTER TABLE ADD COLUMN
> - ALTER TABLE UPDATE COLUMN TYPE
> - ALTER TABLE UPDATE COLUMN NULLABILITY
> in the Oracle JDBC dialect, according to the official Oracle documentation.
> Write Oracle integration tests for JDBC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33035) Updates the obsoleted entries of attribute mapping in QueryPlan#transformUpWithNewOutput

2020-09-30 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-33035:


 Summary: Updates the obsoleted entries of attribute mapping in 
QueryPlan#transformUpWithNewOutput
 Key: SPARK-33035
 URL: https://issues.apache.org/jira/browse/SPARK-33035
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1, 3.1.0
Reporter: Takeshi Yamamuro


This ticket aims at fixing corner-case bugs in 
`QueryPlan#transformUpWithNewOutput`, which is used to propagate updated 
`ExprId`s in a bottom-up way. Let's say we have a rule that simply assigns new 
`ExprId`s in a projection list, like this:
{code}
case class TestRule extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = 
plan.transformUpWithNewOutput {
case p @ Project(projList, _) =>
  val newPlan = p.copy(projectList = projList.map { _.transform {
// Assigns a new `ExprId` for references
case a: AttributeReference => Alias(a, a.name)()
  }}.asInstanceOf[Seq[NamedExpression]])

  val attrMapping = p.output.zip(newPlan.output)
  newPlan -> attrMapping
  }
}
{code}
Then, this rule is applied to the plan below:
{code}
(3) Project [a#5, b#6]
+- (2) Project [a#5, b#6]
   +- (1) Project [a#5, b#6]
  +- LocalRelation , [a#5, b#6]
{code}
In the first transformation, the rule assigns new `ExprId`s in `(1) Project` 
(e.g., a#5 AS a#7, b#6 AS b#8). In the second transformation, the rule first 
corrects the input references of `(2) Project` by using the attribute mapping 
given by `(1) Project` (a#5->a#7 and b#6->b#8), and then assigns new `ExprId`s 
(e.g., a#7 AS a#9, b#8 AS b#10). But, in the third transformation, the rule 
fails because it tries to correct the references of `(3) Project` by using an 
incorrect attribute mapping (a#7->a#9 and b#8->b#10) even though the correct 
one is a#5->a#9 and b#6->b#10. 




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33035) Updates the obsoleted entries of attribute mapping in QueryPlan#transformUpWithNewOutput

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33035:


Assignee: Apache Spark

> Updates the obsoleted entries of attribute mapping in 
> QueryPlan#transformUpWithNewOutput
> 
>
> Key: SPARK-33035
> URL: https://issues.apache.org/jira/browse/SPARK-33035
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Takeshi Yamamuro
>Assignee: Apache Spark
>Priority: Major
>
> This ticket aims at fixing corner-case bugs in 
> `QueryPlan#transformUpWithNewOutput`, which is used to propagate updated 
> `ExprId`s in a bottom-up way. Let's say we have a rule that simply assigns new 
> `ExprId`s in a projection list, like this:
> {code}
> case class TestRule extends Rule[LogicalPlan] {
>   override def apply(plan: LogicalPlan): LogicalPlan = 
> plan.transformUpWithNewOutput {
> case p @ Project(projList, _) =>
>   val newPlan = p.copy(projectList = projList.map { _.transform {
> // Assigns a new `ExprId` for references
> case a: AttributeReference => Alias(a, a.name)()
>   }}.asInstanceOf[Seq[NamedExpression]])
>   val attrMapping = p.output.zip(newPlan.output)
>   newPlan -> attrMapping
>   }
> }
> {code}
> Then, this rule is applied to the plan below:
> {code}
> (3) Project [a#5, b#6]
> +- (2) Project [a#5, b#6]
>+- (1) Project [a#5, b#6]
>   +- LocalRelation , [a#5, b#6]
> {code}
> In the first transformation, the rule assigns new `ExprId`s in `(1) Project` 
> (e.g., a#5 AS a#7, b#6 AS b#8). In the second transformation, the rule 
> corrects the input references of `(2) Project`  first by using attribute 
> mapping given from `(1) Project` (a#5->a#7 and b#6->b#8) and then assigns new 
> `ExprId`s (e.g., a#7 AS a#9, b#8 AS b#10). But, in the third transformation, 
> the rule fails because it tries to correct the references of `(3) Project` by 
> using incorrect attribute mapping (a#7->a#9 and b#8->b#10) even though the 
> correct one is a#5->a#9 and b#6->b#10. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33035) Updates the obsoleted entries of attribute mapping in QueryPlan#transformUpWithNewOutput

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204679#comment-17204679
 ] 

Apache Spark commented on SPARK-33035:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/29911

> Updates the obsoleted entries of attribute mapping in 
> QueryPlan#transformUpWithNewOutput
> 
>
> Key: SPARK-33035
> URL: https://issues.apache.org/jira/browse/SPARK-33035
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims at fixing corner-case bugs in 
> `QueryPlan#transformUpWithNewOutput`, which is used to propagate updated 
> `ExprId`s in a bottom-up way. Let's say we have a rule that simply assigns new 
> `ExprId`s in a projection list, like this:
> {code}
> case class TestRule extends Rule[LogicalPlan] {
>   override def apply(plan: LogicalPlan): LogicalPlan = 
> plan.transformUpWithNewOutput {
> case p @ Project(projList, _) =>
>   val newPlan = p.copy(projectList = projList.map { _.transform {
> // Assigns a new `ExprId` for references
> case a: AttributeReference => Alias(a, a.name)()
>   }}.asInstanceOf[Seq[NamedExpression]])
>   val attrMapping = p.output.zip(newPlan.output)
>   newPlan -> attrMapping
>   }
> }
> {code}
> Then, this rule is applied to the plan below:
> {code}
> (3) Project [a#5, b#6]
> +- (2) Project [a#5, b#6]
>+- (1) Project [a#5, b#6]
>   +- LocalRelation , [a#5, b#6]
> {code}
> In the first transformation, the rule assigns new `ExprId`s in `(1) Project` 
> (e.g., a#5 AS a#7, b#6 AS b#8). In the second transformation, the rule 
> corrects the input references of `(2) Project`  first by using attribute 
> mapping given from `(1) Project` (a#5->a#7 and b#6->b#8) and then assigns new 
> `ExprId`s (e.g., a#7 AS a#9, b#8 AS b#10). But, in the third transformation, 
> the rule fails because it tries to correct the references of `(3) Project` by 
> using incorrect attribute mapping (a#7->a#9 and b#8->b#10) even though the 
> correct one is a#5->a#9 and b#6->b#10. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33035) Updates the obsoleted entries of attribute mapping in QueryPlan#transformUpWithNewOutput

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33035:


Assignee: (was: Apache Spark)

> Updates the obsoleted entries of attribute mapping in 
> QueryPlan#transformUpWithNewOutput
> 
>
> Key: SPARK-33035
> URL: https://issues.apache.org/jira/browse/SPARK-33035
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims at fixing corner-case bugs in 
> `QueryPlan#transformUpWithNewOutput`, which is used to propagate updated 
> `ExprId`s in a bottom-up way. Let's say we have a rule that simply assigns new 
> `ExprId`s in a projection list, like this:
> {code}
> case class TestRule extends Rule[LogicalPlan] {
>   override def apply(plan: LogicalPlan): LogicalPlan = 
> plan.transformUpWithNewOutput {
> case p @ Project(projList, _) =>
>   val newPlan = p.copy(projectList = projList.map { _.transform {
> // Assigns a new `ExprId` for references
> case a: AttributeReference => Alias(a, a.name)()
>   }}.asInstanceOf[Seq[NamedExpression]])
>   val attrMapping = p.output.zip(newPlan.output)
>   newPlan -> attrMapping
>   }
> }
> {code}
> Then, this rule is applied to the plan below:
> {code}
> (3) Project [a#5, b#6]
> +- (2) Project [a#5, b#6]
>+- (1) Project [a#5, b#6]
>   +- LocalRelation , [a#5, b#6]
> {code}
> In the first transformation, the rule assigns new `ExprId`s in `(1) Project` 
> (e.g., a#5 AS a#7, b#6 AS b#8). In the second transformation, the rule 
> corrects the input references of `(2) Project`  first by using attribute 
> mapping given from `(1) Project` (a#5->a#7 and b#6->b#8) and then assigns new 
> `ExprId`s (e.g., a#7 AS a#9, b#8 AS b#10). But, in the third transformation, 
> the rule fails because it tries to correct the references of `(3) Project` by 
> using incorrect attribute mapping (a#7->a#9 and b#8->b#10) even though the 
> correct one is a#5->a#9 and b#6->b#10. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33035) Updates the obsoleted entries of attribute mapping in QueryPlan#transformUpWithNewOutput

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204683#comment-17204683
 ] 

Apache Spark commented on SPARK-33035:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/29911

> Updates the obsoleted entries of attribute mapping in 
> QueryPlan#transformUpWithNewOutput
> 
>
> Key: SPARK-33035
> URL: https://issues.apache.org/jira/browse/SPARK-33035
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Takeshi Yamamuro
>Priority: Major
>
> This ticket aims at fixing corner-case bugs in 
> `QueryPlan#transformUpWithNewOutput`, which is used to propagate updated 
> `ExprId`s in a bottom-up way. Let's say we have a rule that simply assigns new 
> `ExprId`s in a projection list, like this:
> {code}
> case class TestRule extends Rule[LogicalPlan] {
>   override def apply(plan: LogicalPlan): LogicalPlan = 
> plan.transformUpWithNewOutput {
> case p @ Project(projList, _) =>
>   val newPlan = p.copy(projectList = projList.map { _.transform {
> // Assigns a new `ExprId` for references
> case a: AttributeReference => Alias(a, a.name)()
>   }}.asInstanceOf[Seq[NamedExpression]])
>   val attrMapping = p.output.zip(newPlan.output)
>   newPlan -> attrMapping
>   }
> }
> {code}
> Then, this rule is applied to the plan below:
> {code}
> (3) Project [a#5, b#6]
> +- (2) Project [a#5, b#6]
>+- (1) Project [a#5, b#6]
>   +- LocalRelation , [a#5, b#6]
> {code}
> In the first transformation, the rule assigns new `ExprId`s in `(1) Project` 
> (e.g., a#5 AS a#7, b#6 AS b#8). In the second transformation, the rule 
> corrects the input references of `(2) Project`  first by using attribute 
> mapping given from `(1) Project` (a#5->a#7 and b#6->b#8) and then assigns new 
> `ExprId`s (e.g., a#7 AS a#9, b#8 AS b#10). But, in the third transformation, 
> the rule fails because it tries to correct the references of `(3) Project` by 
> using incorrect attribute mapping (a#7->a#9 and b#8->b#10) even though the 
> correct one is a#5->a#9 and b#6->b#10. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33034) Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (Oracle dialect)

2020-09-30 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-33034:
---
Summary: Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and 
nullability of columns (Oracle dialect)  (was: Support ALTER TABLE in JDBC v2 
Table Catalog: add, update type and nullability of columns)

> Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and 
> nullability of columns (Oracle dialect)
> --
>
> Key: SPARK-33034
> URL: https://issues.apache.org/jira/browse/SPARK-33034
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Override the default SQL strings for:
> - ALTER TABLE ADD COLUMN
> - ALTER TABLE UPDATE COLUMN TYPE
> - ALTER TABLE UPDATE COLUMN NULLABILITY
> in the Oracle JDBC dialect, according to the official Oracle documentation.
> Write Oracle integration tests for JDBC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33034) Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (Oracle dialect)

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33034:


Assignee: (was: Apache Spark)

> Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and 
> nullability of columns (Oracle dialect)
> --
>
> Key: SPARK-33034
> URL: https://issues.apache.org/jira/browse/SPARK-33034
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Override the default SQL strings for:
> - ALTER TABLE ADD COLUMN
> - ALTER TABLE UPDATE COLUMN TYPE
> - ALTER TABLE UPDATE COLUMN NULLABILITY
> in the Oracle JDBC dialect, according to the official Oracle documentation.
> Write Oracle integration tests for JDBC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33034) Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (Oracle dialect)

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204695#comment-17204695
 ] 

Apache Spark commented on SPARK-33034:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/29912

> Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and 
> nullability of columns (Oracle dialect)
> --
>
> Key: SPARK-33034
> URL: https://issues.apache.org/jira/browse/SPARK-33034
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Override the default SQL strings for:
> - ALTER TABLE ADD COLUMN
> - ALTER TABLE UPDATE COLUMN TYPE
> - ALTER TABLE UPDATE COLUMN NULLABILITY
> in the Oracle JDBC dialect, according to the official Oracle documentation.
> Write Oracle integration tests for JDBC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33034) Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and nullability of columns (Oracle dialect)

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33034:


Assignee: Apache Spark

> Support ALTER TABLE in JDBC v2 Table Catalog: add, update type and 
> nullability of columns (Oracle dialect)
> --
>
> Key: SPARK-33034
> URL: https://issues.apache.org/jira/browse/SPARK-33034
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Override the default SQL strings for:
> - ALTER TABLE ADD COLUMN
> - ALTER TABLE UPDATE COLUMN TYPE
> - ALTER TABLE UPDATE COLUMN NULLABILITY
> in the Oracle JDBC dialect, according to the official Oracle documentation.
> Write Oracle integration tests for JDBC.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32741) Check if the same ExprId refers to the unique attribute in logical plans

2020-09-30 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-32741.
--
Fix Version/s: 3.1.0
 Assignee: Takeshi Yamamuro
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/29585

> Check if the same ExprId refers to the unique attribute in logical plans
> 
>
> Key: SPARK-32741
> URL: https://issues.apache.org/jira/browse/SPARK-32741
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 3.1.0
>
>
> Some plan transformations (e.g., `RemoveNoopOperators`) implicitly assume that the 
> same `ExprId` refers to a unique attribute. But `RuleExecutor` does not 
> check this integrity between logical plan transformations. So, this ticket 
> targets adding this check to `isPlanIntegral` of `Analyzer`/`Optimizer`.
> This PR comes from the discussion with @cloud-fan and @viirya in 
> https://github.com/apache/spark/pull/29485#discussion_r475346278
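
For illustration, a hedged sketch of what such an integrity check could look like 
(the function name is hypothetical; the real check lives in `isPlanIntegral` of 
`Analyzer`/`Optimizer`):
{code:scala}
// Sketch only: walk the plan, group all attribute references by ExprId, and
// verify that each ExprId maps to a single (name, dataType) pair.
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

def sameExprIdRefersToUniqueAttr(plan: LogicalPlan): Boolean = {
  val allRefs = plan.collect { case p =>
    p.expressions.flatMap(_.collect { case a: AttributeReference => a })
  }.flatten

  allRefs.groupBy(_.exprId).forall { case (_, attrs) =>
    attrs.map(a => (a.name, a.dataType)).distinct.size <= 1
  }
}
{code}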



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33036) Refactor RewriteCorrelatedScalarSubquery code to replace exprIds in a bottom-up manner

2020-09-30 Thread Takeshi Yamamuro (Jira)
Takeshi Yamamuro created SPARK-33036:


 Summary: Refactor RewriteCorrelatedScalarSubquery code to replace 
exprIds in a bottom-up manner
 Key: SPARK-33036
 URL: https://issues.apache.org/jira/browse/SPARK-33036
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Takeshi Yamamuro


This PR aims at refactoring the code in `RewriteCorrelatedScalarSubquery` to 
replace `ExprId`s in a bottom-up manner instead of a top-down one.

This PR comes from the discussion with @cloud-fan in 
https://github.com/apache/spark/pull/29585#discussion_r490371252.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33036) Refactor RewriteCorrelatedScalarSubquery code to replace exprIds in a bottom-up manner

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33036:


Assignee: Apache Spark

> Refactor RewriteCorrelatedScalarSubquery code to replace exprIds in a 
> bottom-up manner
> --
>
> Key: SPARK-33036
> URL: https://issues.apache.org/jira/browse/SPARK-33036
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Assignee: Apache Spark
>Priority: Minor
>
> This PR aims at refactoring the code in `RewriteCorrelatedScalarSubquery` to 
> replace `ExprId`s in a bottom-up manner instead of a top-down one.
> This PR comes from the discussion with @cloud-fan in 
> https://github.com/apache/spark/pull/29585#discussion_r490371252.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33036) Refactor RewriteCorrelatedScalarSubquery code to replace exprIds in a bottom-up manner

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204726#comment-17204726
 ] 

Apache Spark commented on SPARK-33036:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/29913

> Refactor RewriteCorrelatedScalarSubquery code to replace exprIds in a 
> bottom-up manner
> --
>
> Key: SPARK-33036
> URL: https://issues.apache.org/jira/browse/SPARK-33036
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Priority: Minor
>
> This PR aims at refactoring the code in `RewriteCorrelatedScalarSubquery` to 
> replace `ExprId`s in a bottom-up manner instead of a top-down one.
> This PR comes from the discussion with @cloud-fan in 
> https://github.com/apache/spark/pull/29585#discussion_r490371252.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33036) Refactor RewriteCorrelatedScalarSubquery code to replace exprIds in a bottom-up manner

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33036:


Assignee: (was: Apache Spark)

> Refactor RewriteCorrelatedScalarSubquery code to replace exprIds in a 
> bottom-up manner
> --
>
> Key: SPARK-33036
> URL: https://issues.apache.org/jira/browse/SPARK-33036
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Takeshi Yamamuro
>Priority: Minor
>
> This PR aims at refactoring the code in `RewriteCorrelatedScalarSubquery` to 
> replace `ExprId`s in a bottom-up manner instead of a top-down one.
> This PR comes from the discussion with @cloud-fan in 
> https://github.com/apache/spark/pull/29585#discussion_r490371252.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32989) Performance regression when selecting from str_to_map

2020-09-30 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204740#comment-17204740
 ] 

Yang Jie commented on SPARK-32989:
--

[~hyukjin.kwon] master also has this problem; it may be related to SPARK-30356. 
After reverting it, the performance returns to normal.

> Performance regression when selecting from str_to_map
> -
>
> Key: SPARK-32989
> URL: https://issues.apache.org/jira/browse/SPARK-32989
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Ondrej Kokes
>Priority: Minor
>
> When I create a map using str_to_map and select more than a single value, I 
> notice a notable performance regression in 3.0.1 compared to 2.4.7. When 
> selecting a single value, the performance is the same. Plans are identical 
> between versions.
> It seems like in 2.x the map from str_to_map is preserved for a given row, 
> but in 3.x it's recalculated for each column. One hint that it might be the 
> case is that when I tried forcing materialisation of said map in 3.x (by a 
> coalesce, don't know if there's a better way), I got the performance roughly 
> to 2.x levels.
> Here's a reproducer (the csv in question gets autogenerated by the python 
> code):
> {code:java}
> $ head regression.csv 
> foo
> foo=bar&baz=bak&bar=foo
> foo=bar&baz=bak&bar=foo
> foo=bar&baz=bak&bar=foo
> foo=bar&baz=bak&bar=foo
> foo=bar&baz=bak&bar=foo
> ... (10M more rows)
> {code}
> {code:python}
> import time
> import os
> import pyspark  
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as f
> if __name__ == '__main__':
> print(pyspark.__version__)
> spark = SparkSession.builder.getOrCreate()
> filename = 'regression.csv'
> if not os.path.isfile(filename):
> with open(filename, 'wt') as fw:
> fw.write('foo\n')
> for _ in range(10_000_000):
> fw.write('foo=bar&baz=bak&bar=foo\n')
> df = spark.read.option('header', True).csv(filename)
> t = time.time()
> dd = (df
> .withColumn('my_map', f.expr('str_to_map(foo, "&", "=")'))
> .select(
> f.col('my_map')['foo'],
> )
> )
> dd.write.mode('overwrite').csv('tmp')
> t2 = time.time()
> print('selected one', t2 - t)
> dd = (df
> .withColumn('my_map', f.expr('str_to_map(foo, "&", "=")'))
> # .coalesce(100) # forcing evaluation before selection speeds it 
> up in 3.0.1
> .select(
> f.col('my_map')['foo'],
> f.col('my_map')['bar'],
> f.col('my_map')['baz'],
> )
> )
> dd.explain(True)
> dd.write.mode('overwrite').csv('tmp')
> t3 = time.time()
> print('selected three', t3 - t2)
> {code}
> Results for 2.4.7 and 3.0.1, both installed from PyPI, Python 3.7, macOS 
> (times are in seconds)
> {code:java}
> # 3.0.1
> # selected one 6.375471830368042  
> 
> # selected three 14.847578048706055
> # 2.4.7
> # selected one 6.679579019546509  
> 
> # selected three 6.5622029304504395  
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33006) Add dynamic PVC usage example into K8s doc

2020-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33006.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29897
[https://github.com/apache/spark/pull/29897]

> Add dynamic PVC usage example into K8s doc
> --
>
> Key: SPARK-33006
> URL: https://issues.apache.org/jira/browse/SPARK-33006
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33037) Add "spark.shuffle.manager" value to knownManagers

2020-09-30 Thread BoYang (Jira)
BoYang created SPARK-33037:
--

 Summary: Add "spark.shuffle.manager" value to knownManagers
 Key: SPARK-33037
 URL: https://issues.apache.org/jira/browse/SPARK-33037
 Project: Spark
  Issue Type: Improvement
  Components: Shuffle
Affects Versions: 3.0.1, 2.4.7
Reporter: BoYang


Spark has a hardcoded list of known shuffle managers, which currently contains 
two values. It does not contain a user's custom shuffle manager, which is set 
through the Spark config "spark.shuffle.manager".

 

We hit this issue when setting "spark.shuffle.manager" to our own shuffle manager 
plugin (the Uber Remote Shuffle Service implementation, 
https://github.com/uber/RemoteShuffleService).

 

The "spark.shuffle.manager" config value needs to be added to the known managers 
list as well.

 

The known managers list is in the code:

common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java
{quote}private final List<String> knownManagers = Arrays.asList(
  "org.apache.spark.shuffle.sort.SortShuffleManager",
  "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager");
{quote}
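
For illustration, a hedged, conceptual sketch of the proposed change (written in 
Scala here, while the code quoted above is Java; everything except the built-in 
manager class names is illustrative):
{code:scala}
// Sketch only: accept either a built-in manager or whatever class name the
// application configured via "spark.shuffle.manager".
val builtInManagers = Seq(
  "org.apache.spark.shuffle.sort.SortShuffleManager",
  "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager")

def isKnownManager(shuffleManagerClass: String, configuredManager: Option[String]): Boolean =
  builtInManagers.contains(shuffleManagerClass) ||
    configuredManager.contains(shuffleManagerClass)

// Example: with --conf spark.shuffle.manager=com.example.MyShuffleManager
// isKnownManager("com.example.MyShuffleManager", Some("com.example.MyShuffleManager"))
// now returns true.
{code}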
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33037) Add "spark.shuffle.manager" value to knownManagers

2020-09-30 Thread BoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BoYang updated SPARK-33037:
---
Description: 
Spark has a hardcoded list of known shuffle managers, which currently contains 
two values. It does not contain a user's custom shuffle manager, which is set 
through the Spark config "spark.shuffle.manager".

 

We hit this issue when setting "spark.shuffle.manager" to our own shuffle manager 
plugin (the Uber Remote Shuffle Service implementation, 
[https://github.com/uber/RemoteShuffleService]). Other users will hit the same 
issue when they implement their own shuffle manager.

 

The "spark.shuffle.manager" config value needs to be added to the known managers 
list as well.

 

The known managers list is in the code:

common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java
{quote}private final List<String> knownManagers = Arrays.asList(
   "org.apache.spark.shuffle.sort.SortShuffleManager",
   "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager");
{quote}
 

 

  was:
Spark has a hardcode list to contain known shuffle managers, which has two 
values now. It does not contain user's custom shuffle manager which is set 
through Spark config "spark.shuffle.manager".

 

We hit issue when set "spark.shuffle.manager" with our own shuffle manager 
plugin (Uber Remote Shuffle Service implementation, 
https://github.com/uber/RemoteShuffleService).

 

Need to add "spark.shuffle.manager" config value to the known managers list as 
well.

 

The know managers list is in code:

common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java
{quote}private final List knownManagers = Arrays.asList(
  "org.apache.spark.shuffle.sort.SortShuffleManager",
  "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager");
{quote}
 

 


> Add "spark.shuffle.manager" value to knownManagers
> --
>
> Key: SPARK-33037
> URL: https://issues.apache.org/jira/browse/SPARK-33037
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.4.7, 3.0.1
>Reporter: BoYang
>Priority: Major
>
> Spark has a hardcoded list of known shuffle managers, which currently contains 
> two values. It does not contain a user's custom shuffle manager, which is set 
> through the Spark config "spark.shuffle.manager".
>  
> We hit this issue when setting "spark.shuffle.manager" to our own shuffle manager 
> plugin (the Uber Remote Shuffle Service implementation, 
> [https://github.com/uber/RemoteShuffleService]). Other users will hit the same 
> issue when they implement their own shuffle manager.
>  
> The "spark.shuffle.manager" config value needs to be added to the known managers 
> list as well.
>  
> The known managers list is in the code:
> common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java
> {quote}private final List<String> knownManagers = Arrays.asList(
>    "org.apache.spark.shuffle.sort.SortShuffleManager",
>    "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager");
> {quote}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32996) Handle Option.empty v1.ExecutorSummary#peakMemoryMetrics

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204928#comment-17204928
 ] 

Apache Spark commented on SPARK-32996:
--

User 'shrutig' has created a pull request for this issue:
https://github.com/apache/spark/pull/29914

> Handle Option.empty v1.ExecutorSummary#peakMemoryMetrics
> 
>
> Key: SPARK-32996
> URL: https://issues.apache.org/jira/browse/SPARK-32996
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.0.1, 3.1.0
>Reporter: Shruti Gumma
>Assignee: Shruti Gumma
>Priority: Major
> Fix For: 3.1.0
>
>
> When {{peakMemoryMetrics}} in {{ExecutorSummary}} is {{Option.empty}}, the 
> {{ExecutorMetricsJsonSerializer#serialize}} method does not execute the 
> {{jsonGenerator.writeObject}} method. This causes the JSON to be generated 
> with the {{peakMemoryMetrics}} key added to the serialized string, but no 
> corresponding value.
> This causes an error to be thrown when it is the next key {{attributes}}'s turn 
> to be added to the JSON:
> {{com.fasterxml.jackson.core.JsonGenerationException: Can not write a field 
> name, expecting a value.}}
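
For illustration, a hedged sketch of the kind of fix this implies for an 
Option-valued field serializer (the class name is illustrative, not the actual 
Spark class):
{code:scala}
// Sketch only: always emit a value, writing an explicit null when the Option is
// empty, so the generator never sees a field name without a value.
import com.fasterxml.jackson.core.JsonGenerator
import com.fasterxml.jackson.databind.{JsonSerializer, SerializerProvider}

class OptionValueSerializer[T] extends JsonSerializer[Option[T]] {
  override def serialize(
      value: Option[T],
      gen: JsonGenerator,
      serializers: SerializerProvider): Unit = {
    value match {
      case Some(v) => gen.writeObject(v) // delegate to default serialization
      case None    => gen.writeNull()    // write null instead of nothing
    }
  }
}
{code}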



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33038) AQE plan string should only display one plan when the initial and the current plan are the same

2020-09-30 Thread Allison Wang (Jira)
Allison Wang created SPARK-33038:


 Summary: AQE plan string should only display one plan when the 
initial and the current plan are the same
 Key: SPARK-33038
 URL: https://issues.apache.org/jira/browse/SPARK-33038
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Allison Wang


Currently, the AQE plan string displays both the initial plan and the current 
or the final plan. This can be redundant when the initial plan and the current 
physical plan are exactly the same. For instance, the `EXPLAIN` command will 
not actually execute the query, and thus the plan string will never change, but 
currently, the plan string still shows both the current and the initial plan:

 
{code:java}
AdaptiveSparkPlan (8)
+- == Current Plan ==
   Sort (7)
   +- Exchange (6)
  +- HashAggregate (5)
 +- Exchange (4)
+- HashAggregate (3)
   +- Filter (2)
  +- Scan parquet default.explain_temp1 (1)
+- == Initial Plan ==
   Sort (7)
   +- Exchange (6)
  +- HashAggregate (5)
 +- Exchange (4)
+- HashAggregate (3)
   +- Filter (2)
  +- Scan parquet default.explain_temp1 (1)
{code}
When the initial and the current plan are the same, there should be only one 
plan string displayed. For example
{code:java}
AdaptiveSparkPlan (8)
   Sort (7)
   +- Exchange (6)
  +- HashAggregate (5)
 +- Exchange (4)
+- HashAggregate (3)
   +- Filter (2)
  +- Scan parquet default.explain_temp1 (1){code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33039) Misleading watermark calculation in structure streaming

2020-09-30 Thread Sandish Kumar HN (Jira)
Sandish Kumar HN created SPARK-33039:


 Summary: Misleading watermark calculation in structure streaming
 Key: SPARK-33039
 URL: https://issues.apache.org/jira/browse/SPARK-33039
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.4.4
Reporter: Sandish Kumar HN


source code:
{code:java}

import org.apache.spark.sql.SparkSession
import org.apache.hadoop.fs.Path
import java.sql.Timestamp

import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.{ProcessingTime, Trigger}

object TestWaterMark extends App {


 val spark = SparkSession.builder().master("local").getOrCreate()
 val sc = spark.sparkContext
 val dir = new Path("/tmp/test-structured-streaming")
 val fs = dir.getFileSystem(sc.hadoopConfiguration)
 fs.mkdirs(dir)

 val schema = StructType(StructField("vilue", StringType) ::
 StructField("timestamp", TimestampType) ::
 Nil)

 val eventStream = spark
 .readStream
 .option("sep", ";")
 .option("header", "false")
 .schema(schema)
 .csv(dir.toString)

 // Watermarked aggregation
 val eventsCount = eventStream
 .withWatermark("timestamp", "5 seconds")
 .groupBy(window(col("timestamp"), "10 seconds"))
 .count

 def writeFile(path: Path, data: String) {
 val file = fs.create(path)
 file.writeUTF(data)
 file.close()
 }

 // Debug query
 val query = eventsCount.writeStream
 .format("console")
 .outputMode("complete")
 .option("truncate", "false")
 .trigger(Trigger.ProcessingTime("5 seconds"))
 .start()

 writeFile(new Path(dir, "file1"), """
 |OLD;2019-08-09 10:05:00
 |OLD;2019-08-09 10:10:00
 |OLD;2019-08-09 10:15:00""".stripMargin)

 query.processAllAvailable()
 val lp1 = query.lastProgress
 println(lp1.eventTime)


 writeFile(new Path(dir, "file2"), """
 |NEW;2020-08-29 10:05:00
 |NEW;2020-08-29 10:10:00
 |NEW;2020-08-29 10:15:00""".stripMargin)

 query.processAllAvailable()
 val lp2 = query.lastProgress
 println(lp2.eventTime)

 writeFile(new Path(dir, "file4"), """
 |OLD;2017-08-10 10:05:00
 |OLD;2017-08-10 10:10:00
 |OLD;2017-08-10 10:15:00""".stripMargin)
 writeFile(new Path(dir, "file3"), "")

 query.processAllAvailable()
 val lp3 = query.lastProgress
 println(lp3.eventTime)


 query.awaitTermination()
 fs.delete(dir, true)

}

{code}

OUTPUT:

 
{code:java}
---
Batch: 0
---
+--+-+
|window |count|
+--+-+
|[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 |
|[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 |
|[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 |
+--+-+
{min=2019-08-09T17:05:00.000Z, avg=2019-08-09T17:10:00.000Z, 
watermark=1970-01-01T00:00:00.000Z, max=2019-08-09T17:15:00.000Z}
---
Batch: 1
---
+--+-+
|window |count|
+--+-+
|[2020-08-29 10:15:00, 2020-08-29 10:15:10]|1 |
|[2020-08-29 10:10:00, 2020-08-29 10:10:10]|1 |
|[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 |
|[2020-08-29 10:05:00, 2020-08-29 10:05:10]|1 |
|[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 |
|[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 |
+--+-+
{min=2020-08-29T17:05:00.000Z, avg=2020-08-29T17:10:00.000Z, 
watermark=2019-08-09T17:14:55.000Z, max=2020-08-29T17:15:00.000Z}
---
Batch: 2
---
+--+-+
|window |count|
+--+-+
|[2017-08-10 10:15:00, 2017-08-10 10:15:10]|1 |
|[2020-08-29 10:15:00, 2020-08-29 10:15:10]|1 |
|[2017-08-10 10:05:00, 2017-08-10 10:05:10]|1 |
|[2020-08-29 10:10:00, 2020-08-29 10:10:10]|1 |
|[2019-08-09 10:05:00, 2019-08-09 10:05:10]|1 |
|[2017-08-10 10:10:00, 2017-08-10 10:10:10]|1 |
|[2020-08-29 10:05:00, 2020-08-29 10:05:10]|1 |
|[2019-08-09 10:15:00, 2019-08-09 10:15:10]|1 |
|[2019-08-09 10:10:00, 2019-08-09 10:10:10]|1 |
+--+-+
{min=2017-08-10T17:05:00.000Z, avg=2017-08-10T17:10:00.000Z, 
watermark=2020-08-29T17:14:55.000Z, max=2017-08-10T17:15:00.000Z}
{code}

EXPECTED:
The last batch's events were expected to be dropped, since the watermark is 
2019-08-09T17:14:55.000Z.

Events expected to be dropped:
 |OLD;2017-08-10 10:05:00
 |OLD;2017-08-10 10:10:00
 |OLD;2017-08-10 10:15:00



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33038) AQE plan string should only display one plan when the initial and the current plan are the same

2020-09-30 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-33038:
-
Description: 
Currently, the AQE plan string displays both the initial plan and the current 
or the final plan. This can be redundant when the initial plan and the current 
physical plan are exactly the same. For instance, the `EXPLAIN` command will 
not actually execute the query, and thus the plan string will never change, but 
currently, the plan string still shows both the current and the initial plan:

 
{code:java}
AdaptiveSparkPlan (8)
+- == Current Plan ==
   Sort (7)
   +- Exchange (6)
  +- HashAggregate (5)
 +- Exchange (4)
+- HashAggregate (3)
   +- Filter (2)
  +- Scan parquet default.explain_temp1 (1)
+- == Initial Plan ==
   Sort (7)
   +- Exchange (6)
  +- HashAggregate (5)
 +- Exchange (4)
+- HashAggregate (3)
   +- Filter (2)
  +- Scan parquet default.explain_temp1 (1)
{code}
When the initial and the current plan are the same, there should be only one 
plan string displayed. For example
{code:java}
AdaptiveSparkPlan (8)
+- Sort (7)
   +- Exchange (6)
  +- HashAggregate (5)
 +- Exchange (4)
+- HashAggregate (3)
   +- Filter (2)
  +- Scan parquet default.explain_temp1 (1){code}
 

  was:
Currently, the AQE plan string displays both the initial plan and the current 
or the final plan. This can be redundant when the initial plan and the current 
physical plan are exactly the same. For instance, the `EXPLAIN` command will 
not actually execute the query, and thus the plan string will never change, but 
currently, the plan string still shows both the current and the initial plan:

 
{code:java}
AdaptiveSparkPlan (8)
+- == Current Plan ==
   Sort (7)
   +- Exchange (6)
  +- HashAggregate (5)
 +- Exchange (4)
+- HashAggregate (3)
   +- Filter (2)
  +- Scan parquet default.explain_temp1 (1)
+- == Initial Plan ==
   Sort (7)
   +- Exchange (6)
  +- HashAggregate (5)
 +- Exchange (4)
+- HashAggregate (3)
   +- Filter (2)
  +- Scan parquet default.explain_temp1 (1)
{code}
When the initial and the current plan are the same, there should be only one 
plan string displayed. For example
{code:java}
AdaptiveSparkPlan (8)
   Sort (7)
   +- Exchange (6)
  +- HashAggregate (5)
 +- Exchange (4)
+- HashAggregate (3)
   +- Filter (2)
  +- Scan parquet default.explain_temp1 (1){code}
 


> AQE plan string should only display one plan when the initial and the current 
> plan are the same
> ---
>
> Key: SPARK-33038
> URL: https://issues.apache.org/jira/browse/SPARK-33038
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Allison Wang
>Priority: Minor
>
> Currently, the AQE plan string displays both the initial plan and the current 
> or the final plan. This can be redundant when the initial plan and the 
> current physical plan are exactly the same. For instance, the `EXPLAIN` 
> command will not actually execute the query, and thus the plan string will 
> never change, but currently, the plan string still shows both the current and 
> the initial plan:
>  
> {code:java}
> AdaptiveSparkPlan (8)
> +- == Current Plan ==
>Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1)
> +- == Initial Plan ==
>Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1)
> {code}
> When the initial and the current plan are the same, there should be only one 
> plan string displayed. For example
> {code:java}
> AdaptiveSparkPlan (8)
> +- Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1){code}
>  
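For reference, a minimal sketch of reproducing such a plan string without executing the query (assumes a spark-shell session with AQE enabled; the table comes from the example above and is illustrative):
{code:scala}
// Sketch only: EXPLAIN in formatted mode prints the AdaptiveSparkPlan string
// shown above without running the query. The table name is illustrative.
spark.conf.set("spark.sql.adaptive.enabled", "true")
val q = spark.sql(
  "SELECT key, count(*) FROM explain_temp1 WHERE key > 10 GROUP BY key ORDER BY key")
q.explain("formatted")
{code}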



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33038) AQE plan string should only display one plan when the initial and the current plan are the same

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33038:


Assignee: (was: Apache Spark)

> AQE plan string should only display one plan when the initial and the current 
> plan are the same
> ---
>
> Key: SPARK-33038
> URL: https://issues.apache.org/jira/browse/SPARK-33038
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Allison Wang
>Priority: Minor
>
> Currently, the AQE plan string displays both the initial plan and the current 
> or the final plan. This can be redundant when the initial plan and the 
> current physical plan are exactly the same. For instance, the `EXPLAIN` 
> command will not actually execute the query, and thus the plan string will 
> never change, but currently, the plan string still shows both the current and 
> the initial plan:
>  
> {code:java}
> AdaptiveSparkPlan (8)
> +- == Current Plan ==
>Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1)
> +- == Initial Plan ==
>Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1)
> {code}
> When the initial and the current plan are the same, there should be only one 
> plan string displayed. For example
> {code:java}
> AdaptiveSparkPlan (8)
> +- Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1){code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33038) AQE plan string should only display one plan when the initial and the current plan are the same

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204958#comment-17204958
 ] 

Apache Spark commented on SPARK-33038:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/29915

> AQE plan string should only display one plan when the initial and the current 
> plan are the same
> ---
>
> Key: SPARK-33038
> URL: https://issues.apache.org/jira/browse/SPARK-33038
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, the AQE plan string displays both the initial plan and the current 
> or the final plan. This can be redundant when the initial plan and the 
> current physical plan are exactly the same. For instance, the `EXPLAIN` 
> command will not actually execute the query, and thus the plan string will 
> never change, but currently, the plan string still shows both the current and 
> the initial plan:
>  
> {code:java}
> AdaptiveSparkPlan (8)
> +- == Current Plan ==
>Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1)
> +- == Initial Plan ==
>Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1)
> {code}
> When the initial and the current plan are the same, there should be only one 
> plan string displayed. For example
> {code:java}
> AdaptiveSparkPlan (8)
> +- Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1){code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33038) AQE plan string should only display one plan when the initial and the current plan are the same

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33038:


Assignee: Apache Spark

> AQE plan string should only display one plan when the initial and the current 
> plan are the same
> ---
>
> Key: SPARK-33038
> URL: https://issues.apache.org/jira/browse/SPARK-33038
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, the AQE plan string displays both the initial plan and the current 
> or the final plan. This can be redundant when the initial plan and the 
> current physical plan are exactly the same. For instance, the `EXPLAIN` 
> command will not actually execute the query, and thus the plan string will 
> never change, but currently, the plan string still shows both the current and 
> the initial plan:
>  
> {code:java}
> AdaptiveSparkPlan (8)
> +- == Current Plan ==
>Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1)
> +- == Initial Plan ==
>Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1)
> {code}
> When the initial and the current plan are the same, there should be only one 
> plan string displayed. For example
> {code:java}
> AdaptiveSparkPlan (8)
> +- Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1){code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33038) AQE plan string should only display one plan when the initial and the current plan are the same

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204957#comment-17204957
 ] 

Apache Spark commented on SPARK-33038:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/29915

> AQE plan string should only display one plan when the initial and the current 
> plan are the same
> ---
>
> Key: SPARK-33038
> URL: https://issues.apache.org/jira/browse/SPARK-33038
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Allison Wang
>Priority: Minor
>
> Currently, the AQE plan string displays both the initial plan and the current 
> or the final plan. This can be redundant when the initial plan and the 
> current physical plan are exactly the same. For instance, the `EXPLAIN` 
> command will not actually execute the query, and thus the plan string will 
> never change, but currently, the plan string still shows both the current and 
> the initial plan:
>  
> {code:java}
> AdaptiveSparkPlan (8)
> +- == Current Plan ==
>Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1)
> +- == Initial Plan ==
>Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1)
> {code}
> When the initial and the current plan are the same, there should be only one 
> plan string displayed. For example
> {code:java}
> AdaptiveSparkPlan (8)
> +- Sort (7)
>+- Exchange (6)
>   +- HashAggregate (5)
>  +- Exchange (4)
> +- HashAggregate (3)
>+- Filter (2)
>   +- Scan parquet default.explain_temp1 (1){code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33037) Add "spark.shuffle.manager" value to knownManagers

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33037:


Assignee: Apache Spark

> Add "spark.shuffle.manager" value to knownManagers
> --
>
> Key: SPARK-33037
> URL: https://issues.apache.org/jira/browse/SPARK-33037
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.4.7, 3.0.1
>Reporter: BoYang
>Assignee: Apache Spark
>Priority: Major
>
> Spark has a hardcoded list of known shuffle managers, which currently contains 
> two values. It does not include a user's custom shuffle manager set through the 
> Spark config "spark.shuffle.manager".
>  
> We hit this issue when setting "spark.shuffle.manager" to our own shuffle manager 
> plugin (the Uber Remote Shuffle Service implementation, 
> [https://github.com/uber/RemoteShuffleService]). Other users will hit the same 
> issue when they implement their own shuffle managers.
>  
> We need to add the "spark.shuffle.manager" config value to the known managers 
> list as well.
>  
> The known managers list is in:
> common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java
> {quote}private final List knownManagers = Arrays.asList(
>    "org.apache.spark.shuffle.sort.SortShuffleManager",
>    "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager");
> {quote}
>  
>  
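For context, a minimal sketch of how a custom shuffle manager is wired in purely through configuration (the plugin class name below is illustrative, not a real class):
{code:scala}
// Sketch only: the class name is illustrative. With the current hardcoded
// knownManagers list, the external shuffle service rejects any value of
// "spark.shuffle.manager" other than the two built-in implementations.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("custom-shuffle-manager")
  .config("spark.shuffle.manager", "com.example.shuffle.MyShuffleManager") // illustrative plugin class
  .config("spark.shuffle.service.enabled", "true")
  .getOrCreate()
{code}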



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33037) Add "spark.shuffle.manager" value to knownManagers

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33037:


Assignee: (was: Apache Spark)

> Add "spark.shuffle.manager" value to knownManagers
> --
>
> Key: SPARK-33037
> URL: https://issues.apache.org/jira/browse/SPARK-33037
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.4.7, 3.0.1
>Reporter: BoYang
>Priority: Major
>
> Spark has a hardcoded list of known shuffle managers, which currently contains 
> two values. It does not include a user's custom shuffle manager set through the 
> Spark config "spark.shuffle.manager".
>  
> We hit this issue when setting "spark.shuffle.manager" to our own shuffle manager 
> plugin (the Uber Remote Shuffle Service implementation, 
> [https://github.com/uber/RemoteShuffleService]). Other users will hit the same 
> issue when they implement their own shuffle managers.
>  
> We need to add the "spark.shuffle.manager" config value to the known managers 
> list as well.
>  
> The known managers list is in:
> common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java
> {quote}private final List knownManagers = Arrays.asList(
>    "org.apache.spark.shuffle.sort.SortShuffleManager",
>    "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager");
> {quote}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33037) Add "spark.shuffle.manager" value to knownManagers

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204965#comment-17204965
 ] 

Apache Spark commented on SPARK-33037:
--

User 'boy-uber' has created a pull request for this issue:
https://github.com/apache/spark/pull/29916

> Add "spark.shuffle.manager" value to knownManagers
> --
>
> Key: SPARK-33037
> URL: https://issues.apache.org/jira/browse/SPARK-33037
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.4.7, 3.0.1
>Reporter: BoYang
>Priority: Major
>
> Spark has a hardcoded list of known shuffle managers, which currently contains 
> two values. It does not include a user's custom shuffle manager set through the 
> Spark config "spark.shuffle.manager".
>  
> We hit this issue when setting "spark.shuffle.manager" to our own shuffle manager 
> plugin (the Uber Remote Shuffle Service implementation, 
> [https://github.com/uber/RemoteShuffleService]). Other users will hit the same 
> issue when they implement their own shuffle managers.
>  
> We need to add the "spark.shuffle.manager" config value to the known managers 
> list as well.
>  
> The known managers list is in:
> common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockResolver.java
> {quote}private final List knownManagers = Arrays.asList(
>    "org.apache.spark.shuffle.sort.SortShuffleManager",
>    "org.apache.spark.shuffle.unsafe.UnsafeShuffleManager");
> {quote}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33040) Add SparkR API for invoking vector_to_array

2020-09-30 Thread Maciej Szymkiewicz (Jira)
Maciej Szymkiewicz created SPARK-33040:
--

 Summary: Add SparkR API for invoking vector_to_array
 Key: SPARK-33040
 URL: https://issues.apache.org/jira/browse/SPARK-33040
 Project: Spark
  Issue Type: Improvement
  Components: ML, SparkR
Affects Versions: 3.0.0, 3.1.0
Reporter: Maciej Szymkiewicz


SPARK-30154 introduced a Scala UDF with a Python API that allows us to convert 
ML / MLlib vectors to array types.

It would be very useful (arguably even more so than in Python, where we already 
have a more user-friendly UDF and Vector API) to have an R wrapper.
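For reference, a sketch of the existing Scala API that the proposed SparkR wrapper would mirror (assumes a spark-shell session; the data is illustrative):
{code:scala}
// Sketch only: vector_to_array converts an ML vector column to an array column.
import org.apache.spark.ml.functions.vector_to_array
import org.apache.spark.ml.linalg.Vectors
import spark.implicits._

val df = Seq(Tuple1(Vectors.dense(1.0, 2.0, 3.0))).toDF("features")
df.select(vector_to_array($"features").as("features_arr")).show()
{code}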



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33040) Add SparkR API for invoking vector_to_array

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33040:


Assignee: Apache Spark

> Add SparkR API for invoking vector_to_array
> ---
>
> Key: SPARK-33040
> URL: https://issues.apache.org/jira/browse/SPARK-33040
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SparkR
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Major
>
> SPARK-30154 introduced a Scala UDF with a Python API that allows us to convert 
> ML / MLlib vectors to array types.
> It would be very useful (arguably even more so than in Python, where we already 
> have a more user-friendly UDF and Vector API) to have an R wrapper.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33040) Add SparkR API for invoking vector_to_array

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17204995#comment-17204995
 ] 

Apache Spark commented on SPARK-33040:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/29917

> Add SparkR API for invoking vector_to_array
> ---
>
> Key: SPARK-33040
> URL: https://issues.apache.org/jira/browse/SPARK-33040
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SparkR
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> SPARK-30154 introduced a Scala UDF with a Python API that allows us to convert 
> ML / MLlib vectors to array types.
> It would be very useful (arguably even more so than in Python, where we already 
> have a more user-friendly UDF and Vector API) to have an R wrapper.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33040) Add SparkR API for invoking vector_to_array

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33040:


Assignee: (was: Apache Spark)

> Add SparkR API for invoking vector_to_array
> ---
>
> Key: SPARK-33040
> URL: https://issues.apache.org/jira/browse/SPARK-33040
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SparkR
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Maciej Szymkiewicz
>Priority: Major
>
> SPARK-30154 introduced a Scala UDF with a Python API that allows us to convert 
> ML / MLlib vectors to array types.
> It would be very useful (arguably even more so than in Python, where we already 
> have a more user-friendly UDF and Vector API) to have an R wrapper.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33041) Better error messages when PySpark Java Gateway Crashes

2020-09-30 Thread Russell Spitzer (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Spitzer updated SPARK-33041:

Summary: Better error messages when PySpark Java Gateway Crashes  (was: 
Better error messages when PySpark Java Gateway Fails to Start or Crashes)

> Better error messages when PySpark Java Gateway Crashes
> ---
>
> Key: SPARK-33041
> URL: https://issues.apache.org/jira/browse/SPARK-33041
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.7
>Reporter: Russell Spitzer
>Priority: Major
>
> Currently the startup works by launching the Gateway process and waiting until 
> the process has written the conn_info_file. Once the conn_file is written, 
> it proceeds to attempt to connect to the port.
> This connection can succeed and the process can start normally, but if the 
> gateway process dies or is killed the error that the user ends up getting is 
> a confusing "connection_failed" style error like
> {code}
> Traceback (most recent call last):
>   File 
> "/usr/lib/spark-packages/spark2.4.4/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
>  line 929, in _get_connection
> connection = self.deque.pop()
> IndexError: pop from an empty deque
> {code}
> Since we have a handle on the py4j process, we should probably check whether 
> it has terminated before surfacing any exceptions like this. 
> CC [~holden]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33041) Better error messages when PySpark Java Gateway Fails to Start or Crashes

2020-09-30 Thread Russell Spitzer (Jira)
Russell Spitzer created SPARK-33041:
---

 Summary: Better error messages when PySpark Java Gateway Fails to 
Start or Crashes
 Key: SPARK-33041
 URL: https://issues.apache.org/jira/browse/SPARK-33041
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 2.4.7
Reporter: Russell Spitzer


Currently the startup works by launching the Gateway process and waiting until 
the process has written the conn_info_file. Once the conn_file is written, 
it proceeds to attempt to connect to the port.

This connection can succeed and the process can start normally, but if the 
gateway process dies or is killed the error that the user ends up getting is a 
confusing "connection_failed" style error like

{code}
Traceback (most recent call last):
  File 
"/usr/lib/spark-packages/spark2.4.4/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
 line 929, in _get_connection
connection = self.deque.pop()
IndexError: pop from an empty deque
{code}

Since we have a handle on the py4j process, we should probably check whether it 
has terminated before surfacing any exceptions like this. 

CC [~holden]




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33041) Better error messages when PySpark Java Gateway Crashes

2020-09-30 Thread Russell Spitzer (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Spitzer updated SPARK-33041:

Affects Version/s: 3.0.1

> Better error messages when PySpark Java Gateway Crashes
> ---
>
> Key: SPARK-33041
> URL: https://issues.apache.org/jira/browse/SPARK-33041
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Russell Spitzer
>Priority: Major
>
> Currently the startup works by launching the Gateway process and waiting until 
> the process has written the conn_info_file. Once the conn_file is written, 
> it proceeds to attempt to connect to the port.
> This connection can succeed and the process can start normally, but if the 
> gateway process dies or is killed the error that the user ends up getting is 
> a confusing "connection_failed" style error like
> {code}
> Traceback (most recent call last):
>   File 
> "/usr/lib/spark-packages/spark2.4.4/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
>  line 929, in _get_connection
> connection = self.deque.pop()
> IndexError: pop from an empty deque
> {code}
> Since we have a handle on the py4j process, we should probably check whether 
> it has terminated before surfacing any exceptions like this. 
> CC [~holden]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33041) Better error messages when PySpark Java Gateway Crashes

2020-09-30 Thread Russell Spitzer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205059#comment-17205059
 ] 

Russell Spitzer commented on SPARK-33041:
-

To elaborate, this could be the case for any failure that occurs after the 
connection file is written. For example, say the OOM killer comes in and shuts 
down the gateway or there is some other fatal failure of the gateway. These 
cases would also result in the rather opaque messages about queues and 
networking, when we know the actual problem is that the gateway has shut down.

> Better error messages when PySpark Java Gateway Crashes
> ---
>
> Key: SPARK-33041
> URL: https://issues.apache.org/jira/browse/SPARK-33041
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Russell Spitzer
>Priority: Major
>
> Currently the startup works by launching the Gateway process and waiting until 
> the process has written the conn_info_file. Once the conn_file is written, 
> it proceeds to attempt to connect to the port.
> This connection can succeed and the process can start normally, but if the 
> gateway process dies or is killed the error that the user ends up getting is 
> a confusing "connection_failed" style error like
> {code}
> Traceback (most recent call last):
>   File 
> "/usr/lib/spark-packages/spark2.4.4/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py",
>  line 929, in _get_connection
> connection = self.deque.pop()
> IndexError: pop from an empty deque
> {code}
> Since we have a handle on the py4j process, we should probably check whether 
> it has terminated before surfacing any exceptions like this. 
> CC [~holden]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33042) Add a test case to ensure changes to spark.sql.optimizer.maxIterations take effect at runtime

2020-09-30 Thread Yuning Zhang (Jira)
Yuning Zhang created SPARK-33042:


 Summary: Add a test case to ensure changes to 
spark.sql.optimizer.maxIterations take effect at runtime
 Key: SPARK-33042
 URL: https://issues.apache.org/jira/browse/SPARK-33042
 Project: Spark
  Issue Type: Test
  Components: Optimizer
Affects Versions: 3.1.0
Reporter: Yuning Zhang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33042) Add a test case to ensure changes to spark.sql.optimizer.maxIterations take effect at runtime

2020-09-30 Thread Yuning Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuning Zhang updated SPARK-33042:
-
Description: 
Add a test case to ensure changes to `spark.sql.optimizer.maxIterations` take 
effect at runtime.

Currently, there is only one related test case: 
[https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala#L156]

However, this test case only checks that the value of the conf can be changed at 
runtime. It does not check that the updated value is actually used by the Optimizer.
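A sketch of the runtime check involved (it would live in a test under the org.apache.spark.sql package, where sessionState is accessible):
{code:scala}
// Sketch only: the existing test stops at asserting the conf value changed.
// A complete test should also run a query and verify the Optimizer honors
// the lowered iteration cap, not just that SQLConf reflects the new value.
spark.conf.set("spark.sql.optimizer.maxIterations", "1")
assert(spark.sessionState.conf.optimizerMaxIterations == 1)
{code}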

> Add a test case to ensure changes to spark.sql.optimizer.maxIterations take 
> effect at runtime
> -
>
> Key: SPARK-33042
> URL: https://issues.apache.org/jira/browse/SPARK-33042
> Project: Spark
>  Issue Type: Test
>  Components: Optimizer
>Affects Versions: 3.1.0
>Reporter: Yuning Zhang
>Priority: Major
>
> Add a test case to ensure changes to `spark.sql.optimizer.maxIterations` 
> take effect at runtime.
> Currently, there is only one related test case: 
> [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala#L156]
> However, this test case only checks that the value of the conf can be changed 
> at runtime. It does not check that the updated value is actually used by the 
> Optimizer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33017) PySpark Context should have getCheckpointDir() method

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33017:


Assignee: (was: Apache Spark)

> PySpark Context should have getCheckpointDir() method
> -
>
> Key: SPARK-33017
> URL: https://issues.apache.org/jira/browse/SPARK-33017
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> To match the Scala API, PySpark should offer a direct way to get the 
> checkpoint dir.
> {code:scala}
> scala> spark.sparkContext.setCheckpointDir("/tmp/spark/checkpoint")
> scala> spark.sparkContext.getCheckpointDir
> res3: Option[String] = 
> Some(file:/tmp/spark/checkpoint/34ebe699-bc83-4c5d-bfa2-50451296cf87)
> {code}
> Currently, the only way to do that from PySpark is via the underlying Java 
> context:
> {code:python}
> >>> spark.sparkContext.setCheckpointDir('/tmp/spark/checkpoint/')
> >>> sc._jsc.sc().getCheckpointDir().get()
> 'file:/tmp/spark/checkpoint/ebf0fab5-edbc-42c2-938f-65d5e599cf54'
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33017) PySpark Context should have getCheckpointDir() method

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33017:


Assignee: Apache Spark

> PySpark Context should have getCheckpointDir() method
> -
>
> Key: SPARK-33017
> URL: https://issues.apache.org/jira/browse/SPARK-33017
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Nicholas Chammas
>Assignee: Apache Spark
>Priority: Minor
>
> To match the Scala API, PySpark should offer a direct way to get the 
> checkpoint dir.
> {code:scala}
> scala> spark.sparkContext.setCheckpointDir("/tmp/spark/checkpoint")
> scala> spark.sparkContext.getCheckpointDir
> res3: Option[String] = 
> Some(file:/tmp/spark/checkpoint/34ebe699-bc83-4c5d-bfa2-50451296cf87)
> {code}
> Currently, the only way to do that from PySpark is via the underlying Java 
> context:
> {code:python}
> >>> spark.sparkContext.setCheckpointDir('/tmp/spark/checkpoint/')
> >>> sc._jsc.sc().getCheckpointDir().get()
> 'file:/tmp/spark/checkpoint/ebf0fab5-edbc-42c2-938f-65d5e599cf54'
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33017) PySpark Context should have getCheckpointDir() method

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205148#comment-17205148
 ] 

Apache Spark commented on SPARK-33017:
--

User 'reidy-p' has created a pull request for this issue:
https://github.com/apache/spark/pull/29918

> PySpark Context should have getCheckpointDir() method
> -
>
> Key: SPARK-33017
> URL: https://issues.apache.org/jira/browse/SPARK-33017
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> To match the Scala API, PySpark should offer a direct way to get the 
> checkpoint dir.
> {code:scala}
> scala> spark.sparkContext.setCheckpointDir("/tmp/spark/checkpoint")
> scala> spark.sparkContext.getCheckpointDir
> res3: Option[String] = 
> Some(file:/tmp/spark/checkpoint/34ebe699-bc83-4c5d-bfa2-50451296cf87)
> {code}
> Currently, the only way to do that from PySpark is via the underlying Java 
> context:
> {code:python}
> >>> spark.sparkContext.setCheckpointDir('/tmp/spark/checkpoint/')
> >>> sc._jsc.sc().getCheckpointDir().get()
> 'file:/tmp/spark/checkpoint/ebf0fab5-edbc-42c2-938f-65d5e599cf54'
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33042) Add a test case to ensure changes to spark.sql.optimizer.maxIterations take effect at runtime

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33042:


Assignee: (was: Apache Spark)

> Add a test case to ensure changes to spark.sql.optimizer.maxIterations take 
> effect at runtime
> -
>
> Key: SPARK-33042
> URL: https://issues.apache.org/jira/browse/SPARK-33042
> Project: Spark
>  Issue Type: Test
>  Components: Optimizer
>Affects Versions: 3.1.0
>Reporter: Yuning Zhang
>Priority: Major
>
> Add a test case to ensure changes to `spark.sql.optimizer.maxIterations` 
> take effect at runtime.
> Currently, there is only one related test case: 
> [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala#L156]
> However, this test case only checks that the value of the conf can be changed 
> at runtime. It does not check that the updated value is actually used by the 
> Optimizer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33042) Add a test case to ensure changes to spark.sql.optimizer.maxIterations take effect at runtime

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33042:


Assignee: Apache Spark

> Add a test case to ensure changes to spark.sql.optimizer.maxIterations take 
> effect at runtime
> -
>
> Key: SPARK-33042
> URL: https://issues.apache.org/jira/browse/SPARK-33042
> Project: Spark
>  Issue Type: Test
>  Components: Optimizer
>Affects Versions: 3.1.0
>Reporter: Yuning Zhang
>Assignee: Apache Spark
>Priority: Major
>
> Add a test case to ensure changes to `spark.sql.optimizer.maxIterations` 
> take effect at runtime.
> Currently, there is only one related test case: 
> [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala#L156]
> However, this test case only checks that the value of the conf can be changed 
> at runtime. It does not check that the updated value is actually used by the 
> Optimizer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33042) Add a test case to ensure changes to spark.sql.optimizer.maxIterations take effect at runtime

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205149#comment-17205149
 ] 

Apache Spark commented on SPARK-33042:
--

User 'yuningzh-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/29919

> Add a test case to ensure changes to spark.sql.optimizer.maxIterations take 
> effect at runtime
> -
>
> Key: SPARK-33042
> URL: https://issues.apache.org/jira/browse/SPARK-33042
> Project: Spark
>  Issue Type: Test
>  Components: Optimizer
>Affects Versions: 3.1.0
>Reporter: Yuning Zhang
>Priority: Major
>
> Add a test case to ensure changes to `spark.sql.optimizer.maxIterations` 
> take effect at runtime.
> Currently, there is only one related test case: 
> [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala#L156]
> However, this test case only checks that the value of the conf can be changed 
> at runtime. It does not check that the updated value is actually used by the 
> Optimizer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33042) Add a test case to ensure changes to spark.sql.optimizer.maxIterations take effect at runtime

2020-09-30 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205151#comment-17205151
 ] 

Apache Spark commented on SPARK-33042:
--

User 'yuningzh-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/29919

> Add a test case to ensure changes to spark.sql.optimizer.maxIterations take 
> effect at runtime
> -
>
> Key: SPARK-33042
> URL: https://issues.apache.org/jira/browse/SPARK-33042
> Project: Spark
>  Issue Type: Test
>  Components: Optimizer
>Affects Versions: 3.1.0
>Reporter: Yuning Zhang
>Priority: Major
>
> Add a test case to ensure changes to `spark.sql.optimizer.maxIterations` 
> take effect at runtime.
> Currently, there is only one related test case: 
> [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala#L156]
> However, this test case only checks that the value of the conf can be changed 
> at runtime. It does not check that the updated value is actually used by the 
> Optimizer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33043) RowMatrix is incompatible with spark.driver.maxResultSize=0

2020-09-30 Thread Karen Feng (Jira)
Karen Feng created SPARK-33043:
--

 Summary: RowMatrix is incompatible with 
spark.driver.maxResultSize=0
 Key: SPARK-33043
 URL: https://issues.apache.org/jira/browse/SPARK-33043
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 3.0.1, 3.0.0
Reporter: Karen Feng


RowMatrix does not work if spark.driver.maxResultSize=0, as this requirement 
breaks:

 
{code:java}
require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes,
  s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " +
    s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)")
{code}
 

[https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L795]

 

This check should likely only happen if maxDriverResultSizeInBytes > 0.
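A sketch of the guarded check that last sentence suggests, treating 0 as "unlimited" the way spark.driver.maxResultSize documents it (the helper name is illustrative):
{code:scala}
// Sketch only: skip the size comparison entirely when maxResultSize is 0 (unlimited).
def checkTotalResultSize(
    aggregatedObjectSizeInBytes: Long,
    maxDriverResultSizeInBytes: Long): Unit = {
  if (maxDriverResultSizeInBytes > 0) {
    require(maxDriverResultSizeInBytes > aggregatedObjectSizeInBytes,
      s"Cannot aggregate object of size $aggregatedObjectSizeInBytes Bytes, " +
        s"as it's bigger than maxResultSize ($maxDriverResultSizeInBytes Bytes)")
  }
}
{code}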



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33012) Upgrade fabric8 to 4.10.3

2020-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33012:
--
Summary: Upgrade fabric8 to 4.10.3  (was: Upgrade fabric8 to 4.10.3 to 
support k8s 1.18.0)

> Upgrade fabric8 to 4.10.3
> -
>
> Key: SPARK-33012
> URL: https://issues.apache.org/jira/browse/SPARK-33012
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: Jonathan Lafleche
>Assignee: Jonathan Lafleche
>Priority: Minor
> Fix For: 3.1.0
>
>
> According to [fabric8's compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix], 
> the current version (4.9.2) is not compatible with k8s 1.18.0.
> In practice, we have not encountered any issues running spark against k8s 
> 1.18.0, but it seems reasonable to track fabric8's declared compatibility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33012) Upgrade fabric8 to 4.10.3 to support k8s 1.18.0

2020-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33012.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29888
[https://github.com/apache/spark/pull/29888]

> Upgrade fabric8 to 4.10.3 to support k8s 1.18.0
> ---
>
> Key: SPARK-33012
> URL: https://issues.apache.org/jira/browse/SPARK-33012
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: Jonathan Lafleche
>Assignee: Jonathan Lafleche
>Priority: Minor
> Fix For: 3.1.0
>
>
> According to [fabric8's compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix], 
> the current version (4.9.2) is not compatible with k8s 1.18.0.
> In practice, we have not encountered any issues running spark against k8s 
> 1.18.0, but it seems reasonable to track fabric8's declared compatibility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33012) Upgrade fabric8 to 4.10.3 to support k8s 1.18.0

2020-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33012:
-

Assignee: Jonathan Lafleche

> Upgrade fabric8 to 4.10.3 to support k8s 1.18.0
> ---
>
> Key: SPARK-33012
> URL: https://issues.apache.org/jira/browse/SPARK-33012
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Kubernetes
>Affects Versions: 3.1.0
>Reporter: Jonathan Lafleche
>Assignee: Jonathan Lafleche
>Priority: Minor
>
> According to [fabric8's compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix], 
> the current version (4.9.2) is not compatible with k8s 1.18.0.
> In practice, we have not encountered any issues running spark against k8s 
> 1.18.0, but it seems reasonable to track fabric8's declared compatibility.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33023) addJar use `Utils.isWindow` to judge windows system

2020-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33023.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29909
[https://github.com/apache/spark/pull/29909]

> addJar use `Utils.isWindow` to judge windows system
> ---
>
> Key: SPARK-33023
> URL: https://issues.apache.org/jira/browse/SPARK-33023
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.1.0
>
>
> addJar use `Utils.isWindow` to judge windows system



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33023) addJar use `Utils.isWindow` to judge windows system

2020-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33023:
-

Assignee: angerszhu

> addJar use `Utils.isWindow` to judge windows system
> ---
>
> Key: SPARK-33023
> URL: https://issues.apache.org/jira/browse/SPARK-33023
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
>
> addJar use `Utils.isWindow` to judge windows system



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32723) Security Vulnerability due to JQuery version in Spark Master/Worker UI

2020-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32723:
-

Assignee: Peter Toth

> Security Vulnerability due to JQuery version in Spark Master/Worker UI
> --
>
> Key: SPARK-32723
> URL: https://issues.apache.org/jira/browse/SPARK-32723
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Ashish Kumar Singh
>Assignee: Peter Toth
>Priority: Major
>  Labels: Security
>
> Spark 3.0 and Spark 2.4.x use a jQuery version < 3.5, which has known security 
> vulnerabilities, in the Spark Master UI and Spark Worker UI.
> Can we please upgrade jQuery to 3.5 or above?
>  [https://www.tenable.com/plugins/nessus/136929]
> ??According to the self-reported version in the script, the version of JQuery 
> hosted on the remote web server is greater than or equal to 1.2 and prior to 
> 3.5.0. It is, therefore, affected by multiple cross site scripting 
> vulnerabilities.??
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32723) Security Vulnerability due to JQuery version in Spark Master/Worker UI

2020-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32723.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29902
[https://github.com/apache/spark/pull/29902]

> Security Vulnerability due to JQuery version in Spark Master/Worker UI
> --
>
> Key: SPARK-32723
> URL: https://issues.apache.org/jira/browse/SPARK-32723
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Ashish Kumar Singh
>Assignee: Peter Toth
>Priority: Major
>  Labels: Security
> Fix For: 3.1.0
>
>
> Spark 3.0 and Spark 2.4.x use a jQuery version < 3.5, which has known security 
> vulnerabilities, in the Spark Master UI and Spark Worker UI.
> Can we please upgrade jQuery to 3.5 or above?
>  [https://www.tenable.com/plugins/nessus/136929]
> ??According to the self-reported version in the script, the version of JQuery 
> hosted on the remote web server is greater than or equal to 1.2 and prior to 
> 3.5.0. It is, therefore, affected by multiple cross site scripting 
> vulnerabilities.??
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32723) Upgrade to jQuery 3.5.1

2020-09-30 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-32723:
--
Summary: Upgrade to jQuery 3.5.1  (was: Security Vulnerability due to 
JQuery version in Spark Master/Worker UI)

> Upgrade to jQuery 3.5.1
> ---
>
> Key: SPARK-32723
> URL: https://issues.apache.org/jira/browse/SPARK-32723
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Ashish Kumar Singh
>Assignee: Peter Toth
>Priority: Major
>  Labels: Security
> Fix For: 3.1.0
>
>
> Spark 3.0 and Spark 2.4.x use a jQuery version < 3.5, which has known security 
> vulnerabilities, in the Spark Master UI and Spark Worker UI.
> Can we please upgrade jQuery to 3.5 or above?
>  [https://www.tenable.com/plugins/nessus/136929]
> ??According to the self-reported version in the script, the version of JQuery 
> hosted on the remote web server is greater than or equal to 1.2 and prior to 
> 3.5.0. It is, therefore, affected by multiple cross site scripting 
> vulnerabilities.??
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29330) Allow users to choose the name of Spark Shuffle service

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-29330:


Assignee: (was: Apache Spark)

> Allow users to choose the name of Spark Shuffle service
> --
>
> Key: SPARK-29330
> URL: https://issues.apache.org/jira/browse/SPARK-29330
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.1.0
>Reporter: Alexander Bessonov
>Priority: Minor
>
> As of now, Spark uses the hardcoded value {{spark_shuffle}} as the name of the 
> Shuffle Service.
> The HDP distribution of Spark, on the other hand, uses 
> [{{spark2_shuffle}}|https://github.com/hortonworks/spark2-release/blob/HDP-3.1.0.0-78-tag/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L117].
>  This is done to be able to run both Spark 1.6 and Spark 2.x on the same 
> Hadoop cluster.
> Running vanilla Spark on an HDP cluster with only the Spark 2.x shuffle service (HDP 
> flavor) running becomes impossible due to the shuffle service name mismatch.
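A sketch of what such a knob could look like; the "spark.shuffle.service.name" key below is hypothetical here, and the point is simply letting the aux-service name be configured instead of hardcoding {{spark_shuffle}}:
{code:scala}
// Sketch only: "spark.shuffle.service.name" is a hypothetical config key in this
// sketch, set to match the name the YARN aux-service was registered under
// (e.g. HDP's spark2_shuffle).
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.shuffle.service.name", "spark2_shuffle")
  .getOrCreate()
{code}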



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-29330) Allow users to choose the name of Spark Shuffle service

2020-09-30 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-29330:


Assignee: Apache Spark

> Allow users to choose the name of Spark Shuffle service
> --
>
> Key: SPARK-29330
> URL: https://issues.apache.org/jira/browse/SPARK-29330
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, YARN
>Affects Versions: 3.1.0
>Reporter: Alexander Bessonov
>Assignee: Apache Spark
>Priority: Minor
>
> As of now, Spark uses the hardcoded value {{spark_shuffle}} as the name of the 
> Shuffle Service.
> The HDP distribution of Spark, on the other hand, uses 
> [{{spark2_shuffle}}|https://github.com/hortonworks/spark2-release/blob/HDP-3.1.0.0-78-tag/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala#L117].
>  This is done to be able to run both Spark 1.6 and Spark 2.x on the same 
> Hadoop cluster.
> Running vanilla Spark on an HDP cluster with only the Spark 2.x shuffle service (HDP 
> flavor) running becomes impossible due to the shuffle service name mismatch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33044) Add a Jenkins build and test job for Scala 2.13

2020-09-30 Thread Yang Jie (Jira)
Yang Jie created SPARK-33044:


 Summary: Add a Jenkins build and test job for Scala 2.13
 Key: SPARK-33044
 URL: https://issues.apache.org/jira/browse/SPARK-33044
 Project: Spark
  Issue Type: Sub-task
  Components: jenkins
Affects Versions: 3.1.0
Reporter: Yang Jie


The {{Master}} branch seems to be almost ready for Scala 2.13 now; we need a 
Jenkins test job to verify the current work results and provide CI coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32992) In OracleDialect, "RowID" SQL type should be converted into "String" Catalyst type

2020-09-30 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-32992.
--
Fix Version/s: 3.1.0
 Assignee: Maxim Gekk
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/29884

> In OracleDialect, "RowID" SQL type should be converted into "String" Catalyst 
> type
> --
>
> Key: SPARK-32992
> URL: https://issues.apache.org/jira/browse/SPARK-32992
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.1.0
>Reporter: Peng Cheng
>Assignee: Maxim Gekk
>Priority: Minor
>  Labels: jdbc, jdbc_connector
> Fix For: 3.1.0
>
>
> Most JDBC drivers use the long SQL type for the dataset row ID:
>  
> (in org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils)
> {code:java}
> private def getCatalystType(
>  sqlType: Int,
>  precision: Int,
>  scale: Int,
>  signed: Boolean): DataType = {
>  val answer = sqlType match {
>  // scalastyle:off
>  ...
>  case java.sql.Types.ROWID => LongType
> ...
>  case _ =>
>  throw new SQLException("Unrecognized SQL type " + sqlType)
>  // scalastyle:on
>  }
> if (answer == null)
> { throw new SQLException("Unsupported type " + 
> JDBCType.valueOf(sqlType).getName) }
> answer
> {code}
>  
> Oracle JDBC drivers (of all versions) are a rare exception; only a String value 
> can be extracted:
>  
> (in oracle.jdbc.driver.RowidAccessor, decompiled bytecode)
> {code:java}
> ...
> String getString(int var1) throws SQLException
> { return this.isNull(var1) ? null : 
> this.rowData.getString(this.getOffset(var1), this.getLength(var1), 
> this.statement.connection.conversion.getCharacterSet((short)1)); }
> Object getObject(int var1) throws SQLException
> { return this.getROWID(var1); }
> ...
> {code}
>  
> This caused an exception to be thrown when importing datasets from an Oracle 
> DB, as reported in 
> [https://stackoverflow.com/questions/52244492/spark-jdbc-dataframereader-fails-to-read-oracle-table-with-datatype-as-rowid:]
> {code:java}
>  
>  {{18/09/08 11:38:17 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 
> 5.0 (TID 23, gbrdsr02985.intranet.barcapint.com, executor 21): 
> java.sql.SQLException: Invalid column type: getLong not implemented for class 
> oracle.jdbc.driver.T4CRowidAccessor at 
> oracle.jdbc.driver.GeneratedAccessor.getLong(GeneratedAccessor.java:440)
>  at oracle.jdbc.driver.GeneratedStatement.getLong(GeneratedStatement.java:228)
>  at 
> oracle.jdbc.driver.GeneratedScrollableResultSet.getLong(GeneratedScrollableResultSet.java:620)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:365)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:364)}}
>  
> {code}
>  
> Therefore, the default SQL type => Catalyst type conversion rule should be 
> overriden in OracleDialect. Specifically, the following rule should be added:
> {code:java}
> case Types.ROWID => Some(StringType)
> {code}
>  
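For illustration, a sketch of how such a rule plugs into a JDBC dialect; a user-registered dialect is shown here, while the actual fix belongs in Spark's built-in OracleDialect:
{code:scala}
// Sketch only: maps Oracle's ROWID to StringType via the JdbcDialect extension point.
import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, StringType}

object OracleRowIdDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")
  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    if (sqlType == Types.ROWID) Some(StringType) else None
}

JdbcDialects.registerDialect(OracleRowIdDialect)
{code}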



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33044) Add a Jenkins build and test job for Scala 2.13

2020-09-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205289#comment-17205289
 ] 

Dongjoon Hyun commented on SPARK-33044:
---

Thank you, [~LuciferYang].

Could you add a new Jenkins job please, [~shaneknapp]?

> Add a Jenkins build and test job for Scala 2.13
> ---
>
> Key: SPARK-33044
> URL: https://issues.apache.org/jira/browse/SPARK-33044
> Project: Spark
>  Issue Type: Sub-task
>  Components: jenkins
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
> The {{Master}} branch seems to be almost ready for Scala 2.13 now; we need a 
> Jenkins test job to verify the current work results and provide CI coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33044) Add a Jenkins build and test job for Scala 2.13

2020-09-30 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17205290#comment-17205290
 ] 

Dongjoon Hyun commented on SPARK-33044:
---

cc [~srowen] and [~hyukjin.kwon]

> Add a Jenkins build and test job for Scala 2.13
> ---
>
> Key: SPARK-33044
> URL: https://issues.apache.org/jira/browse/SPARK-33044
> Project: Spark
>  Issue Type: Sub-task
>  Components: jenkins
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
> The {{Master}} branch seems to be almost ready for Scala 2.13 now; we need a 
> Jenkins test job to verify the current work results and provide CI coverage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33042) Add a test case to ensure changes to spark.sql.optimizer.maxIterations take effect at runtime

2020-09-30 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro updated SPARK-33042:
-
Component/s: (was: Optimizer)
 SQL

> Add a test case to ensure changes to spark.sql.optimizer.maxIterations take 
> effect at runtime
> -
>
> Key: SPARK-33042
> URL: https://issues.apache.org/jira/browse/SPARK-33042
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuning Zhang
>Priority: Major
>
> Add a test case to ensure changes to `spark.sql.optimizer.maxIterations` 
> take effect at runtime.
> Currently, there is only one related test case: 
> [https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/internal/SQLConfSuite.scala#L156]
> However, this test case only checks that the value of the conf can be changed 
> at runtime. It does not check that the updated value is actually used by the 
> Optimizer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org