[jira] [Commented] (SPARK-23945) Column.isin() should accept a single-column DataFrame as input

2019-09-12 Thread Herman van Hovell (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928869#comment-16928869
 ] 

Herman van Hovell commented on SPARK-23945:
---

You can use a left semi join, e.g.:
{code}
SELECT *
FROM table1
   LEFT SEMI JOIN table2
ON table1.a = table2.a AND table1.b = table2.b
;
{code}
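
The same pattern can be written in the DataFrame API: a left semi join expresses IN and a left anti join expresses NOT IN. A minimal PySpark sketch, assuming the table and column names from the examples in this ticket (and non-null join keys, since NOT IN and an anti join treat NULLs differently):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
table1 = spark.table("table1")  # assumed to exist, mirroring the SQL examples
table2 = spark.table("table2")

# Rows of table1 whose name does NOT appear in table2 (the NOT IN query below):
not_in = table1.join(table2, on="name", how="left_anti")

# Rows of table1 whose (a, b) pair appears in table2 (the semi join above):
is_in = table1.join(table2, on=["a", "b"], how="left_semi")
{code}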

> Column.isin() should accept a single-column DataFrame as input
> --
>
> Key: SPARK-23945
> URL: https://issues.apache.org/jira/browse/SPARK-23945
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> In SQL you can filter rows based on the result of a subquery:
> {code:java}
> SELECT *
> FROM table1
> WHERE name NOT IN (
> SELECT name
> FROM table2
> );{code}
> In the Spark DataFrame API, the equivalent would probably look like this:
> {code:java}
> (table1
> .where(
> ~col('name').isin(
> table2.select('name')
> )
> )
> ){code}
> However, .isin() currently [only accepts a local list of 
> values|http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.isin].
> I imagine making this enhancement would happen as part of a larger effort to 
> support correlated subqueries in the DataFrame API.
> Or perhaps there is no plan to support this style of query in the DataFrame 
> API, and queries like this should instead be written in a different way? How 
> would we write a query like the one I have above in the DataFrame API, 
> without needing to collect values locally for the NOT IN filter?
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28716) Add id to Exchange and Subquery's stringArgs method for easier identifying their reuses in query plans

2019-08-23 Thread Herman van Hovell (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-28716.
---
Fix Version/s: 3.0.0
 Assignee: Ali Afroozeh
   Resolution: Fixed

> Add id to Exchange and Subquery's stringArgs method for easier identifying 
> their reuses in query plans
> --
>
> Key: SPARK-28716
> URL: https://issues.apache.org/jira/browse/SPARK-28716
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ali Afroozeh
>Assignee: Ali Afroozeh
>Priority: Minor
> Fix For: 3.0.0
>
>
> Add an id to Exchange and Subquery's stringArgs method to make it easier to 
> identify their reuses in query plans, for example:
> {{ReusedExchange [d_date_sk#827], BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))) 
> [id=#2710]}}
> Where {{2710}} is the id of the reused exchange.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28836) Remove the canonicalize(attributes) method from PlanExpression

2019-08-23 Thread Herman van Hovell (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-28836.
---
Fix Version/s: 3.0.0
 Assignee: Ali Afroozeh
   Resolution: Fixed

> Remove the canonicalize(attributes) method from PlanExpression
> --
>
> Key: SPARK-28836
> URL: https://issues.apache.org/jira/browse/SPARK-28836
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ali Afroozeh
>Assignee: Ali Afroozeh
>Priority: Minor
> Fix For: 3.0.0
>
>
> The canonicalize(attrs: AttributeSeq) method in PlanExpression is somewhat 
> confusing. First, it is not clear why `PlanExpression` should have this 
> method, and why the canonicalization is not handled by the canonicalized 
> method of its parent, the Expression class. Second, 
> QueryPlan.normalizeExpressionId is the only place where 
> PlanExpression.canonicalized is called.
> This PR removes the canonicalize method from the PlanExpression class and 
> delegates the normalization of expression ids to the 
> QueryPlan.normalizedExpressionId method. Also, since the name 
> normalizedExpressions is more suitable for this method, it has been renamed 
> accordingly.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28715) Introduce collectInPlanAndSubqueries and subqueriesAll in QueryPlan

2019-08-21 Thread Herman van Hovell (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-28715:
-

Assignee: Ali Afroozeh

> Introduce collectInPlanAndSubqueries and subqueriesAll in QueryPlan
> ---
>
> Key: SPARK-28715
> URL: https://issues.apache.org/jira/browse/SPARK-28715
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ali Afroozeh
>Assignee: Ali Afroozeh
>Priority: Minor
> Fix For: 3.0.0
>
>
> Introduces the {{collectInPlanAndSubqueries and subqueriesAll}} methods in 
> QueryPlan that consider all the plans in the query plan, including the ones 
> in nested subqueries.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28715) Introduce collectInPlanAndSubqueries and subqueriesAll in QueryPlan

2019-08-21 Thread Herman van Hovell (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-28715.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

> Introduce collectInPlanAndSubqueries and subqueriesAll in QueryPlan
> ---
>
> Key: SPARK-28715
> URL: https://issues.apache.org/jira/browse/SPARK-28715
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ali Afroozeh
>Priority: Minor
> Fix For: 3.0.0
>
>
> Introduces the {{collectInPlanAndSubqueries and subqueriesAll}} methods in 
> QueryPlan that consider all the plans in the query plan, including the ones 
> in nested subqueries.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28775) DateTimeUtilsSuite fails for JDKs using the tzdata2018i or newer timezone database

2019-08-19 Thread Herman van Hovell (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-28775:
-

Assignee: Sean Owen  (was: Herman van Hovell)

> DateTimeUtilsSuite fails for JDKs using the tzdata2018i or newer timezone 
> database
> --
>
> Key: SPARK-28775
> URL: https://issues.apache.org/jira/browse/SPARK-28775
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Herman van Hovell
>Assignee: Sean Owen
>Priority: Major
>
> The org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite 'daysToMillis and 
> millisToDays' test case fails because of an update in the timezone library: 
> tzdata2018h. This retroactively changes the value of a missing day for the 
> Kwajalein Atoll. For more information, see: 
> https://bugs.openjdk.java.net/browse/JDK-8215981
> Let's fix this by excluding both dates.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28775) DateTimeUtilsSuite fails for JDKs using the tzdata2018h or newer timezone database

2019-08-19 Thread Herman van Hovell (Jira)
Herman van Hovell created SPARK-28775:
-

 Summary: DateTimeUtilsSuite fails for JDKs using the tzdata2018h 
or newer timezone database
 Key: SPARK-28775
 URL: https://issues.apache.org/jira/browse/SPARK-28775
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Herman van Hovell
Assignee: Herman van Hovell


The org.apache.spark.sql.catalyst.util.DateTimeUtilsSuite 'daysToMillis and 
millisToDays' test case fails because of an update in the timezone library: 
tzdata2018h. This retroactively changes the value of a missing day for the 
Kwajalein Atoll. For more information, see: 
https://bugs.openjdk.java.net/browse/JDK-8215981

Let's fix this by excluding both dates.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28583) Subqueries should not call `onUpdatePlan` in Adaptive Query Execution

2019-08-07 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-28583.
---
   Resolution: Fixed
 Assignee: Maryann Xue
Fix Version/s: 3.0.0

> Subqueries should not call `onUpdatePlan` in Adaptive Query Execution
> -
>
> Key: SPARK-28583
> URL: https://issues.apache.org/jira/browse/SPARK-28583
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Major
> Fix For: 3.0.0
>
>
> Subqueries do not have their own execution id, thus when calling 
> {{AdaptiveSparkPlanExec.onUpdatePlan}}, it will actually get the 
> {{QueryExecution}} instance of the main query, which is wasteful and 
> problematic. It could cause issues like stack overflows or deadlocks in some 
> circumstances.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28241) Show metadata operations on ThriftServerTab

2019-07-05 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-28241.
---
   Resolution: Fixed
 Assignee: Yuming Wang
Fix Version/s: 3.0.0

> Show metadata operations on ThriftServerTab
> ---
>
> Key: SPARK-28241
> URL: https://issues.apache.org/jira/browse/SPARK-28241
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> !https://user-images.githubusercontent.com/5399861/60579741-4cd2c180-9db6-11e9-822a-0433be509b67.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27466) LEAD function with 'ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING' causes exception in Spark

2019-07-01 Thread Herman van Hovell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875992#comment-16875992
 ] 

Herman van Hovell commented on SPARK-27466:
---

Well, we could add it. I am just not sure what a frame specification would mean 
in the case of lead and lag. Is unbounded preceding and unbounded following the 
only thing that is allowed (I hope so), and what would the semantics be if we 
allow others?

We currently use the frame specification to determine the offset of the 
lead/lag functions. So we need a different way to infer this if we are going to 
change it.
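
For reference, the supported way to express this today is to give lead/lag its offset directly and let Spark derive the frame, rather than writing an explicit ROWS BETWEEN clause. A hedged PySpark sketch using the table from the report (assuming it exists):
{code:python}
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.table("schema1.tab1")

# lead() takes the offset as an argument; Spark turns it into the required
# ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING frame internally.
w = Window.partitionBy("col3").orderBy("col1")
df.select(F.lead("col1", 1).over(w).alias("next_col1")).show()
{code}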

> LEAD function with 'ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING' 
> causes exception in Spark
> ---
>
> Key: SPARK-27466
> URL: https://issues.apache.org/jira/browse/SPARK-27466
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.2.0
> Environment: Spark version 2.2.0.2.6.4.92-2
> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
>Reporter: Zoltan
>Priority: Major
>
> *1. Create a table in Hive:*
>   
> {code:java}
>  CREATE TABLE tab1(
>    col1 varchar(1),
>    col2 varchar(1)
>   )
>  PARTITIONED BY (
>    col3 varchar(1)
>  )
>  LOCATION
>    'hdfs://server1/data/tab1'
> {code}
>  
>  *2. Query the Table in Spark:*
> *2.1: Simple query, no exception thrown:*
> {code:java}
> scala> spark.sql("SELECT * from schema1.tab1").show()
> +----+----+----+
> |col1|col2|col3|
> +----+----+----+
> +----+----+----+
> {code}
> *2.2.: Query causing exception:*
> {code:java}
> scala> spark.sql("*SELECT (LEAD(col1) OVER ( PARTITION BY col3 ORDER BY col1 
> ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING*)) from 
> schema1.tab1")
> {code}
> {code:java}
> org.apache.spark.sql.AnalysisException: Window Frame ROWS BETWEEN UNBOUNDED 
> PRECEDING AND UNBOUNDED FOLLOWING must match the required frame ROWS BETWEEN 
> 1 FOLLOWING AND 1 FOLLOWING;
>    at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:39)
>    at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
>    at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame$$anonfun$apply$30$$anonfun$applyOrElse$11.applyOrElse(Analyzer.scala:2219)
>    at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveWindowFrame$$anonfun$apply$30$$anonfun$applyOrElse$11.applyOrElse(Analyzer.scala:2215)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>    at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
>    at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsDown$1.apply(QueryPlan.scala:258)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsDown$1.apply(QueryPlan.scala:258)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:279)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:289)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1$1.apply(QueryPlan.scala:293)
>    at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>    at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>    at scala.collection.immutable.List.foreach(List.scala:381)
>    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>    at scala.collection.immutable.List.map(List.scala:285)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:293)
>    at 
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$6.apply(QueryPlan.scala:298)
>    at 
> 

[jira] [Resolved] (SPARK-23128) A new approach to do adaptive execution in Spark SQL

2019-06-15 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23128.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> A new approach to do adaptive execution in Spark SQL
> 
>
> Key: SPARK-23128
> URL: https://issues.apache.org/jira/browse/SPARK-23128
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Carson Wang
>Assignee: Maryann Xue
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: AdaptiveExecutioninBaidu.pdf
>
>
> SPARK-9850 proposed the basic idea of adaptive execution in Spark. In 
> DAGScheduler, a new API is added to support submitting a single map stage.  
> The current implementation of adaptive execution in Spark SQL supports 
> changing the reducer number at runtime. An Exchange coordinator is used to 
> determine the number of post-shuffle partitions for a stage that needs to 
> fetch shuffle data from one or multiple stages. The current implementation 
> adds ExchangeCoordinator while we are adding Exchanges. However there are 
> some limitations. First, it may cause additional shuffles that may decrease 
> the performance. We can see this from EnsureRequirements rule when it adds 
> ExchangeCoordinator.  Secondly, it is not a good idea to add 
> ExchangeCoordinators while we are adding Exchanges because we don’t have a 
> global picture of all shuffle dependencies of a post-shuffle stage. I.e. for 
> a join of 3 tables in a single stage, the same ExchangeCoordinator should be 
> used in all three Exchanges, but currently two separate ExchangeCoordinators 
> will be added. Thirdly, with the current framework it is not easy to implement other 
> features in adaptive execution flexibly like changing the execution plan and 
> handling skewed join at runtime.
> We'd like to introduce a new way to do adaptive execution in Spark SQL and 
> address the limitations. The idea is described at 
> [https://docs.google.com/document/d/1mpVjvQZRAkD-Ggy6-hcjXtBPiQoVbZGe3dLnAKgtJ4k/edit?usp=sharing]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23128) A new approach to do adaptive execution in Spark SQL

2019-06-15 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-23128:
-

Assignee: Maryann Xue

> A new approach to do adaptive execution in Spark SQL
> 
>
> Key: SPARK-23128
> URL: https://issues.apache.org/jira/browse/SPARK-23128
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Carson Wang
>Assignee: Maryann Xue
>Priority: Major
> Attachments: AdaptiveExecutioninBaidu.pdf
>
>
> SPARK-9850 proposed the basic idea of adaptive execution in Spark. In 
> DAGScheduler, a new API is added to support submitting a single map stage.  
> The current implementation of adaptive execution in Spark SQL supports 
> changing the reducer number at runtime. An Exchange coordinator is used to 
> determine the number of post-shuffle partitions for a stage that needs to 
> fetch shuffle data from one or multiple stages. The current implementation 
> adds ExchangeCoordinator while we are adding Exchanges. However there are 
> some limitations. First, it may cause additional shuffles that may decrease 
> the performance. We can see this from EnsureRequirements rule when it adds 
> ExchangeCoordinator.  Secondly, it is not a good idea to add 
> ExchangeCoordinators while we are adding Exchanges because we don’t have a 
> global picture of all shuffle dependencies of a post-shuffle stage. I.e. for 
> a join of 3 tables in a single stage, the same ExchangeCoordinator should be 
> used in all three Exchanges, but currently two separate ExchangeCoordinators 
> will be added. Thirdly, with the current framework it is not easy to implement other 
> features in adaptive execution flexibly like changing the execution plan and 
> handling skewed join at runtime.
> We'd like to introduce a new way to do adaptive execution in Spark SQL and 
> address the limitations. The idea is described at 
> [https://docs.google.com/document/d/1mpVjvQZRAkD-Ggy6-hcjXtBPiQoVbZGe3dLnAKgtJ4k/edit?usp=sharing]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28057) Add method `clone` in catalyst TreeNode

2019-06-14 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-28057.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Add method `clone` in catalyst TreeNode
> ---
>
> Key: SPARK-28057
> URL: https://issues.apache.org/jira/browse/SPARK-28057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Minor
> Fix For: 3.0.0
>
>
> Add implementation for {{clone}} method in {{TreeNode}}, for de-duplicating 
> instances in the LogicalPlan tree. This is a prerequisite for SPARK-23128.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28057) Add method `clone` in catalyst TreeNode

2019-06-14 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-28057:
-

Assignee: Maryann Xue

> Add method `clone` in catalyst TreeNode
> ---
>
> Key: SPARK-28057
> URL: https://issues.apache.org/jira/browse/SPARK-28057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maryann Xue
>Assignee: Maryann Xue
>Priority: Minor
>
> Add implementation for {{clone}} method in {{TreeNode}}, for de-duplicating 
> instances in the LogicalPlan tree. This is a prerequisite for SPARK-23128.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27071) Expose additional metrics in status.api.v1.StageData

2019-05-27 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-27071.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Expose additional metrics in status.api.v1.StageData
> 
>
> Key: SPARK-27071
> URL: https://issues.apache.org/jira/browse/SPARK-27071
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Tom van Bussel
>Assignee: Tom van Bussel
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently StageData exposes the following metrics:
>  * executorRunTime
>  * executorCpuTime
>  * inputBytes
>  * inputRecords
>  * outputBytes
>  * outputRecords
>  * shuffleReadBytes
>  * shuffleReadRecords
>  * shuffleWriteBytes
>  * shuffleWriteRecords
>  * memoryBytesSpilled
>  * diskBytesSpilled
> These metrics are computed by aggregating the metrics of the tasks in the 
> stage. For the tasks, however, we keep track of a lot more metrics. 
> Currently these metrics are also computed for stages (such as shuffle read 
> fetch wait time), but they are not exposed through the API. It would be very 
> useful if these were also exposed.
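
For reference, the StageData discussed here is what the REST status API returns for each stage; a hedged sketch of reading the currently exposed metrics (host, port, and application id are assumptions for a local driver UI):
{code:python}
import requests

base = "http://localhost:4040/api/v1"   # assumed driver UI address
app_id = "app-20190527120000-0000"      # assumed application id

# Each entry in the response is one stage's StageData as JSON.
for stage in requests.get(f"{base}/applications/{app_id}/stages").json():
    print(stage["stageId"], stage["executorRunTime"], stage["shuffleReadBytes"])
{code}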



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27071) Expose additional metrics in status.api.v1.StageData

2019-05-27 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-27071:
-

Assignee: Tom van Bussel

> Expose additional metrics in status.api.v1.StageData
> 
>
> Key: SPARK-27071
> URL: https://issues.apache.org/jira/browse/SPARK-27071
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Tom van Bussel
>Assignee: Tom van Bussel
>Priority: Major
>
> Currently StageData exposes the following metrics:
>  * executorRunTime
>  * executorCpuTime
>  * inputBytes
>  * inputRecords
>  * outputBytes
>  * outputRecords
>  * shuffleReadBytes
>  * shuffleReadRecords
>  * shuffleWriteBytes
>  * shuffleWriteRecords
>  * memoryBytesSpilled
>  * diskBytesSpilled
> These metrics are computed by aggregating the metrics of the tasks in the 
> stage. For the tasks, however, we keep track of a lot more metrics. 
> Currently these metrics are also computed for stages (such as shuffle read 
> fetch wait time), but they are not exposed through the API. It would be very 
> useful if these were also exposed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27449) Clean-up checks in CodegenSupport.limitNotReachedCond

2019-04-12 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-27449:
-

 Summary: Clean-up checks in CodegenSupport.limitNotReachedCond
 Key: SPARK-27449
 URL: https://issues.apache.org/jira/browse/SPARK-27449
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Herman van Hovell
Assignee: Herman van Hovell


The checks in {{CodegenSupport.limitNotReachedCond}} are a bit convoluted and 
prevent you from adding new scan nodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27052) Using PySpark udf in transform yields NULL values

2019-03-22 Thread Herman van Hovell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16799017#comment-16799017
 ] 

Herman van Hovell commented on SPARK-27052:
---

This is not supported at the moment. This will probably be non-trivial to 
implement since we need to figure out a performant way to invoke Python here. In 
this particular case we can probably rewrite the higher order function into a 
chain of map operations, of which one will be executed by Python. Anyway, let's 
discuss this first before starting to code it up.
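
Until that is supported, one possible workaround is to avoid the higher order function entirely: explode the array, apply the Python UDF to each element, and rebuild the array. A hedged PySpark sketch based on the reproduction below:
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

def f(x):
    return x + 1 if x is not None else None

f_udf = F.udf(f, "integer")

df = spark.createDataFrame([(1, [1, 2, 3])], ("id", "xs"))

# posexplode keeps the element position so the rebuilt array preserves order.
exploded = df.select("id", F.posexplode("xs").alias("pos", "x"))
mapped = exploded.withColumn("x_inc", f_udf("x"))
result = (mapped
          .groupBy("id")
          .agg(F.sort_array(F.collect_list(F.struct("pos", "x_inc"))).alias("tmp"))
          .withColumn("xsinc", F.col("tmp.x_inc"))
          .drop("tmp"))
result.show()
{code}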

> Using PySpark udf in transform yields NULL values
> -
>
> Key: SPARK-27052
> URL: https://issues.apache.org/jira/browse/SPARK-27052
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: hejsgpuom62c
>Priority: Major
>
> Steps to reproduce
> {code:java}
> from typing import Optional
> from pyspark.sql.functions import expr
> def f(x: Optional[int]) -> Optional[int]:
> return x + 1 if x is not None else None
> spark.udf.register('f', f, "integer")
> df = (spark
> .createDataFrame([(1, [1, 2, 3])], ("id", "xs"))
> .withColumn("xsinc", expr("transform(xs, x -> f(x))")))
> df.show()
> # +---+---------+-----+
> # | id|       xs|xsinc|
> # +---+---------+-----+
> # |  1|[1, 2, 3]| [,,]|
> # +---+---------+-----+
> {code}
>  
> Source https://stackoverflow.com/a/53762650



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26656) Benchmark for date/time functions and expressions

2019-01-28 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-26656:
-

Assignee: Maxim Gekk

> Benchmark for date/time functions and expressions
> -
>
> Key: SPARK-26656
> URL: https://issues.apache.org/jira/browse/SPARK-26656
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Write benchmarks for datetimeExpressions



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26656) Benchmark for date/time functions and expressions

2019-01-28 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26656.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Benchmark for date/time functions and expressions
> -
>
> Key: SPARK-26656
> URL: https://issues.apache.org/jira/browse/SPARK-26656
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> Write benchmarks for datetimeExpressions



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26690) Checkpoints of Dataframes are not visible in the SQL UI

2019-01-24 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-26690:
-

Assignee: Tom van Bussel

> Checkpoints of Dataframes are not visible in the SQL UI
> ---
>
> Key: SPARK-26690
> URL: https://issues.apache.org/jira/browse/SPARK-26690
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Tom van Bussel
>Assignee: Tom van Bussel
>Priority: Major
>
> Checkpoints and local checkpoints of dataframes do not show up in the SQL UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26690) Checkpoints of Dataframes are not visible in the SQL UI

2019-01-24 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26690.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Checkpoints of Dataframes are not visible in the SQL UI
> ---
>
> Key: SPARK-26690
> URL: https://issues.apache.org/jira/browse/SPARK-26690
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Tom van Bussel
>Assignee: Tom van Bussel
>Priority: Major
> Fix For: 3.0.0
>
>
> Checkpoints and local checkpoints of dataframes do not show up in the SQL UI.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26680) StackOverflowError if Stream passed to groupBy

2019-01-24 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26680.
---
   Resolution: Fixed
 Assignee: Bruce Robbins
Fix Version/s: 3.0.0
   2.4.1

> StackOverflowError if Stream passed to groupBy
> --
>
> Key: SPARK-26680
> URL: https://issues.apache.org/jira/browse/SPARK-26680
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2, 2.4.0, 3.0.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> This Java code results in a StackOverflowError:
> {code:java}
> List<Column> groupByCols = new ArrayList<>();
> groupByCols.add(new Column("id1"));
> scala.collection.Seq<Column> groupByColsSeq =
> JavaConverters.asScalaIteratorConverter(groupByCols.iterator())
> .asScala().toSeq();
> df.groupBy(groupByColsSeq).max("id2").toDF("id1", "id2").show();
> {code}
> The {{toSeq}} method above produces a Stream. Passing a Stream to groupBy 
> results in the StackOverflowError. In fact, the error can be produced more 
> easily in spark-shell:
> {noformat}
> scala> val df = spark.read.schema("id1 int, id2 int").csv("testinput.csv")
> df: org.apache.spark.sql.DataFrame = [id1: int, id2: int]
> scala> val groupBySeq = Stream(col("id1"))
> groupBySeq: scala.collection.immutable.Stream[org.apache.spark.sql.Column] = 
> Stream(id1, ?)
> scala> df.groupBy(groupBySeq: _*).max("id2").toDF("id1", "id2").collect
> java.lang.StackOverflowError
>   at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1161)
>   at scala.collection.immutable.Stream.drop(Stream.scala:797)
>   at scala.collection.immutable.Stream.drop(Stream.scala:204)
>   at scala.collection.LinearSeqOptimized.apply(LinearSeqOptimized.scala:66)
>   at scala.collection.LinearSeqOptimized.apply$(LinearSeqOptimized.scala:65)
>   at scala.collection.immutable.Stream.apply(Stream.scala:204)
>   at 
> org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:45)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:138)
>   at scala.Option.getOrElse(Option.scala:138)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:133)
>   at 
> org.apache.spark.sql.execution.CodegenSupport.$anonfun$consume$3(WholeStageCodegenExec.scala:159)
>   at scala.collection.immutable.Stream.$anonfun$map$1(Stream.scala:418)
>   at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1171)
>   at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1161)
>   at scala.collection.immutable.Stream.drop(Stream.scala:797)
>   at scala.collection.immutable.Stream.drop(Stream.scala:204)
>   at scala.collection.LinearSeqOptimized.apply(LinearSeqOptimized.scala:66)
>   at scala.collection.LinearSeqOptimized.apply$(LinearSeqOptimized.scala:65)
>   at scala.collection.immutable.Stream.apply(Stream.scala:204)
>   at 
> org.apache.spark.sql.catalyst.expressions.BoundReference.doGenCode(BoundAttribute.scala:45)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.$anonfun$genCode$3(Expression.scala:138)
>   at scala.Option.getOrElse(Option.scala:138)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression.genCode(Expression.scala:133)
>   at 
> org.apache.spark.sql.execution.CodegenSupport.$anonfun$consume$3(WholeStageCodegenExec.scala:159)
> ...etc...
> {noformat}
> This is due to the lazy nature of Streams. The method {{consume}} in 
> {{CodegenSupport}} assumes that a map function will be eagerly evaluated:
> {code:java}
> val inputVars =
> ctx.currentVars = null <== the closure cares about this
> ctx.INPUT_ROW = row
> output.zipWithIndex.map { case (attr, i) =>
>   BoundReference(i, attr.dataType, attr.nullable).genCode(ctx)
> -
> -
> -
> ctx.currentVars = inputVars
> ctx.INPUT_ROW = null
> ctx.freshNamePrefix = parent.variablePrefix
> val evaluated = evaluateRequiredVariables(output, inputVars, 
> parent.usedInputs)
> {code}
> The closure passed to the map function assumes {{ctx.currentVars}} will be 
> set to null. But due to lazy evaluation, {{ctx.currentVars}} is set to 
> something else by the time the closure is actually called. Worse yet, 
> {{ctx.currentVars}} is set to the yet-to-be evaluated inputVars stream. The 
> closure uses {{ctx.currentVars}} (via the call {{genCode(ctx)}}), therefore 
> it ends up using the data structure it is attempting to create.
> You can recreate the problem in a vanilla Scala shell:
> {code:java}
> scala> var p1: Seq[Any] = null
> p1: Seq[Any] = null
> scala> val s = Stream(1, 2).zipWithIndex.map { case (x, i) => if (p1 != null) 
> p1(i) else x }
> s: scala.collection.immutable.Stream[Any] = 

[jira] [Resolved] (SPARK-26657) Port DayWeek, DayOfWeek and WeekDay on Proleptic Gregorian calendar

2019-01-22 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26657.
---
   Resolution: Fixed
 Assignee: Maxim Gekk
Fix Version/s: 3.0.0

> Port DayWeek, DayOfWeek and WeekDay on Proleptic Gregorian calendar
> ---
>
> Key: SPARK-26657
> URL: https://issues.apache.org/jira/browse/SPARK-26657
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently DayWeek and its children use the hybrid calendar. We need to port 
> the classes to java.time and use the Proleptic Gregorian calendar.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26618) Make typed Timestamp/Date literals consistent to casting

2019-01-18 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26618.
---
   Resolution: Fixed
 Assignee: Maxim Gekk
Fix Version/s: 3.0.0

> Make typed Timestamp/Date literals consistent to casting
> 
>
> Key: SPARK-26618
> URL: https://issues.apache.org/jira/browse/SPARK-26618
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 3.0.0
>
>
> Currently, the values of the typed literals TIMESTAMP and DATE are parsed to 
> the desired values by Timestamp.valueOf and Date.valueOf. This restricts the 
> date and timestamp patterns, and is inconsistent with casting to 
> TimestampType/DateType. Also, using Timestamp.valueOf and Date.valueOf assumes 
> the hybrid calendar while parsing textual representations of timestamps/dates. 
> This should be fixed by re-using the cast functionality.
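
For context, typed literals are expressions like the following (a hedged illustration via PySpark; the exact accepted patterns are what this ticket changes):
{code:python}
# DATE and TIMESTAMP typed literals in Spark SQL; `spark` is an existing session.
spark.sql("SELECT DATE '2019-01-18' AS d, TIMESTAMP '2019-01-18 12:34:56' AS ts").show()
{code}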



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26593) Use Proleptic Gregorian calendar in casting UTF8String to date/timestamp types

2019-01-17 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26593.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Use Proleptic Gregorian calendar in casting UTF8String to date/timestamp types
> --
>
> Key: SPARK-26593
> URL: https://issues.apache.org/jira/browse/SPARK-26593
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> The current implementation of casting UTF8String to DateType/TimestampType 
> uses the hybrid calendar (Gregorian + Julian). The ticket aims to unify the 
> conversion of textual date/timestamp representations to DateType/TimestampType 
> and use the Proleptic Gregorian calendar. More precisely, stringToTimestamp 
> and stringToDate need to be ported to the java.time API of Java 8.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26593) Use Proleptic Gregorian calendar in casting UTF8String to date/timestamp types

2019-01-17 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-26593:
-

Assignee: Maxim Gekk

> Use Proleptic Gregorian calendar in casting UTF8String to date/timestamp types
> --
>
> Key: SPARK-26593
> URL: https://issues.apache.org/jira/browse/SPARK-26593
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> The current implementation of casting UTF8String to DateType/TimestampType 
> uses the hybrid calendar (Gregorian + Julian). The ticket aims to unify the 
> conversion of textual date/timestamp representations to DateType/TimestampType 
> and use the Proleptic Gregorian calendar. More precisely, stringToTimestamp 
> and stringToDate need to be ported to the java.time API of Java 8.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26550) New datasource for benchmarking

2019-01-16 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26550.
---
   Resolution: Fixed
 Assignee: Maxim Gekk
Fix Version/s: 3.0.0

> New datasource for benchmarking
> ---
>
> Key: SPARK-26550
> URL: https://issues.apache.org/jira/browse/SPARK-26550
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> The purpose of the new datasource is materialisation of a dataset without the 
> additional overhead associated with actions and converting row values to other 
> types. This can be used in benchmarking as well as in cases where a dataset 
> needs to be materialised for side effects, as in caching.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26450) Map of schema is built too frequently in some wide queries

2019-01-13 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26450.
---
   Resolution: Fixed
 Assignee: Bruce Robbins
Fix Version/s: 3.0.0

> Map of schema is built too frequently in some wide queries
> --
>
> Key: SPARK-26450
> URL: https://issues.apache.org/jira/browse/SPARK-26450
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Minor
> Fix For: 3.0.0
>
>
> When executing queries with wide projections and wide schemas, Spark rebuilds 
> an attribute map for the same schema many times.
> For example:
> {noformat}
> select * from orctbl where id1 = 1
> {noformat}
> Assume {{orctbl}} has 6000 columns and 34 files. In that case, the above 
> query creates an AttributeSeq object 270,000 times[1]. Each AttributeSeq 
> instantiation builds a map of the entire list of 6000 attributes (but not 
> until lazy val exprIdToOrdinal is referenced).
> Whenever OrcFileFormat reads a new file, it generates a new unsafe 
> projection. That results in this 
> [function|https://github.com/apache/spark/blob/827383a97c11a61661440ff86ce0c3382a2a23b2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala#L319]
>  getting called:
> {code:java}
> protected def bind(in: Seq[Expression], inputSchema: Seq[Attribute]): 
> Seq[Expression] =
> in.map(BindReferences.bindReference(_, inputSchema))
> {code}
> For each column in the projection, this line calls bindReference. Each call 
> passes inputSchema, a Sequence of Attributes, to a parameter position 
> expecting an AttributeSeq. The compiler implicitly calls the constructor for 
> AttributeSeq, which (lazily) builds a map for every attribute in the schema. 
> Therefore, this function builds a map of the entire schema once for each 
> column in the projection, and it does this for each input file. For the above 
> example query, this accounts for 204K instantiations of AttributeSeq.
> Readers for CSV and JSON tables do something similar.
> In addition, ProjectExec also creates an unsafe projection for each task. As 
> a result, this 
> [line|https://github.com/apache/spark/blob/827383a97c11a61661440ff86ce0c3382a2a23b2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala#L91]
>  gets called, which has the same issue:
> {code:java}
>   def toBoundExprs(exprs: Seq[Expression], inputSchema: Seq[Attribute]): 
> Seq[Expression] = {
> exprs.map(BindReferences.bindReference(_, inputSchema))
>   }
> {code}
> The above affects all wide queries that have a projection node, regardless of 
> the file reader. For the example query, ProjectExec accounts for the 
> additional 66K instantiations of the AttributeSeq.
> Spark can save time by pre-building the AttributeSeq right before the map 
> operations in {{bind}} and {{toBoundExprs}}. The time saved depends on size 
> of schema, size of projection, number of input files (for Orc), number of 
> file splits (for CSV, and JSON tables), and number of tasks.
> For a 6000 column CSV table with 500K records and 34 input files, the time 
> savings is only 6%[1] because Spark doesn't create as many unsafe projections 
> as compared to Orc tables.
> On the other hand, for a 6000 column Orc table with 500K records and 34 input 
> files, the time savings is about 16%[1].
> [1] based on queries run in local mode with 8 executor threads on my laptop.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26502) Get rid of hiveResultString() in QueryExecution

2019-01-03 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26502.
---
   Resolution: Fixed
 Assignee: Maxim Gekk
Fix Version/s: 3.0.0

> Get rid of hiveResultString() in QueryExecution
> ---
>
> Key: SPARK-26502
> URL: https://issues.apache.org/jira/browse/SPARK-26502
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 3.0.0
>
>
> The method hiveResultString() of QueryExecution is used in tests and in 
> SparkSQLDriver. It should be moved from QueryExecution to a more specific class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26495) Simplify SelectedField extractor

2018-12-31 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26495.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Simplify SelectedField extractor
> 
>
> Key: SPARK-26495
> URL: https://issues.apache.org/jira/browse/SPARK-26495
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>Priority: Major
> Fix For: 3.0.0
>
>
> I was reading through the code of the {{SelectedField}} extractor and it is 
> overly complex. It contains a couple of pattern matches that are redundant.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26504) Rope-wise dumping of Spark plans

2018-12-31 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26504.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Rope-wise dumping of Spark plans 
> -
>
> Key: SPARK-26504
> URL: https://issues.apache.org/jira/browse/SPARK-26504
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 3.0.0
>
>
> Currently, Spark plans are converted to a string via StringBuilderWriter, where 
> memory for strings is allocated sequentially as soon as elements of plans 
> are added to the StringBuilder.
> The proposed improvement is a StringRope which has 2 methods:
> 1. append(s: String): Unit - adds the string to an internal list and increases 
> the total size
> 2. toString: String - converts the list of strings to a single string
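
A minimal sketch of the proposed idea, shown in Python purely for illustration (the actual change would be in Scala in catalyst): fragments are collected in a list and concatenated once at the end instead of growing a single buffer.
{code:python}
class StringRope:
    """Accumulate string fragments; concatenate only when the result is needed."""

    def __init__(self):
        self._parts = []
        self._size = 0

    def append(self, s: str) -> None:
        # Add the fragment to the internal list and track the total size.
        self._parts.append(s)
        self._size += len(s)

    def __str__(self) -> str:
        # Convert the list of fragments to a single string in one pass.
        return "".join(self._parts)
{code}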



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26504) Rope-wise dumping of Spark plans

2018-12-31 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-26504:
-

Assignee: Maxim Gekk

> Rope-wise dumping of Spark plans 
> -
>
> Key: SPARK-26504
> URL: https://issues.apache.org/jira/browse/SPARK-26504
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
>
> Currently, Spark plans are converted to a string via StringBuilderWriter, where 
> memory for strings is allocated sequentially as soon as elements of plans 
> are added to the StringBuilder.
> The proposed improvement is a StringRope which has 2 methods:
> 1. append(s: String): Unit - adds the string to an internal list and increases 
> the total size
> 2. toString: String - converts the list of strings to a single string



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26495) Simplify SelectedField extractor

2018-12-28 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-26495:
-

 Summary: Simplify SelectedField extractor
 Key: SPARK-26495
 URL: https://issues.apache.org/jira/browse/SPARK-26495
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Herman van Hovell
Assignee: Herman van Hovell


I was reading through the code of the {{SelectedField}} extractor and it is 
overly complex. It contains a couple of pattern matches that are redundant.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26191) Control number of truncated fields

2018-12-27 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26191.
---
   Resolution: Fixed
 Assignee: Maxim Gekk
Fix Version/s: 3.0.0

> Control number of truncated fields
> --
>
> Key: SPARK-26191
> URL: https://issues.apache.org/jira/browse/SPARK-26191
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 3.0.0
>
>
> Currently, the threshold for truncating fields converted to string can only be 
> controlled via a global SQL config. We need to add a maxFields parameter to all 
> functions/methods that could potentially produce a truncated string from a 
> sequence of fields.
> One of the use cases is toFile. This method aims to output plans without 
> truncation. For now, users have to set the global config to flush whole plans.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26038) Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long

2018-11-23 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-26038:
-

Assignee: Juliusz Sompolski

> Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in 
> long
> 
>
> Key: SPARK-26038
> URL: https://issues.apache.org/jira/browse/SPARK-26038
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
>
> Decimal toScalaBigInt/toJavaBigInteger just call toLong, which does not work 
> for decimals that do not fit in a long.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26038) Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in long

2018-11-23 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26038.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Decimal toScalaBigInt/toJavaBigInteger not work for decimals not fitting in 
> long
> 
>
> Key: SPARK-26038
> URL: https://issues.apache.org/jira/browse/SPARK-26038
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: Juliusz Sompolski
>Assignee: Juliusz Sompolski
>Priority: Major
> Fix For: 3.0.0
>
>
> Decimal toScalaBigInt/toJavaBigInteger just call toLong, which does not work 
> for decimals that do not fit in a long.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26131) Remove sqlContext.conf from Spark SQL physical operators

2018-11-20 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-26131:
-

 Summary: Remove sqlContext.conf from Spark SQL physical operators
 Key: SPARK-26131
 URL: https://issues.apache.org/jira/browse/SPARK-26131
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Herman van Hovell
Assignee: Herman van Hovell






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26084) AggregateExpression.references fails on unresolved expression trees

2018-11-20 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26084.
---
   Resolution: Fixed
 Assignee: Simeon Simeonov
Fix Version/s: 3.0.0
   2.4.1
   2.3.3

> AggregateExpression.references fails on unresolved expression trees
> ---
>
> Key: SPARK-26084
> URL: https://issues.apache.org/jira/browse/SPARK-26084
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Simeon Simeonov
>Assignee: Simeon Simeonov
>Priority: Major
>  Labels: aggregate, regression, sql
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> [SPARK-18394|https://issues.apache.org/jira/browse/SPARK-18394] introduced a 
> stable ordering in {{AttributeSet.toSeq}} using expression IDs 
> ([PR-18959|https://github.com/apache/spark/pull/18959/files#diff-75576f0ec7f9d8b5032000245217d233R128])
>  without noticing that {{AggregateExpression.references}} used 
> {{AttributeSet.toSeq}} as a shortcut 
> ([link|https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala#L132]).
>  The net result is that {{AggregateExpression.references}} fails for 
> unresolved aggregate functions.
> {code:scala}
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression(
>   org.apache.spark.sql.catalyst.expressions.aggregate.Sum(('x + 'y).expr),
>   mode = org.apache.spark.sql.catalyst.expressions.aggregate.Complete,
>   isDistinct = false
> ).references
> {code}
> fails with
> {code:scala}
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
> exprId on unresolved object, tree: 'y
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128)
>   at scala.math.Ordering$$anon$5.compare(Ordering.scala:122)
>   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>   at java.util.TimSort.sort(TimSort.java:220)
>   at java.util.Arrays.sort(Arrays.java:1438)
>   at scala.collection.SeqLike$class.sorted(SeqLike.scala:648)
>   at scala.collection.AbstractSeq.sorted(Seq.scala:41)
>   at scala.collection.SeqLike$class.sortBy(SeqLike.scala:623)
>   at scala.collection.AbstractSeq.sortBy(Seq.scala:41)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeSet.toSeq(AttributeSet.scala:128)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.references(interfaces.scala:201)
> {code}
> The solution is to avoid calling {{toSeq}} as ordering is not important in 
> {{references}} and simplify (and speed up) the implementation to something 
> like
> {code:scala}
> mode match {
>   case Partial | Complete => aggregateFunction.references
>   case PartialMerge | Final => 
> AttributeSet(aggregateFunction.aggBufferAttributes)
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26084) AggregateExpression.references fails on unresolved expression trees

2018-11-16 Thread Herman van Hovell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689312#comment-16689312
 ] 

Herman van Hovell commented on SPARK-26084:
---

[~simeons] since you have already proposed a solution, do you mind opening a PR?

> AggregateExpression.references fails on unresolved expression trees
> ---
>
> Key: SPARK-26084
> URL: https://issues.apache.org/jira/browse/SPARK-26084
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Simeon Simeonov
>Priority: Major
>  Labels: aggregate, regression, sql
>
> [SPARK-18394|https://issues.apache.org/jira/browse/SPARK-18394] introduced a 
> stable ordering in {{AttributeSet.toSeq}} using expression IDs 
> ([PR-18959|https://github.com/apache/spark/pull/18959/files#diff-75576f0ec7f9d8b5032000245217d233R128])
>  without noticing that {{AggregateExpression.references}} used 
> {{AttributeSet.toSeq}} as a shortcut 
> ([link|https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala#L132]).
>  The net result is that {{AggregateExpression.references}} fails for 
> unresolved aggregate functions.
> {code:scala}
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression(
>   org.apache.spark.sql.catalyst.expressions.aggregate.Sum(('x + 'y).expr),
>   mode = org.apache.spark.sql.catalyst.expressions.aggregate.Complete,
>   isDistinct = false
> ).references
> {code}
> fails with
> {code:scala}
> org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to 
> exprId on unresolved object, tree: 'y
>   at 
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.exprId(unresolved.scala:104)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeSet$$anonfun$toSeq$2.apply(AttributeSet.scala:128)
>   at scala.math.Ordering$$anon$5.compare(Ordering.scala:122)
>   at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
>   at java.util.TimSort.sort(TimSort.java:220)
>   at java.util.Arrays.sort(Arrays.java:1438)
>   at scala.collection.SeqLike$class.sorted(SeqLike.scala:648)
>   at scala.collection.AbstractSeq.sorted(Seq.scala:41)
>   at scala.collection.SeqLike$class.sortBy(SeqLike.scala:623)
>   at scala.collection.AbstractSeq.sortBy(Seq.scala:41)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeSet.toSeq(AttributeSet.scala:128)
>   at 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression.references(interfaces.scala:201)
> {code}
> The solution is to avoid calling {{toSeq}} as ordering is not important in 
> {{references}} and simplify (and speed up) the implementation to something 
> like
> {code:scala}
> mode match {
>   case Partial | Complete => aggregateFunction.references
>   case PartialMerge | Final => 
> AttributeSet(aggregateFunction.aggBufferAttributes)
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26023) Dumping truncated plans to a file

2018-11-13 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-26023.
---
   Resolution: Fixed
 Assignee: Maxim Gekk
Fix Version/s: 3.0.0

> Dumping truncated plans to a file
> -
>
> Key: SPARK-26023
> URL: https://issues.apache.org/jira/browse/SPARK-26023
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 3.0.0
>
>
> The ticket aims to dump truncated plans and generated code to a file without 
> fully materializing them in memory.
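
A usage sketch of what this enables (assuming the {{toFile}} debug helper introduced by this work; the path is illustrative):
{code:scala}
val df = spark.range(100).selectExpr("id", "id * 2 AS doubled")

// Writes the truncated plans and generated code to a file instead of
// building one huge string in memory.
df.queryExecution.debug.toFile("/tmp/query-plans.txt")
{code}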



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25767) Error reported in Spark logs when using the org.apache.spark:spark-sql_2.11:2.3.2 Java library

2018-10-29 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-25767:
-

Assignee: Peter Toth

> Error reported in Spark logs when using the 
> org.apache.spark:spark-sql_2.11:2.3.2 Java library
> --
>
> Key: SPARK-25767
> URL: https://issues.apache.org/jira/browse/SPARK-25767
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.2.0, 2.3.2
>Reporter: Thomas Brugiere
>Assignee: Peter Toth
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
> Attachments: fileA.csv, fileB.csv, fileC.csv
>
>
> Hi,
> Here is a bug I found using the latest version of spark-sql_2.11:2.2.0. Note 
> that this case was also tested with spark-sql_2.11:2.3.2 and the bug is also 
> present.
> This issue is a duplicate of the SPARK-25582 issue that I had to close after 
> an accidental manipulation from another developer (was linked to a wrong PR)
> You will find attached three small sample CSV files with the minimal content 
> to raise the bug.
> Find below a reproducer code:
> {code:java}
> import org.apache.spark.SparkConf;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SparkSession;
> import scala.collection.JavaConverters;
> import scala.collection.Seq;
> import java.util.Arrays;
> public class SparkBug {
> private static <T> Seq<T> arrayToSeq(T[] input) {
> return 
> JavaConverters.asScalaIteratorConverter(Arrays.asList(input).iterator()).asScala().toSeq();
> }
> public static void main(String[] args) throws Exception {
> SparkConf conf = new 
> SparkConf().setAppName("SparkBug").setMaster("local");
> SparkSession sparkSession = 
> SparkSession.builder().config(conf).getOrCreate();
> Dataset<Row> df_a = sparkSession.read().option("header", 
> true).csv("local/fileA.csv").dropDuplicates();
> Dataset<Row> df_b = sparkSession.read().option("header", 
> true).csv("local/fileB.csv").dropDuplicates();
> Dataset<Row> df_c = sparkSession.read().option("header", 
> true).csv("local/fileC.csv").dropDuplicates();
> String[] key_join_1 = new String[]{"colA", "colB", "colC", "colD", 
> "colE", "colF"};
> String[] key_join_2 = new String[]{"colA", "colB", "colC", "colD", 
> "colE"};
> Dataset<Row> df_inventory_1 = df_a.join(df_b, arrayToSeq(key_join_1), 
> "left");
> Dataset<Row> df_inventory_2 = df_inventory_1.join(df_c, 
> arrayToSeq(key_join_2), "left");
> df_inventory_2.show();
> }
> }
> {code}
> When running this code, I can see the exception below:
> {code:java}
> 18/10/18 09:25:49 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 202, Column 18: Expression "agg_isNull_28" is not an rvalue
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 202, Column 18: Expression "agg_isNull_28" is not an rvalue
>     at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:11821)
>     at 
> org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7170)
>     at 
> org.codehaus.janino.UnitCompiler.getConstantValue2(UnitCompiler.java:5332)
>     at org.codehaus.janino.UnitCompiler.access$9400(UnitCompiler.java:212)
>     at 
> org.codehaus.janino.UnitCompiler$13$1.visitAmbiguousName(UnitCompiler.java:5287)
>     at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:4053)
>     at org.codehaus.janino.UnitCompiler$13.visitLvalue(UnitCompiler.java:5284)
>     at org.codehaus.janino.Java$Lvalue.accept(Java.java:3977)
>     at 
> org.codehaus.janino.UnitCompiler.getConstantValue(UnitCompiler.java:5280)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2391)
>     at org.codehaus.janino.UnitCompiler.access$1900(UnitCompiler.java:212)
>     at 
> org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1474)
>     at 
> org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1466)
>     at org.codehaus.janino.Java$IfStatement.accept(Java.java:2926)
>     at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1466)
>     at 
> org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1546)
>     at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3075)
>     at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336)
>     at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958)
>     at org.codehaus.janino.UnitCompiler.access$700(UnitCompiler.java:212)

[jira] [Resolved] (SPARK-25767) Error reported in Spark logs when using the org.apache.spark:spark-sql_2.11:2.3.2 Java library

2018-10-29 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-25767.
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   2.4.1

> Error reported in Spark logs when using the 
> org.apache.spark:spark-sql_2.11:2.3.2 Java library
> --
>
> Key: SPARK-25767
> URL: https://issues.apache.org/jira/browse/SPARK-25767
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.2.0, 2.3.2
>Reporter: Thomas Brugiere
>Assignee: Peter Toth
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
> Attachments: fileA.csv, fileB.csv, fileC.csv
>
>
> Hi,
> Here is a bug I found using the latest version of spark-sql_2.11:2.2.0. Note 
> that this case was also tested with spark-sql_2.11:2.3.2 and the bug is also 
> present.
> This issue is a duplicate of the SPARK-25582 issue that I had to close after 
> an accidental manipulation from another developer (was linked to a wrong PR)
> You will find attached three small sample CSV files with the minimal content 
> to raise the bug.
> Find below a reproducer code:
> {code:java}
> import org.apache.spark.SparkConf;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SparkSession;
> import scala.collection.JavaConverters;
> import scala.collection.Seq;
> import java.util.Arrays;
> public class SparkBug {
> private static <T> Seq<T> arrayToSeq(T[] input) {
> return 
> JavaConverters.asScalaIteratorConverter(Arrays.asList(input).iterator()).asScala().toSeq();
> }
> public static void main(String[] args) throws Exception {
> SparkConf conf = new 
> SparkConf().setAppName("SparkBug").setMaster("local");
> SparkSession sparkSession = 
> SparkSession.builder().config(conf).getOrCreate();
> Dataset<Row> df_a = sparkSession.read().option("header", 
> true).csv("local/fileA.csv").dropDuplicates();
> Dataset<Row> df_b = sparkSession.read().option("header", 
> true).csv("local/fileB.csv").dropDuplicates();
> Dataset<Row> df_c = sparkSession.read().option("header", 
> true).csv("local/fileC.csv").dropDuplicates();
> String[] key_join_1 = new String[]{"colA", "colB", "colC", "colD", 
> "colE", "colF"};
> String[] key_join_2 = new String[]{"colA", "colB", "colC", "colD", 
> "colE"};
> Dataset<Row> df_inventory_1 = df_a.join(df_b, arrayToSeq(key_join_1), 
> "left");
> Dataset<Row> df_inventory_2 = df_inventory_1.join(df_c, 
> arrayToSeq(key_join_2), "left");
> df_inventory_2.show();
> }
> }
> {code}
> When running this code, I can see the exception below:
> {code:java}
> 18/10/18 09:25:49 ERROR CodeGenerator: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 202, Column 18: Expression "agg_isNull_28" is not an rvalue
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 202, Column 18: Expression "agg_isNull_28" is not an rvalue
>     at org.codehaus.janino.UnitCompiler.compileError(UnitCompiler.java:11821)
>     at 
> org.codehaus.janino.UnitCompiler.toRvalueOrCompileException(UnitCompiler.java:7170)
>     at 
> org.codehaus.janino.UnitCompiler.getConstantValue2(UnitCompiler.java:5332)
>     at org.codehaus.janino.UnitCompiler.access$9400(UnitCompiler.java:212)
>     at 
> org.codehaus.janino.UnitCompiler$13$1.visitAmbiguousName(UnitCompiler.java:5287)
>     at org.codehaus.janino.Java$AmbiguousName.accept(Java.java:4053)
>     at org.codehaus.janino.UnitCompiler$13.visitLvalue(UnitCompiler.java:5284)
>     at org.codehaus.janino.Java$Lvalue.accept(Java.java:3977)
>     at 
> org.codehaus.janino.UnitCompiler.getConstantValue(UnitCompiler.java:5280)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2391)
>     at org.codehaus.janino.UnitCompiler.access$1900(UnitCompiler.java:212)
>     at 
> org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1474)
>     at 
> org.codehaus.janino.UnitCompiler$6.visitIfStatement(UnitCompiler.java:1466)
>     at org.codehaus.janino.Java$IfStatement.accept(Java.java:2926)
>     at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:1466)
>     at 
> org.codehaus.janino.UnitCompiler.compileStatements(UnitCompiler.java:1546)
>     at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:3075)
>     at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1336)
>     at 
> org.codehaus.janino.UnitCompiler.compileDeclaredMethods(UnitCompiler.java:1309)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:799)
>     at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:958)
>     at 

[jira] [Resolved] (SPARK-25560) Allow Function Injection in SparkSessionExtensions

2018-10-19 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-25560.
---
   Resolution: Fixed
 Assignee: Russell Spitzer
Fix Version/s: 3.0.0

> Allow Function Injection in SparkSessionExtensions
> --
>
> Key: SPARK-25560
> URL: https://issues.apache.org/jira/browse/SPARK-25560
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, SQL
>Affects Versions: 2.4.0
>Reporter: Russell Spitzer
>Assignee: Russell Spitzer
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently there is no way to add a set of external functions to all sessions 
> made by users. We could add a small extension to SparkSessionExtensions which 
> would allow this to be done. 
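
A sketch of the kind of hook this asks for (names and signature assumed, based on the {{injectFunction}} extension point this work adds):
{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.FunctionIdentifier
import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionInfo, Upper}

val spark = SparkSession.builder()
  .master("local[*]")
  .withExtensions { extensions =>
    // Register an extra function that every session built from this builder can use.
    extensions.injectFunction((
      FunctionIdentifier("my_upper"),
      new ExpressionInfo(classOf[Upper].getName, "my_upper"),
      (children: Seq[Expression]) => Upper(children.head)))
  }
  .getOrCreate()

spark.sql("SELECT my_upper('hello')").show()
{code}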



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25708) HAVING without GROUP BY means global aggregate

2018-10-11 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-25708:
--
Labels: correctness release-notes  (was: correctness)

> HAVING without GROUP BY means global aggregate
> --
>
> Key: SPARK-25708
> URL: https://issues.apache.org/jira/browse/SPARK-25708
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: correctness, release-notes
>
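
For reference, a sketch of the interpretation this ticket documents (assuming a spark-shell session): with no GROUP BY, HAVING is applied to a single global aggregate row rather than acting as a row filter.
{code:scala}
spark.range(10).createOrReplaceTempView("t")

// Treated as a global aggregate: one row with sum(id) = 45, kept because 45 > 10.
spark.sql("SELECT sum(id) AS s FROM t HAVING sum(id) > 10").show()
{code}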




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25463) Make sure single expression can parse sort order

2018-09-19 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-25463:
-

 Summary: Make sure single expression can parse sort order
 Key: SPARK-25463
 URL: https://issues.apache.org/jira/browse/SPARK-25463
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Herman van Hovell
Assignee: Herman van Hovell


{{ParserInterface.parseExpression(..)}} should be able to parse all 
expressions. Currently it does not support sort orders.
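
For illustration, the kind of call this should support (a sketch, assuming a spark-shell session):
{code:scala}
// Before this fix the public parser interface cannot produce a SortOrder node.
spark.sessionState.sqlParser.parseExpression("a ASC NULLS LAST")
{code}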



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25378) ArrayData.toArray assume UTF8String

2018-09-08 Thread Herman van Hovell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16608122#comment-16608122
 ] 

Herman van Hovell commented on SPARK-25378:
---

I think this particular use is wrong. {{StringType}} is represented by 
{{UTF8String}} and not by {{java.lang.String}}. The correct way of doing this 
for a {{java.lang.String}} is the following:
{noformat}
import org.apache.spark.sql.catalyst.util._
import org.apache.spark.sql.types.{ObjectType, StringType}

ArrayData.toArrayData(Array("a", "b")).toArray[String](ObjectType(classOf[String]))

res0: Array[String] = Array(a, b)
{noformat}

I am inclined to close this as not a problem.


> ArrayData.toArray assume UTF8String
> ---
>
> Key: SPARK-25378
> URL: https://issues.apache.org/jira/browse/SPARK-25378
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Xiangrui Meng
>Priority: Critical
>
> The following code works in 2.3.1 but failed in 2.4.0-SNAPSHOT:
> {code}
> import org.apache.spark.sql.catalyst.util._
> import org.apache.spark.sql.types.StringType
> ArrayData.toArrayData(Array("a", "b")).toArray[String](StringType)
> res0: Array[String] = Array(a, b)
> {code}
> In 2.4.0-SNAPSHOT, the error is
> {code}java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
>   at 
> org.apache.spark.sql.catalyst.util.GenericArrayData.getUTF8String(GenericArrayData.scala:75)
>   at 
> org.apache.spark.sql.catalyst.InternalRow$$anonfun$getAccessor$8.apply(InternalRow.scala:136)
>   at 
> org.apache.spark.sql.catalyst.InternalRow$$anonfun$getAccessor$8.apply(InternalRow.scala:136)
>   at org.apache.spark.sql.catalyst.util.ArrayData.toArray(ArrayData.scala:178)
>   ... 51 elided
> {code}
> cc: [~cloud_fan] [~yogeshg]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25209) Optimization in Dataset.apply for DataFrames

2018-08-23 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-25209.
---
   Resolution: Fixed
 Assignee: Bogdan Raducanu
Fix Version/s: 2.4.0

> Optimization in Dataset.apply for DataFrames
> 
>
> Key: SPARK-25209
> URL: https://issues.apache.org/jira/browse/SPARK-25209
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Bogdan Raducanu
>Assignee: Bogdan Raducanu
>Priority: Major
> Fix For: 2.4.0
>
>
> {{Dataset.apply}} calls {{dataset.deserializer}} (to provide an early error) 
> which ends up calling the full {{Analyzer}} on the deserializer. This can 
> take tens of milliseconds, depending on how big the plan is.
>  Since {{Dataset.apply}} is called for many {{Dataset}} operations such as 
> {{Dataset.where}} it can be a significant overhead for short queries.
> In the following code, {{duration}} is *17 ms* in current Spark *vs 1 ms* 
> if I remove the {{dataset.deserializer}} line.
> It seems the resulting {{deserializer}} is particularly big in the case of 
> nested schema, but the same overhead can be observed if we have a very wide 
> flat schema.
>  According to a comment in the PR that introduced this check, we can at least 
> remove this check for {{DataFrames}}: 
> [https://github.com/apache/spark/pull/20402#discussion_r164338267]
> {code}
> val col = "named_struct(" +
>   (0 until 100).map { i => s"'col$i', id"}.mkString(",") + ")"
> val df = spark.range(10).selectExpr(col)
> val TRUE = lit(true)
> val numIter = 1000
> var startTime = System.nanoTime()
> for(i <- 0 until numIter) {
>   df.where(TRUE)
> }
> val durationMs = (System.nanoTime() - startTime) / numIter / 100
> println(s"duration $durationMs")
>  {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19355) Use map output statistices to improve global limit's parallelism

2018-08-10 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-19355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-19355.
---
   Resolution: Fixed
 Assignee: Liang-Chi Hsieh
Fix Version/s: 2.4.0

> Use map output statistices to improve global limit's parallelism
> 
>
> Key: SPARK-19355
> URL: https://issues.apache.org/jira/browse/SPARK-19355
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>
> A logical Limit is actually performed by two physical operations, LocalLimit 
> and GlobalLimit.
> Most of the time, before GlobalLimit, we perform a shuffle exchange that moves 
> the data into a single partition. When the limit number is very big, we 
> shuffle a lot of data to a single partition and significantly reduce 
> parallelism, on top of the cost of the shuffle itself.
> This change tries to perform GlobalLimit without shuffling data to a single 
> partition. Instead, we run only the map stage of the shuffle and collect 
> statistics on the number of rows in each partition; the shuffled data are then 
> all read locally rather than fetched from remote executors.
> Once we know the number of output rows in each partition, we take only the 
> required number of rows from the locally shuffled data.
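
For context, a sketch of the pattern this targets (assuming a spark-shell session; sizes are illustrative):
{code:scala}
// A big global limit: before this change the final limit funnels the surviving
// rows through a single post-shuffle partition.
val limited = spark.range(0L, 100000000L, 1L, 200).limit(10000000)
limited.count()
{code}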



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-23908) High-order function: transform(array, function) → array

2018-08-02 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-23908:
-

Assignee: Takuya Ueshin  (was: Herman van Hovell)

> High-order function: transform(array, function) → array
> ---
>
> Key: SPARK-23908
> URL: https://issues.apache.org/jira/browse/SPARK-23908
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 2.4.0
>
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Returns an array that is the result of applying function to each element of 
> array:
> {noformat}
> SELECT transform(ARRAY [], x -> x + 1); -- []
> SELECT transform(ARRAY [5, 6], x -> x + 1); -- [6, 7]
> SELECT transform(ARRAY [5, NULL, 6], x -> COALESCE(x, 0) + 1); -- [6, 1, 7]
> SELECT transform(ARRAY ['x', 'abc', 'z'], x -> x || '0'); -- ['x0', 'abc0', 
> 'z0']
> SELECT transform(ARRAY [ARRAY [1, NULL, 2], ARRAY[3, NULL]], a -> filter(a, x 
> -> x IS NOT NULL)); -- [[1, 2], [3]]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16203) regexp_extract to return an ArrayType(StringType())

2018-07-22 Thread Herman van Hovell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-16203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551966#comment-16551966
 ] 

Herman van Hovell commented on SPARK-16203:
---

[~nnicolini] adding {{regexp_extract_all}} makes sense. Can you file a new 
ticket for this? BTW, there might already be one.

> regexp_extract to return an ArrayType(StringType())
> ---
>
> Key: SPARK-16203
> URL: https://issues.apache.org/jira/browse/SPARK-16203
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Max Moroz
>Priority: Minor
>
> regexp_extract only returns a single matched group. If (as is often the case, 
> e.g. in web log parsing) we need to parse the entire line and get all the 
> groups, we'll need to call it as many times as there are groups.
> It's only a minor annoyance syntactically.
> But unless I misunderstand something, it would be very inefficient. (How 
> would Spark know not to do multiple pattern matching operations, when only 
> one is needed? Or does the optimizer actually check whether the patterns are 
> identical and, if they are, avoid the repeated regex matching operations?)
> Would it be possible to have it return an array when the index is not 
> specified (defaulting to None)?
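
For illustration, the repeated-call pattern described above (a sketch, assuming a spark-shell session; the log line and pattern are illustrative):
{code:scala}
import org.apache.spark.sql.functions._
import spark.implicits._

val logs = Seq("GET /index.html 200").toDF("line")
val p = "^(\\S+) (\\S+) (\\d+)$"

// Each capture group currently needs its own regexp_extract call over the same pattern.
logs.select(
  regexp_extract($"line", p, 1).as("method"),
  regexp_extract($"line", p, 2).as("path"),
  regexp_extract($"line", p, 3).as("status")
).show()
{code}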



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24488) Analyzer throws when generator is aliased multiple times

2018-07-20 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-24488.
---
   Resolution: Fixed
 Assignee: Brandon Krieger
Fix Version/s: 2.4.0

> Analyzer throws when generator is aliased multiple times
> 
>
> Key: SPARK-24488
> URL: https://issues.apache.org/jira/browse/SPARK-24488
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Brandon Krieger
>Assignee: Brandon Krieger
>Priority: Minor
> Fix For: 2.4.0
>
>
> Currently, the Analyzer throws an exception if you try to nest a generator. 
> However, it special cases generators "nested" in an alias, and allows that. 
> If you try to alias a generator twice, it is not caught by the special case, 
> so an exception is thrown:
>  
> {code:java}
> scala> Seq(("a", "b"))
> .toDF("col1","col2")
> .select(functions.array('col1,'col2).as("arr"))
> .select(functions.explode('arr).as("first").as("second"))
> .collect()
> org.apache.spark.sql.AnalysisException: Generators are not supported when 
> it's nested in expressions, but got: explode(arr) AS `first`;
> at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator$$anonfun$apply$23.applyOrElse(Analyzer.scala:1604)
> at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ExtractGenerator$$anonfun$apply$23.applyOrElse(Analyzer.scala:1601)
> {code}
>  
> In reality, aliasing twice is fine, so we can fix this by trimming non 
> top-level aliases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24846) Stabilize expression canonicalization

2018-07-19 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-24846.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

Fixed by gvr's PR. I could not find this user in JIRA.

> Stabilize expression canonicalization
> --
>
> Key: SPARK-24846
> URL: https://issues.apache.org/jira/browse/SPARK-24846
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Herman van Hovell
>Priority: Major
>  Labels: spree
> Fix For: 2.4.0
>
>
> Spark plan canonicalization can be non-deterministic between different 
> versions of Spark because {{ExprId}} uses a UUID.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23908) High-order function: transform(array, function) → array

2018-07-18 Thread Herman van Hovell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548427#comment-16548427
 ] 

Herman van Hovell edited comment on SPARK-23908 at 7/18/18 9:30 PM:


Yeah I am, sorry for the hold up. I'll try to have something out ASAP.

BTW: I don't see a target version set, only the affected version (which is a bit 
weird for a feature).


was (Author: hvanhovell):
Yeah I am, sorry for the hold up. I'll try to have something out ASAP.

> High-order function: transform(array, function) → array
> ---
>
> Key: SPARK-23908
> URL: https://issues.apache.org/jira/browse/SPARK-23908
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Herman van Hovell
>Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Returns an array that is the result of applying function to each element of 
> array:
> {noformat}
> SELECT transform(ARRAY [], x -> x + 1); -- []
> SELECT transform(ARRAY [5, 6], x -> x + 1); -- [6, 7]
> SELECT transform(ARRAY [5, NULL, 6], x -> COALESCE(x, 0) + 1); -- [6, 1, 7]
> SELECT transform(ARRAY ['x', 'abc', 'z'], x -> x || '0'); -- ['x0', 'abc0', 
> 'z0']
> SELECT transform(ARRAY [ARRAY [1, NULL, 2], ARRAY[3, NULL]], a -> filter(a, x 
> -> x IS NOT NULL)); -- [[1, 2], [3]]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23908) High-order function: transform(array, function) → array

2018-07-18 Thread Herman van Hovell (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16548427#comment-16548427
 ] 

Herman van Hovell commented on SPARK-23908:
---

Yeah I am, sorry for the hold up. I'll try to have something out ASAP.

> High-order function: transform(array, function) → array
> ---
>
> Key: SPARK-23908
> URL: https://issues.apache.org/jira/browse/SPARK-23908
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Herman van Hovell
>Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Returns an array that is the result of applying function to each element of 
> array:
> {noformat}
> SELECT transform(ARRAY [], x -> x + 1); -- []
> SELECT transform(ARRAY [5, 6], x -> x + 1); -- [6, 7]
> SELECT transform(ARRAY [5, NULL, 6], x -> COALESCE(x, 0) + 1); -- [6, 1, 7]
> SELECT transform(ARRAY ['x', 'abc', 'z'], x -> x || '0'); -- ['x0', 'abc0', 
> 'z0']
> SELECT transform(ARRAY [ARRAY [1, NULL, 2], ARRAY[3, NULL]], a -> filter(a, x 
> -> x IS NOT NULL)); -- [[1, 2], [3]]
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18600) BZ2 CRC read error needs better reporting

2018-07-18 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-18600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-18600:
--
Labels: spree  (was: )

> BZ2 CRC read error needs better reporting
> -
>
> Key: SPARK-18600
> URL: https://issues.apache.org/jira/browse/SPARK-18600
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Charles R Allen
>Priority: Minor
>  Labels: spree
>
> {code}
> 16/11/25 20:05:03 ERROR InsertIntoHadoopFsRelationCommand: Aborting job.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 148 
> in stage 5.0 failed 1 times, most recent failure: Lost task 148.0 in stage 
> 5.0 (TID 5945, localhost): org.apache.spark.SparkException: Task failed while 
> writing rows
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:261)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(InsertIntoHadoopFsRelationCommand.scala:143)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: com.univocity.parsers.common.TextParsingException: 
> java.lang.IllegalStateException - Error reading from input
> Parser Configuration: CsvParserSettings:
> Auto configuration enabled=true
> Autodetect column delimiter=false
> Autodetect quotes=false
> Column reordering enabled=true
> Empty value=null
> Escape unquoted values=false
> Header extraction enabled=null
> Headers=[INTERVALSTARTTIME_GMT, INTERVALENDTIME_GMT, OPR_DT, OPR_HR, 
> NODE_ID_XML, NODE_ID, NODE, MARKET_RUN_ID, LMP_TYPE, XML_DATA_ITEM, 
> PNODE_RESMRID, GRP_TYPE, POS, VALUE, OPR_INTERVAL, GROUP]
> Ignore leading whitespaces=false
> Ignore trailing whitespaces=false
> Input buffer size=128
> Input reading on separate thread=false
> Keep escape sequences=false
> Line separator detection enabled=false
> Maximum number of characters per column=100
> Maximum number of columns=20480
> Normalize escaped line separators=true
> Null value=
> Number of records to read=all
> Row processor=none
> RowProcessor error handler=null
> Selected fields=none
> Skip empty lines=true
> Unescaped quote handling=STOP_AT_DELIMITERFormat configuration:
> CsvFormat:
> Comment character=\0
> Field delimiter=,
> Line separator (normalized)=\n
> Line separator sequence=\n
> Quote character="
> Quote escape character=\
> Quote escape escape character=null
> Internal state when error was thrown: line=27089, column=13, record=27089, 
> charIndex=4451456, headers=[INTERVALSTARTTIME_GMT, INTERVALENDTIME_GMT, 
> OPR_DT, OPR_HR, NODE_ID_XML, NODE_ID, NODE, MARKET_RUN_ID, LMP_TYPE, 
> XML_DATA_ITEM, PNODE_RESMRID, GRP_TYPE, POS, VALUE, OPR_INTERVAL, GROUP]
> at 
> com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:302)
> at 
> com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:431)
> at 
> org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.next(CSVParser.scala:148)
> at 
> org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.next(CSVParser.scala:131)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
> at 
> 

[jira] [Updated] (SPARK-23612) Specify formats for individual DateType and TimestampType columns in schemas

2018-07-18 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-23612:
--
Labels: DataType date spree sql  (was: DataType date sql)

> Specify formats for individual DateType and TimestampType columns in schemas
> 
>
> Key: SPARK-23612
> URL: https://issues.apache.org/jira/browse/SPARK-23612
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Patrick Young
>Priority: Minor
>  Labels: DataType, date, spree, sql
>
> [https://github.com/apache/spark/blob/407f67249639709c40c46917700ed6dd736daa7d/python/pyspark/sql/types.py#L162-L200]
> It would be very helpful if it were possible to specify the format for 
> individual columns in a schema when reading csv files, rather than one format:
> {code:java|title=Bar.python|borderStyle=solid}
> # Currently can only do something like:
> spark.read.option("dateFormat", "yyyyMMdd").csv(...)
> # Would like to be able to do something like:
> schema = StructType([
>     StructField("date1", DateType(format="MM/dd/yyyy"), True),
>     StructField("date2", DateType(format="yyyyMMdd"), True)
> ])
> spark.read.schema(schema).csv(...)
> {code}
> Thanks for any help, input!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24838) Support uncorrelated IN/EXISTS subqueries for more operators

2018-07-18 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-24838:
--
Labels: spree  (was: )

> Support uncorrelated IN/EXISTS subqueries for more operators 
> -
>
> Key: SPARK-24838
> URL: https://issues.apache.org/jira/browse/SPARK-24838
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Qifan Pu
>Priority: Major
>  Labels: spree
>
> Currently, CheckAnalysis allows IN/EXISTS subquery only for filter operators. 
> Running a query:
> {{select name in (select * from valid_names)}}
> {{from all_names}}
> returns error:
> {code:java}
> Error in SQL statement: AnalysisException: IN/EXISTS predicate sub-queries 
> can only be used in a Filter
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24846) Stabilize expression canonicalization

2018-07-18 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-24846:
-

 Summary: Stabilize expression canonicalization
 Key: SPARK-24846
 URL: https://issues.apache.org/jira/browse/SPARK-24846
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: Herman van Hovell


Spark plan canonicalization can be non-deterministic between different 
versions of Spark because {{ExprId}} uses a UUID.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24536) Query with nonsensical LIMIT hits AssertionError

2018-07-18 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-24536:
--
Labels: beginner spree  (was: beginner)

> Query with nonsensical LIMIT hits AssertionError
> 
>
> Key: SPARK-24536
> URL: https://issues.apache.org/jira/browse/SPARK-24536
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Alexander Behm
>Priority: Trivial
>  Labels: beginner, spree
>
> SELECT COUNT(1) FROM t LIMIT CAST(NULL AS INT)
> fails in the QueryPlanner with:
> {code}
> java.lang.AssertionError: assertion failed: No plan for GlobalLimit null
> {code}
> I think this issue should be caught earlier during semantic analysis.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24757) Improve error message for broadcast timeouts

2018-07-07 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-24757.
---
   Resolution: Fixed
 Assignee: Maxim Gekk
Fix Version/s: 2.4.0

> Improve error message for broadcast timeouts
> 
>
> Key: SPARK-24757
> URL: https://issues.apache.org/jira/browse/SPARK-24757
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Trivial
> Fix For: 2.4.0
>
>
> Currently, the TimeoutException that is thrown on broadcast joins doesn't give 
> the user any clues about how to resolve the issue. We should help users by 
> pointing out two config parameters: *spark.sql.broadcastTimeout* and 
> *spark.sql.autoBroadcastJoinThreshold*.
> The ticket aims to handle the TimeoutException here: 
> https://github.com/apache/spark/blob/b7a036b75b8a1d287ac014b85e90d555753064c9/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala#L143
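
The two knobs such a message should point at (values below are illustrative):
{code:scala}
// Give the broadcast more time (seconds), or disable automatic broadcast joins.
spark.conf.set("spark.sql.broadcastTimeout", 1200)
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
{code}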



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24704) The order of stages in the DAG graph is incorrect

2018-07-04 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-24704.
---
   Resolution: Fixed
 Assignee: StanZhai
Fix Version/s: 2.4.0

> The order of stages in the DAG graph is incorrect
> -
>
> Key: SPARK-24704
> URL: https://issues.apache.org/jira/browse/SPARK-24704
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0, 2.3.1
>Reporter: StanZhai
>Assignee: StanZhai
>Priority: Minor
>  Labels: regression
> Fix For: 2.4.0
>
> Attachments: WX20180630-161907.png
>
>
> The regression was introduced in Spark 2.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24648) SQLMetrics counters are not thread safe

2018-06-25 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-24648.
---
   Resolution: Fixed
 Assignee: Stacy Kerkela
Fix Version/s: 2.4.0

> SQLMetrics counters are not thread safe
> ---
>
> Key: SPARK-24648
> URL: https://issues.apache.org/jira/browse/SPARK-24648
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.3.1
>Reporter: Stacy Kerkela
>Assignee: Stacy Kerkela
>Priority: Minor
> Fix For: 2.4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The += operator is not atomic, so discrepancies have been observed for 
> broadcast hash joins.
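
A generic (non-Spark) sketch of why a plain {{+=}} on a shared counter loses updates:
{code:scala}
// "+=" is a read-modify-write, so concurrent writers can overwrite each other.
var counter = 0L
val threads = (1 to 4).map { _ =>
  new Thread(new Runnable {
    def run(): Unit = (1 to 100000).foreach(_ => counter += 1)
  })
}
threads.foreach(_.start())
threads.foreach(_.join())
println(counter)  // typically well below 400000
{code}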



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24575) Prohibit window expressions inside WHERE and HAVING clauses

2018-06-20 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-24575.
---
Resolution: Fixed
  Assignee: Anton Okolnychyi

> Prohibit window expressions inside WHERE and HAVING clauses
> ---
>
> Key: SPARK-24575
> URL: https://issues.apache.org/jira/browse/SPARK-24575
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Minor
> Fix For: 2.4.0
>
>
> Why window functions inside WHERE and HAVING clauses should be prohibited is 
> described 
> [here|https://stackoverflow.com/questions/13997177/why-no-windowed-functions-in-where-clauses].
> Spark, on the other hand, does not handle this explicitly and will fail with 
> non-descriptive exceptions.
> {code}
> val df = Seq((1, 2), (1, 3), (2, 4), (5, 5)).toDF("a", "b")
> df.createTempView("t1")
> spark.sql("SELECT t1.a FROM t1 WHERE RANK() OVER(ORDER BY t1.b) = 
> 1").show(false)
> {code}
> {noformat}
> Exception in thread "main" java.lang.UnsupportedOperationException: Cannot 
> evaluate expression: rank(input[1, int, false]) windowspecdefinition(input[1, 
> int, false] ASC NULLS FIRST, specifiedwindowframe(RowFrame, 
> unboundedpreceding$(), currentrow$()))
>   at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable$class.doGenCode(Expression.scala:261)
>   at 
> org.apache.spark.sql.catalyst.expressions.WindowExpression.doGenCode(windowExpressions.scala:278)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:108)
>   at 
> org.apache.spark.sql.catalyst.expressions.Expression$$anonfun$genCode$2.apply(Expression.scala:105)
>   at scala.Option.getOrElse(Option.scala:121)
>   ...
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-21743) top-most limit should not cause memory leak

2018-06-15 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reopened SPARK-21743:
---

Reopening issue, this is causing a regression in the CSV reader.

> top-most limit should not cause memory leak
> ---
>
> Key: SPARK-21743
> URL: https://issues.apache.org/jira/browse/SPARK-21743
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24500) UnsupportedOperationException when trying to execute Union plan with Stream of children

2018-06-11 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-24500:
-

Assignee: Herman van Hovell

> UnsupportedOperationException when trying to execute Union plan with Stream 
> of children
> ---
>
> Key: SPARK-24500
> URL: https://issues.apache.org/jira/browse/SPARK-24500
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bogdan Raducanu
>Assignee: Herman van Hovell
>Priority: Major
>
> To reproduce:
> {code}
> import org.apache.spark.sql.catalyst.plans.logical._
> def range(i: Int) = Range(1, i, 1, 1)
> val union = Union(Stream(range(3), range(5), range(7)))
> spark.sessionState.planner.plan(union).next().execute()
> {code}
> produces
> {code}
> java.lang.UnsupportedOperationException
>   at 
> org.apache.spark.sql.execution.PlanLater.doExecute(SparkStrategies.scala:55)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
> {code}
> The SparkPlan looks like this:
> {code}
> :- Range (1, 3, step=1, splits=1)
> :- PlanLater Range (1, 5, step=1, splits=Some(1))
> +- PlanLater Range (1, 7, step=1, splits=Some(1))
> {code}
> So not all of it was planned (some PlanLater still in there).
> This appears to be a longstanding issue.
> I traced it to the use of var in TreeNode.
> For example in mapChildren:
> {code}
> case args: Traversable[_] => args.map {
>   case arg: TreeNode[_] if containsChild(arg) =>
> val newChild = f(arg.asInstanceOf[BaseType])
> if (!(newChild fastEquals arg)) {
>   changed = true
> {code}
> If args is a Stream then changed will never be set here, ultimately causing 
> the method to return the original plan.
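
A generic illustration of the laziness at play (a sketch, not Spark code): the function passed to {{Stream.map}} only runs for elements that have actually been forced, so a flag set inside it can be missed.
{code:scala}
var mapped = 0
val s = Stream(1, 2, 3).map { x => mapped += 1; x + 1 }
println(mapped)  // 1 -- only the head is evaluated eagerly
s.toList         // forces the rest of the stream
println(mapped)  // 3
{code}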



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24119) Add interpreted execution to SortPrefix expression

2018-06-08 Thread Herman van Hovell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-24119.
---
   Resolution: Fixed
 Assignee: Bruce Robbins
Fix Version/s: 2.4.0

> Add interpreted execution to SortPrefix expression
> --
>
> Key: SPARK-24119
> URL: https://issues.apache.org/jira/browse/SPARK-24119
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Minor
> Fix For: 2.4.0
>
>
> [~hvanhovell] [~kiszk]
> I noticed SortPrefix did not support interpreted execution when I was testing 
> the PR for SPARK-24043. Somehow it was not covered by the umbrella Jira for 
> adding interpreted execution (SPARK-23580).
> Since I had to implement interpreted execution for SortPrefix to complete 
> testing, I am creating this Jira. If there's no good reason why eval wasn't 
> implemented, I will make the PR in a few days.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24076) very bad performance when shuffle.partition = 8192

2018-05-08 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-24076.
---
   Resolution: Fixed
 Assignee: yucai
Fix Version/s: 2.4.0

> very bad performance when shuffle.partition = 8192
> --
>
> Key: SPARK-24076
> URL: https://issues.apache.org/jira/browse/SPARK-24076
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: yucai
>Assignee: yucai
>Priority: Major
> Fix For: 2.4.0
>
> Attachments: image-2018-04-25-14-29-39-958.png, p1.png, p2.png
>
>
> We see very bad performance when shuffle.partition = 8192 in some cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24043) InterpretedPredicate.eval fails if expression tree contains Nondeterministic expressions

2018-05-07 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-24043.
---
   Resolution: Fixed
 Assignee: Bruce Robbins
Fix Version/s: 2.4.0

> InterpretedPredicate.eval fails if expression tree contains Nondeterministic 
> expressions
> 
>
> Key: SPARK-24043
> URL: https://issues.apache.org/jira/browse/SPARK-24043
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Bruce Robbins
>Assignee: Bruce Robbins
>Priority: Minor
> Fix For: 2.4.0
>
>
> When whole-stage codegen and predicate codegen both fail, FilterExec falls 
> back to using InterpretedPredicate. If the predicate's expression contains 
> any non-deterministic expressions, the evaluation throws an error:
> {noformat}
> scala> val df = Seq((1)).toDF("a")
> df: org.apache.spark.sql.DataFrame = [a: int]
> scala> df.filter('a > 0).show // this works fine
> 2018-04-21 20:39:26 WARN  FilterExec:66 - Codegen disabled for this 
> expression:
>  (value#1 > 0)
> +---+
> |  a|
> +---+
> |  1|
> +---+
> scala> df.filter('a > rand(7)).show // this will throw an error
> 2018-04-21 20:39:40 WARN  FilterExec:66 - Codegen disabled for this 
> expression:
>  (cast(value#1 as double) > rand(7))
> 2018-04-21 20:39:40 ERROR Executor:91 - Exception in task 0.0 in stage 1.0 
> (TID 1)
> java.lang.IllegalArgumentException: requirement failed: Nondeterministic 
> expression org.apache.spark.sql.catalyst.expressions.Rand should be 
> initialized before eval.
>   at scala.Predef$.require(Predef.scala:224)
>   at 
> org.apache.spark.sql.catalyst.expressions.Nondeterministic$class.eval(Expression.scala:326)
>   at 
> org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:34)
> {noformat}
> This is because no code initializes the Nondeterministic expressions before 
> eval is called on them.
> This is a low impact issue, since it would require both whole-stage codegen 
> and predicate codegen to fail before FilterExec would fall back to using 
> InterpretedPredicate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16406) Reference resolution for large number of columns should be faster

2018-05-07 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-16406.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

> Reference resolution for large number of columns should be faster
> -
>
> Key: SPARK-16406
> URL: https://issues.apache.org/jira/browse/SPARK-16406
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>Priority: Major
> Fix For: 2.4.0
>
>
> Resolving a column in a LogicalPlan takes n / 2 comparisons on average (n being 
> the number of columns). This gets problematic as soon as you try to resolve a 
> large number of columns (m) on a wide table: O(m * n / 2).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24133) Reading Parquet files containing large strings can fail with java.lang.ArrayIndexOutOfBoundsException

2018-05-03 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-24133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-24133:
--
Fix Version/s: 2.3.1

> Reading Parquet files containing large strings can fail with 
> java.lang.ArrayIndexOutOfBoundsException
> -
>
> Key: SPARK-24133
> URL: https://issues.apache.org/jira/browse/SPARK-24133
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ala Luszczak
>Assignee: Ala Luszczak
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> ColumnVectors store string data in one big byte array. Since the array size 
> is capped at just under Integer.MAX_VALUE, a single ColumnVector cannot store 
> more than 2GB of string data.
> However, since Parquet files commonly contain large blobs stored as strings, 
> and ColumnVectors by default carry 4096 values, it's entirely possible to go 
> past that limit.
> In such cases a negative capacity is requested from 
> WritableColumnVector.reserve(). The call succeeds (the requested capacity is 
> smaller than the already allocated one), and consequently 
> java.lang.ArrayIndexOutOfBoundsException is thrown when the reader actually 
> attempts to put the data into the array.
> This behavior is hard for users to troubleshoot. Spark should instead check 
> for a negative requested capacity in WritableColumnVector.reserve() and throw 
> a more informative error, instructing the user to tweak the ColumnarBatch 
> size.
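For illustration, a rough sketch of the proposed guard, written here in Scala even though the real WritableColumnVector is Java; the exact message and surrounding logic are assumptions:
{code:scala}
// Hypothetical capacity check: fail fast with an actionable message instead
// of letting an overflowed (negative) capacity slip through silently.
def reserve(requiredCapacity: Int, currentCapacity: Int): Unit = {
  if (requiredCapacity < 0) {
    throw new RuntimeException(
      "Cannot reserve the requested capacity: the value overflowed Int. " +
        "Consider reducing the vectorized reader batch size (rows per ColumnarBatch).")
  } else if (requiredCapacity > currentCapacity) {
    // grow the underlying storage (elided)
  }
}
{code}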






[jira] [Commented] (SPARK-24051) Incorrect results for certain queries using Java and Python APIs on Spark 2.3.0

2018-04-24 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450293#comment-16450293
 ] 

Herman van Hovell commented on SPARK-24051:
---

[~mgaido] do you have any idea why this is failing in Spark 2.3 specifically? 
Does it have something to do with introduction of analysis barriers?

> Incorrect results for certain queries using Java and Python APIs on Spark 
> 2.3.0
> ---
>
> Key: SPARK-24051
> URL: https://issues.apache.org/jira/browse/SPARK-24051
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Emlyn Corrin
>Priority: Major
>
> I'm seeing Spark 2.3.0 return incorrect results for a certain (very specific) 
> query, demonstrated by the Java program below. It was simplified from a much 
> more complex query, but I'm having trouble simplifying it further without 
> removing the erroneous behaviour.
> {code:java}
> package sparktest;
> import org.apache.spark.SparkConf;
> import org.apache.spark.sql.*;
> import org.apache.spark.sql.expressions.Window;
> import org.apache.spark.sql.types.DataTypes;
> import org.apache.spark.sql.types.Metadata;
> import org.apache.spark.sql.types.StructField;
> import org.apache.spark.sql.types.StructType;
> import java.util.Arrays;
> public class Main {
> public static void main(String[] args) {
> SparkConf conf = new SparkConf()
> .setAppName("SparkTest")
> .setMaster("local[*]");
> SparkSession session = 
> SparkSession.builder().config(conf).getOrCreate();
> Row[] arr1 = new Row[]{
> RowFactory.create(1, 42),
> RowFactory.create(2, 99)};
> StructType sch1 = new StructType(new StructField[]{
> new StructField("a", DataTypes.IntegerType, true, 
> Metadata.empty()),
> new StructField("b", DataTypes.IntegerType, true, 
> Metadata.empty())});
> Dataset<Row> ds1 = session.createDataFrame(Arrays.asList(arr1), sch1);
> ds1.show();
> Row[] arr2 = new Row[]{
> RowFactory.create(3)};
> StructType sch2 = new StructType(new StructField[]{
> new StructField("a", DataTypes.IntegerType, true, 
> Metadata.empty())});
> Dataset<Row> ds2 = session.createDataFrame(Arrays.asList(arr2), sch2)
> .withColumn("b", functions.lit(0));
> ds2.show();
> Column[] cols = new Column[]{
> new Column("a"),
> new Column("b").as("b"),
> functions.count(functions.lit(1))
> .over(Window.partitionBy())
> .as("n")};
> Dataset<Row> ds = ds1
> .select(cols)
> .union(ds2.select(cols))
> .where(new Column("n").geq(1))
> .drop("n");
> ds.show();
> //ds.explain(true);
> }
> }
> {code}
> It just calculates the union of 2 datasets,
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1| 42|
> |  2| 99|
> +---+---+
> {code}
> with
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  3|  0|
> +---+---+
> {code}
> The expected result is:
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1| 42|
> |  2| 99|
> |  3|  0|
> +---+---+
> {code}
> but instead it prints:
> {code:java}
> +---+---+
> |  a|  b|
> +---+---+
> |  1|  0|
> |  2|  0|
> |  3|  0|
> +---+---+
> {code}
> Notice how the value in column b is always zero, overriding the original 
> values in rows 1 and 2.
>  Making seemingly trivial changes, such as replacing {{new 
> Column("b").as("b"),}} with just {{new Column("b"),}} or removing the 
> {{where}} clause after the union, makes it behave correctly again.






[jira] [Resolved] (SPARK-23589) Add interpreted execution for ExternalMapToCatalyst expression

2018-04-23 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23589.
---
   Resolution: Fixed
 Assignee: Takeshi Yamamuro
Fix Version/s: 2.4.0

> Add interpreted execution for ExternalMapToCatalyst expression
> --
>
> Key: SPARK-23589
> URL: https://issues.apache.org/jira/browse/SPARK-23589
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 2.4.0
>
>







[jira] [Resolved] (SPARK-23595) Add interpreted execution for ValidateExternalType expression

2018-04-20 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23595.
---
   Resolution: Fixed
 Assignee: Takeshi Yamamuro
Fix Version/s: 2.4.0

> Add interpreted execution for ValidateExternalType expression
> -
>
> Key: SPARK-23595
> URL: https://issues.apache.org/jira/browse/SPARK-23595
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 2.4.0
>
>







[jira] [Resolved] (SPARK-22362) Add unit test for Window Aggregate Functions

2018-04-19 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-22362.
---
   Resolution: Fixed
 Assignee: Attila Zsolt Piros
Fix Version/s: 2.4.0

> Add unit test for Window Aggregate Functions
> 
>
> Key: SPARK-22362
> URL: https://issues.apache.org/jira/browse/SPARK-22362
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Jiang Xingbo
>Assignee: Attila Zsolt Piros
>Priority: Major
> Fix For: 2.4.0
>
>
> * Declarative
> * Imperative
> * UDAF






[jira] [Resolved] (SPARK-23976) UTF8String.concat() or ByteArray.concat() may allocate shorter structure.

2018-04-19 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23976.
---
   Resolution: Fixed
 Assignee: Kazuaki Ishizaki
Fix Version/s: 2.4.0

> UTF8String.concat() or ByteArray.concat() may allocate shorter structure.
> -
>
> Key: SPARK-23976
> URL: https://issues.apache.org/jira/browse/SPARK-23976
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
> Fix For: 2.4.0
>
>
> When the three inputs have lengths `0x7FFF_FF00`, `0x7FFF_FF00`, and `0xE00`, 
> the current algorithm allocates the result structure with length 0x1000 due to 
> integer sum overflow.
> We should detect the overflow.
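For illustration, a minimal sketch of an overflow-safe length computation, assuming the input lengths are summed in a wider type before allocation; the names are illustrative, not the actual UTF8String/ByteArray code:
{code:scala}
// Hypothetical length check: accumulate in Long, then verify the total still
// fits in an Int before allocating the output buffer.
def concatLength(inputLengths: Seq[Int]): Int = {
  val total = inputLengths.map(_.toLong).sum
  if (total > Int.MaxValue) {
    throw new IllegalArgumentException(
      s"Cannot allocate a concatenated buffer of $total bytes: exceeds Int.MaxValue")
  }
  total.toInt
}
{code}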






[jira] [Resolved] (SPARK-23989) When using `SortShuffleWriter`, the data will be overwritten

2018-04-19 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23989.
---
   Resolution: Fixed
 Assignee: Wenchen Fan
Fix Version/s: 2.4.0
   2.3.1

> When using `SortShuffleWriter`, the data will be overwritten
> 
>
> Key: SPARK-23989
> URL: https://issues.apache.org/jira/browse/SPARK-23989
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: liuxian
>Assignee: Wenchen Fan
>Priority: Critical
> Fix For: 2.3.1, 2.4.0
>
>
> When using `SortShuffleWriter`, we only insert 'AnyRef' into 
> 'PartitionedAppendOnlyMap' or 'PartitionedPairBuffer'.
> For this function:
> {{override def write(records: Iterator[Product2[K, V]])}}
> the value of 'records' is `UnsafeRow`, so the value will be overwritten.
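For illustration, a minimal sketch of the usual remedy, assuming the mutable rows are copied before they are buffered (the helper below is illustrative, not the actual SortShuffleWriter code):
{code:scala}
import org.apache.spark.sql.catalyst.expressions.UnsafeRow

// Hypothetical defensive copy: the writer re-uses one UnsafeRow instance for
// every record, so buffering the reference lets later records overwrite
// earlier ones. Copying each row before insertion keeps buffered values stable.
def bufferRecords[K](records: Iterator[(K, UnsafeRow)],
                     insert: (K, AnyRef) => Unit): Unit = {
  records.foreach { case (key, row) =>
    insert(key, row.copy()) // copy() materializes an independent row
  }
}
{code}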






[jira] [Resolved] (SPARK-23584) Add interpreted execution to NewInstance expression

2018-04-19 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23584.
---
   Resolution: Fixed
 Assignee: Takeshi Yamamuro
Fix Version/s: 2.4.0

> Add interpreted execution to NewInstance expression
> ---
>
> Key: SPARK-23584
> URL: https://issues.apache.org/jira/browse/SPARK-23584
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 2.4.0
>
>







[jira] [Resolved] (SPARK-23588) Add interpreted execution for CatalystToExternalMap expression

2018-04-19 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23588.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

> Add interpreted execution for CatalystToExternalMap expression
> --
>
> Key: SPARK-23588
> URL: https://issues.apache.org/jira/browse/SPARK-23588
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Takeshi Yamamuro
>Priority: Major
> Fix For: 2.4.0
>
>







[jira] [Commented] (SPARK-23711) Add fallback to interpreted execution logic

2018-04-19 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443934#comment-16443934
 ] 

Herman van Hovell commented on SPARK-23711:
---

Can you make a small PR initially so we can discuss the design a bit?

> Add fallback to interpreted execution logic
> ---
>
> Key: SPARK-23711
> URL: https://issues.apache.org/jira/browse/SPARK-23711
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>







[jira] [Commented] (SPARK-23711) Add fallback to interpreted execution logic

2018-04-19 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443882#comment-16443882
 ] 

Herman van Hovell commented on SPARK-23711:
---

ObjectHashAggregateExec uses code generation for the projections it uses.

I think that, as a rule, we should never create code-generated objects directly, 
and should use factories with fallback logic instead.
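For illustration, a minimal sketch of such a factory, assuming a codegen path that can throw and an interpreted implementation to fall back to (the trait and object names are made up for the example):
{code:scala}
import scala.util.control.NonFatal

trait Evaluator { def eval(input: Any): Any }

// Hypothetical factory with fallback: try the code-generated implementation
// first and degrade to the interpreted one if compilation fails.
object EvaluatorFactory {
  def create(codegen: () => Evaluator, interpreted: () => Evaluator): Evaluator =
    try codegen()
    catch { case NonFatal(_) => interpreted() }
}
{code}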

> Add fallback to interpreted execution logic
> ---
>
> Key: SPARK-23711
> URL: https://issues.apache.org/jira/browse/SPARK-23711
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>







[jira] [Commented] (SPARK-23711) Add fallback to interpreted execution logic

2018-04-19 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443866#comment-16443866
 ] 

Herman van Hovell commented on SPARK-23711:
---

There are a lot of places where we do not fall back to interpreted mode and just 
fail, for example: window functions, object hash aggregate, encoders, etc.

> Add fallback to interpreted execution logic
> ---
>
> Key: SPARK-23711
> URL: https://issues.apache.org/jira/browse/SPARK-23711
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Priority: Major
>







[jira] [Resolved] (SPARK-23875) Create IndexedSeq wrapper for ArrayData

2018-04-17 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23875.
---
   Resolution: Fixed
 Assignee: Liang-Chi Hsieh
Fix Version/s: 2.4.0

> Create IndexedSeq wrapper for ArrayData
> ---
>
> Key: SPARK-23875
> URL: https://issues.apache.org/jira/browse/SPARK-23875
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>
> We don't have a good way to sequentially access {{UnsafeArrayData}} with a 
> common interface such as Seq. An example is {{MapObject}} where we need to 
> access several sequence collection types together. But {{UnsafeArrayData}} 
> doesn't implement {{ArrayData.array}}. Calling {{toArray}} will copy the 
> entire array. We can provide an {{IndexedSeq}} wrapper for {{ArrayData}}, so 
> we can avoid copying the entire array.
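For illustration, a minimal sketch of such a wrapper, assuming element access goes through ArrayData.get with the element type (the class name and simplifications are mine, not the merged implementation):
{code:scala}
import org.apache.spark.sql.catalyst.util.ArrayData
import org.apache.spark.sql.types.DataType

// Hypothetical wrapper: exposes ArrayData as an IndexedSeq without copying
// the backing storage; each apply() reads one element on demand.
class ArrayDataSeq(arrayData: ArrayData, elementType: DataType)
    extends IndexedSeq[Any] {
  override def apply(idx: Int): Any = arrayData.get(idx, elementType)
  override def length: Int = arrayData.numElements()
}
{code}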






[jira] [Resolved] (SPARK-23873) Use accessors in interpreted LambdaVariable

2018-04-16 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23873.
---
   Resolution: Fixed
 Assignee: Liang-Chi Hsieh
Fix Version/s: 2.4.0

> Use accessors in interpreted LambdaVariable
> ---
>
> Key: SPARK-23873
> URL: https://issues.apache.org/jira/browse/SPARK-23873
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>
> Currently, interpreted execution of {{LambdaVariable}} just uses 
> {{InternalRow.get}} to access elements. We should use specialized accessors 
> where possible.
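For illustration, a minimal sketch of what a specialized accessor could look like, assuming the typed getter is chosen once per field from the data type instead of going through the generic InternalRow.get on every call (names are illustrative):
{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types._

// Hypothetical accessor selection: resolve the typed getter once, then reuse
// it for every row instead of boxing through the generic get().
def accessorFor(dataType: DataType, ordinal: Int): InternalRow => Any = dataType match {
  case IntegerType => row => row.getInt(ordinal)
  case LongType    => row => row.getLong(ordinal)
  case DoubleType  => row => row.getDouble(ordinal)
  case StringType  => row => row.getUTF8String(ordinal)
  case other       => row => row.get(ordinal, other)
}
{code}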






[jira] [Resolved] (SPARK-23864) Add Unsafe* copy methods to UnsafeWriter

2018-04-10 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23864.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

> Add Unsafe* copy methods to UnsafeWriter
> 
>
> Key: SPARK-23864
> URL: https://issues.apache.org/jira/browse/SPARK-23864
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>Priority: Major
> Fix For: 2.4.0
>
>







[jira] [Created] (SPARK-23951) Use java classes in ExprValue and simplify a bunch of stuff

2018-04-10 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23951:
-

 Summary: Use java classes in ExprValue and simplify a bunch of 
stuff
 Key: SPARK-23951
 URL: https://issues.apache.org/jira/browse/SPARK-23951
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Herman van Hovell
Assignee: Herman van Hovell









[jira] [Commented] (SPARK-23945) Column.isin() should accept a single-column DataFrame as input

2018-04-10 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432039#comment-16432039
 ] 

Herman van Hovell commented on SPARK-23945:
---

[~nchammas] we didn't add explicit dataset support because no-one asked for it, 
until now :)

What do you want to support here? {{(NOT) IN}} and {{EXISTS}}? Or do you also 
want to add support for scalar subqueries, and subqueries in filters?

> Column.isin() should accept a single-column DataFrame as input
> --
>
> Key: SPARK-23945
> URL: https://issues.apache.org/jira/browse/SPARK-23945
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> In SQL you can filter rows based on the result of a subquery:
> {code:java}
> SELECT *
> FROM table1
> WHERE name NOT IN (
> SELECT name
> FROM table2
> );{code}
> In the Spark DataFrame API, the equivalent would probably look like this:
> {code:java}
> (table1
> .where(
> ~col('name').isin(
> table2.select('name')
> )
> )
> ){code}
> However, .isin() currently [only accepts a local list of 
> values|http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.isin].
> I imagine making this enhancement would happen as part of a larger effort to 
> support correlated subqueries in the DataFrame API.
> Or perhaps there is no plan to support this style of query in the DataFrame 
> API, and queries like this should instead be written in a different way? How 
> would we write a query like the one I have above in the DataFrame API, 
> without needing to collect values locally for the NOT IN filter?
>  






[jira] [Commented] (SPARK-23897) Guava version

2018-04-09 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430301#comment-16430301
 ] 

Herman van Hovell commented on SPARK-23897:
---

It is currently being discussed on the mailing list: 
http://apache-spark-developers-list.1001551.n3.nabble.com/time-for-Apache-Spark-3-0-td23755.html

> Guava version
> -
>
> Key: SPARK-23897
> URL: https://issues.apache.org/jira/browse/SPARK-23897
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Sercan Karaoglu
>Priority: Minor
>
> Guava dependency version 14 is pretty old and needs to be updated to at least 
> 16. The Google Cloud Storage connector uses a newer version, which causes the 
> fairly common Guava error "java.lang.NoSuchMethodError: 
> com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;"
>  and crashes the app.






[jira] [Assigned] (SPARK-23908) High-order function: transform(array, function<T, U>) → array

2018-04-09 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-23908:
-

Assignee: Herman van Hovell

> High-order function: transform(array, function<T, U>) → array
> ---
>
> Key: SPARK-23908
> URL: https://issues.apache.org/jira/browse/SPARK-23908
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Herman van Hovell
>Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Returns an array that is the result of applying function to each element of 
> array:
> {noformat}
> SELECT transform(ARRAY [], x -> x + 1); -- []
> SELECT transform(ARRAY [5, 6], x -> x + 1); -- [6, 7]
> SELECT transform(ARRAY [5, NULL, 6], x -> COALESCE(x, 0) + 1); -- [6, 1, 7]
> SELECT transform(ARRAY ['x', 'abc', 'z'], x -> x || '0'); -- ['x0', 'abc0', 
> 'z0']
> SELECT transform(ARRAY [ARRAY [1, NULL, 2], ARRAY[3, NULL]], a -> filter(a, x 
> -> x IS NOT NULL)); -- [[1, 2], [3]]
> {noformat}






[jira] [Assigned] (SPARK-23911) High-order function: reduce(array, initialState S, inputFunction<S, T, S>, outputFunction<S, R>) → R

2018-04-09 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-23911:
-

Assignee: Herman van Hovell

> High-order function: reduce(array, initialState S, inputFunction<S, T, S>, 
> outputFunction<S, R>) → R
> ---
>
> Key: SPARK-23911
> URL: https://issues.apache.org/jira/browse/SPARK-23911
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Herman van Hovell
>Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Returns a single value reduced from array. inputFunction will be invoked for 
> each element in array in order. In addition to taking the element, 
> inputFunction takes the current state, initially initialState, and returns 
> the new state. outputFunction will be invoked to turn the final state into 
> the result value. It may be the identity function (i -> i).
> {noformat}
> SELECT reduce(ARRAY [], 0, (s, x) -> s + x, s -> s); -- 0
> SELECT reduce(ARRAY [5, 20, 50], 0, (s, x) -> s + x, s -> s); -- 75
> SELECT reduce(ARRAY [5, 20, NULL, 50], 0, (s, x) -> s + x, s -> s); -- NULL
> SELECT reduce(ARRAY [5, 20, NULL, 50], 0, (s, x) -> s + COALESCE(x, 0), s -> 
> s); -- 75
> SELECT reduce(ARRAY [5, 20, NULL, 50], 0, (s, x) -> IF(x IS NULL, s, s + x), 
> s -> s); -- 75
> SELECT reduce(ARRAY [2147483647, 1], CAST (0 AS BIGINT), (s, x) -> s + x, s 
> -> s); -- 2147483648
> SELECT reduce(ARRAY [5, 6, 10, 20], -- calculates arithmetic average: 10.25
>   CAST(ROW(0.0, 0) AS ROW(sum DOUBLE, count INTEGER)),
>   (s, x) -> CAST(ROW(x + s.sum, s.count + 1) AS ROW(sum DOUBLE, 
> count INTEGER)),
>   s -> IF(s.count = 0, NULL, s.sum / s.count));
> {noformat}






[jira] [Assigned] (SPARK-23909) High-order function: filter(array, function<T, boolean>) → array

2018-04-09 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reassigned SPARK-23909:
-

Assignee: Herman van Hovell

> High-order function: filter(array, function<T, boolean>) → array
> --
>
> Key: SPARK-23909
> URL: https://issues.apache.org/jira/browse/SPARK-23909
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Herman van Hovell
>Priority: Major
>
> Ref: https://prestodb.io/docs/current/functions/array.html
> Constructs an array from those elements of array for which function returns 
> true:
> {noformat}
> SELECT filter(ARRAY [], x -> true); -- []
> SELECT filter(ARRAY [5, -6, NULL, 7], x -> x > 0); -- [5, 7]
> SELECT filter(ARRAY [5, NULL, 7, NULL], x -> x IS NOT NULL); -- [5, 7]
> {noformat}






[jira] [Commented] (SPARK-23897) Guava version

2018-04-08 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429873#comment-16429873
 ] 

Herman van Hovell commented on SPARK-23897:
---

That is not going to happen for a minor release, since people (unfortunately) 
rely on this dependency. There are plans to shade all dependencies in Spark 
3.0, but that is at least 6 months away.
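For an application hitting this conflict today, one common workaround (outside of Spark itself) is to shade/relocate Guava inside the application jar. A rough build.sbt sketch, assuming the sbt-assembly plugin is used; the exact setting syntax depends on the sbt and plugin versions:
{code}
// Relocate the application's Guava classes so they no longer clash with the
// Guava 14 that Spark puts on the classpath.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.**" -> "shaded.com.google.common.@1").inAll
)
{code}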

> Guava version
> -
>
> Key: SPARK-23897
> URL: https://issues.apache.org/jira/browse/SPARK-23897
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Sercan Karaoglu
>Priority: Minor
>
> Guava dependency version 14 is pretty old and needs to be updated to at least 
> 16. The Google Cloud Storage connector uses a newer version, which causes the 
> fairly common Guava error "java.lang.NoSuchMethodError: 
> com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;"
>  and crashes the app.






[jira] [Commented] (SPARK-23897) Guava version

2018-04-08 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429866#comment-16429866
 ] 

Herman van Hovell commented on SPARK-23897:
---

This is a duplicate of SPARK-23854.

We are not going to upgrade Guava any time soon. This is notoriously hard to do 
because it is used in a lot of Spark's dependencies and the Guava developers 
aggressively remove deprecated APIs; updating can easily break things (missing 
methods, that sort of thing). See the discussion in 
[https://github.com/apache/spark/pull/20966] for some more context.

> Guava version
> -
>
> Key: SPARK-23897
> URL: https://issues.apache.org/jira/browse/SPARK-23897
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Sercan Karaoglu
>Priority: Minor
>
> Guava dependency version 14 is pretty old and needs to be updated to at least 
> 16. The Google Cloud Storage connector uses a newer version, which causes the 
> fairly common Guava error "java.lang.NoSuchMethodError: 
> com.google.common.base.Splitter.splitToList(Ljava/lang/CharSequence;)Ljava/util/List;"
>  and crashes the app.






[jira] [Created] (SPARK-23898) Simplify code generation for Add/Subtract with CalendarIntervals

2018-04-08 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23898:
-

 Summary: Simplify code generation for Add/Subtract with 
CalendarIntervals
 Key: SPARK-23898
 URL: https://issues.apache.org/jira/browse/SPARK-23898
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.0
Reporter: Herman van Hovell
Assignee: Herman van Hovell









[jira] [Resolved] (SPARK-23893) Possible overflow in long = int * int

2018-04-08 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23893.
---
   Resolution: Fixed
 Assignee: Kazuaki Ishizaki
Fix Version/s: 2.4.0

> Possible overflow in long = int * int
> -
>
> Key: SPARK-23893
> URL: https://issues.apache.org/jira/browse/SPARK-23893
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
> Fix For: 2.4.0
>
>
> Performing `int * int` and then casting the result to `long` may cause 
> overflow if the MSB of the multiplication result is `1`. In other words, the 
> result would be negative due to sign extension.
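A small illustration of the difference, assuming a size computation along the lines of rows * bytesPerRow:
{code:scala}
val rows = 70000
val bytesPerRow = 40000

val wrong: Long = rows * bytesPerRow        // multiplied as Int first: -1494967296
val right: Long = rows.toLong * bytesPerRow // promoted before multiplying: 2800000000
{code}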






[jira] [Resolved] (SPARK-23892) Improve coverage and fix lint error in UTF8String-related Suite

2018-04-08 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23892.
---
   Resolution: Fixed
 Assignee: Kazuaki Ishizaki
Fix Version/s: 2.4.0

> Improve coverage and fix lint error in UTF8String-related Suite
> ---
>
> Key: SPARK-23892
> URL: https://issues.apache.org/jira/browse/SPARK-23892
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
> Fix For: 2.4.0
>
>
> The following code in {{UTF8StringSuite}} makes no sense.
> {code}
> assertTrue(s1.startsWith(s1));
> assertTrue(s1.endsWith(s1));
> {code}
> The code {{if (length <= 0) ""}} in {{UTF8StringPropertyCheckSuite}} makes no 
> sense:
> {code}
>   test("lpad, rpad") {
> def padding(origin: String, pad: String, length: Int, isLPad: Boolean): 
> String = {
>   if (length <= 0) return ""
>   if (length <= origin.length) {
> if (length <= 0) "" else origin.substring(0, length)
>   } else {
>...
> {code}
> The previous change in {{UTF8StringSuite}} broke the lint-java check.






[jira] [Resolved] (SPARK-23882) Is UTF8StringSuite.writeToOutputStreamUnderflow() supported?

2018-04-06 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23882.
---
   Resolution: Fixed
 Assignee: Kazuaki Ishizaki
Fix Version/s: 2.4.0

> Is UTF8StringSuite.writeToOutputStreamUnderflow() supported?
> 
>
> Key: SPARK-23882
> URL: https://issues.apache.org/jira/browse/SPARK-23882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
>Priority: Minor
> Fix For: 2.4.0
>
>
> The unit test {{UTF8StringSuite.writeToOutputStreamUnderflow()}} accesses the 
> metadata of a Java byte array object, in the region that 
> {{Platform.BYTE_ARRAY_OFFSET}} reserves.
> Is this test valid? Is this test necessary for the Spark implementation?






[jira] [Updated] (SPARK-23864) Add Unsafe* copy methods to UnsafeWriter

2018-04-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell updated SPARK-23864:
--
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-23580

> Add Unsafe* copy methods to UnsafeWriter
> 
>
> Key: SPARK-23864
> URL: https://issues.apache.org/jira/browse/SPARK-23864
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>Priority: Major
>







[jira] [Resolved] (SPARK-23582) Add interpreted execution to StaticInvoke expression

2018-04-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23582.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

> Add interpreted execution to StaticInvoke expression
> 
>
> Key: SPARK-23582
> URL: https://issues.apache.org/jira/browse/SPARK-23582
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Kazuaki Ishizaki
>Priority: Major
> Fix For: 2.4.0
>
>







[jira] [Resolved] (SPARK-23593) Add interpreted execution for InitializeJavaBean expression

2018-04-05 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-23593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-23593.
---
Resolution: Fixed

> Add interpreted execution for InitializeJavaBean expression
> ---
>
> Key: SPARK-23593
> URL: https://issues.apache.org/jira/browse/SPARK-23593
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Herman van Hovell
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>






