[jira] [Resolved] (SPARK-26005) Upgrade ANTLR to 4.7.1
[ https://issues.apache.org/jira/browse/SPARK-26005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-26005. - Resolution: Fixed Fix Version/s: 3.0.0 > Upgrade ANTLR to 4.7.1 > -- > > Key: SPARK-26005 > URL: https://issues.apache.org/jira/browse/SPARK-26005 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-19761) Creating InMemoryFileIndex with empty rootPaths fails when PARALLEL_PARTITION_DISCOVERY_THRESHOLD is set to zero
[ https://issues.apache.org/jira/browse/SPARK-19761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-19761. - Resolution: Fixed Fix Version/s: 2.2.0 > create InMemoryFileIndex with empty rootPaths when set > PARALLEL_PARTITION_DISCOVERY_THRESHOLD to zero > - > > Key: SPARK-19761 > URL: https://issues.apache.org/jira/browse/SPARK-19761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Song Jun >Priority: Major > Fix For: 2.2.0 > > > if we create a InMemoryFileIndex with an empty rootPaths when set > PARALLEL_PARTITION_DISCOVERY_THRESHOLD to zero, it will throw an exception: > {code} > Positive number of slices required > java.lang.IllegalArgumentException: Positive number of slices required > at > org.apache.spark.rdd.ParallelCollectionRDD$.slice(ParallelCollectionRDD.scala:119) > at > org.apache.spark.rdd.ParallelCollectionRDD.getPartitions(ParallelCollectionRDD.scala:97) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) > at scala.Option.getOrElse(Option.scala:121) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:2084) > at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:362) > at org.apache.spark.rdd.RDD.collect(RDD.scala:935) > at > org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex$.org$apache$spark$sql$execution$datasources$PartitioningAwareFileIndex$$bulkListLeafFiles(PartitioningAwareFileIndex.scala:357) > at > org.apache.spark.sql.execution.datasources.PartitioningAwareFileIndex.listLeafFiles(PartitioningAwareFileIndex.scala:256) > at > org.apache.spark.sql.execution.datasources.InMemoryFileIndex.refresh0(InMemoryFileIndex.scala:74) > at > org.apache.spark.sql.execution.datasources.InMemoryFileIndex.(InMemoryFileIndex.scala:50) > at > org.apache.spark.sql.execution.datasources.FileIndexSuite$$anonfun$9$$anonfun$apply$mcV$sp$2.apply$mcV$sp(FileIndexSuite.scala:186) > at > org.apache.spark.sql.test.SQLTestUtils$class.withSQLConf(SQLTestUtils.scala:105) > at > org.apache.spark.sql.execution.datasources.FileIndexSuite.withSQLConf(FileIndexSuite.scala:33) > at > org.apache.spark.sql.execution.datasources.FileIndexSuite$$anonfun$9.apply$mcV$sp(FileIndexSuite.scala:185) > at > org.apache.spark.sql.execution.datasources.FileIndexSuite$$anonfun$9.apply(FileIndexSuite.scala:185) > at > org.apache.spark.sql.execution.datasources.FileIndexSuite$$anonfun$9.apply(FileIndexSuite.scala:185) > at > 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
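For context on the failure above: the leaf-file listing parallelizes the root paths, and with an empty path list plus a discovery threshold of zero it asks for zero slices, which ParallelCollectionRDD rejects. Below is a minimal sketch of the kind of guard that avoids this; the helper name is hypothetical and this is not the actual PartitioningAwareFileIndex code.

{code:scala}
import org.apache.spark.sql.SparkSession

object EmptyRootPathsSketch {
  // Hedged sketch: short-circuit on empty input and clamp the slice count to >= 1,
  // so the "Positive number of slices required" error above cannot be triggered.
  def listLeafFilesInParallel(spark: SparkSession, paths: Seq[String]): Seq[String] = {
    if (paths.isEmpty) {
      Seq.empty  // nothing to list, so skip launching a Spark job entirely
    } else {
      val parallelism = math.max(math.min(paths.size, 10000), 1)  // never request 0 slices
      spark.sparkContext
        .parallelize(paths, parallelism)  // stand-in for the real per-path listing job
        .collect()
        .toSeq
    }
  }
}
{code}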
[jira] [Commented] (SPARK-19784) Refresh datasource table after altering the location
[ https://issues.apache.org/jira/browse/SPARK-19784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683246#comment-16683246 ] Apache Spark commented on SPARK-19784: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/22721 > Refresh datasource table after altering the location > - > > Key: SPARK-19784 > URL: https://issues.apache.org/jira/browse/SPARK-19784 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Song Jun >Priority: Major > > Currently, if we alter the location of a datasource table and then select from > it, it still returns the data from the old location. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
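To illustrate the report, a small spark-shell style example, assuming an active SparkSession named spark; the table name and path are made up. Refreshing the table after the ALTER is the usual way to drop the stale cached relation.

{code:scala}
// Create a datasource table and then change its location.
spark.sql("CREATE TABLE t (id INT) USING parquet")
spark.sql("ALTER TABLE t SET LOCATION '/tmp/t_new_location'")

// As reported above, without a refresh this can still serve data from the old location.
spark.table("t").show()

// Invalidate the cached metadata and file listing, then re-read.
spark.catalog.refreshTable("t")
spark.table("t").show()
{code}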
[jira] [Assigned] (SPARK-26014) Deprecate R < 3.4 support
[ https://issues.apache.org/jira/browse/SPARK-26014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26014: Assignee: Apache Spark > Deprecate R < 3.4 support > - > > Key: SPARK-26014 > URL: https://issues.apache.org/jira/browse/SPARK-26014 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > See > http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-SparkR-CRAN-feasibility-check-server-problem-td25605.html > R version 3.1.x is too old; it was released 4.5 years ago, while > R 3.4.0 was released 1.5 years ago. Considering the timing for Spark 3.0, > deprecating lower versions and bumping the minimum R version to 3.4 seems like a reasonable option. > It would be good to deprecate and drop support for R < 3.4. > In practice, nothing in particular is required in the R code > as far as I can tell. > We will just upgrade Jenkins's R version to 3.4, which means we will no longer > test against R 3.1 (but will test against 3.4 instead). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26014) Deprecate R < 3.4 support
[ https://issues.apache.org/jira/browse/SPARK-26014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683243#comment-16683243 ] Apache Spark commented on SPARK-26014: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/23012 > Deprecate R < 3.4 support > - > > Key: SPARK-26014 > URL: https://issues.apache.org/jira/browse/SPARK-26014 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > See > http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-SparkR-CRAN-feasibility-check-server-problem-td25605.html > R version 3.1.x is too old; it was released 4.5 years ago, while > R 3.4.0 was released 1.5 years ago. Considering the timing for Spark 3.0, > deprecating lower versions and bumping the minimum R version to 3.4 seems like a reasonable option. > It would be good to deprecate and drop support for R < 3.4. > In practice, nothing in particular is required in the R code > as far as I can tell. > We will just upgrade Jenkins's R version to 3.4, which means we will no longer > test against R 3.1 (but will test against 3.4 instead). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26014) Deprecate R < 3.4 support
[ https://issues.apache.org/jira/browse/SPARK-26014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26014: Assignee: (was: Apache Spark) > Deprecate R < 3.4 support > - > > Key: SPARK-26014 > URL: https://issues.apache.org/jira/browse/SPARK-26014 > Project: Spark > Issue Type: Improvement > Components: SparkR >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Priority: Major > > See > http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-SparkR-CRAN-feasibility-check-server-problem-td25605.html > R version 3.1.x is too old; it was released 4.5 years ago, while > R 3.4.0 was released 1.5 years ago. Considering the timing for Spark 3.0, > deprecating lower versions and bumping the minimum R version to 3.4 seems like a reasonable option. > It would be good to deprecate and drop support for R < 3.4. > In practice, nothing in particular is required in the R code > as far as I can tell. > We will just upgrade Jenkins's R version to 3.4, which means we will no longer > test against R 3.1 (but will test against 3.4 instead). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26013) Upgrade R tools version to 3.5.1 in AppVeyor build
[ https://issues.apache.org/jira/browse/SPARK-26013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683223#comment-16683223 ] Apache Spark commented on SPARK-26013: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/23011 > Upgrade R tools version to 3.5.1 in AppVeyor build > -- > > Key: SPARK-26013 > URL: https://issues.apache.org/jira/browse/SPARK-26013 > Project: Spark > Issue Type: Improvement > Components: Build, SparkR >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > R tools 3.5.1 was released a few months ago. Spark currently uses the version pinned at > https://github.com/apache/spark/blob/master/dev/appveyor-install-dependencies.ps1#L119 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26013) Upgrade R tools version to 3.5.1 in AppVeyor build
[ https://issues.apache.org/jira/browse/SPARK-26013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26013: Assignee: Apache Spark > Upgrade R tools version to 3.5.1 in AppVeyor build > -- > > Key: SPARK-26013 > URL: https://issues.apache.org/jira/browse/SPARK-26013 > Project: Spark > Issue Type: Improvement > Components: Build, SparkR >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Minor > > R tools 3.5.1 was released a few months ago. Spark currently uses the version pinned at > https://github.com/apache/spark/blob/master/dev/appveyor-install-dependencies.ps1#L119 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26014) Deprecate R < 3.4 support
Hyukjin Kwon created SPARK-26014: Summary: Deprecate R < 3.4 support Key: SPARK-26014 URL: https://issues.apache.org/jira/browse/SPARK-26014 Project: Spark Issue Type: Improvement Components: SparkR Affects Versions: 3.0.0 Reporter: Hyukjin Kwon See http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-SparkR-CRAN-feasibility-check-server-problem-td25605.html R version 3.1.x is too old; it was released 4.5 years ago, while R 3.4.0 was released 1.5 years ago. Considering the timing for Spark 3.0, deprecating lower versions and bumping the minimum R version to 3.4 seems like a reasonable option. It would be good to deprecate and drop support for R < 3.4. In practice, nothing in particular is required in the R code as far as I can tell. We will just upgrade Jenkins's R version to 3.4, which means we will no longer test against R 3.1 (but will test against 3.4 instead). -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26013) Upgrade R tools version to 3.5.1 in AppVeyor build
[ https://issues.apache.org/jira/browse/SPARK-26013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26013: Assignee: (was: Apache Spark) > Upgrade R tools version to 3.5.1 in AppVeyor build > -- > > Key: SPARK-26013 > URL: https://issues.apache.org/jira/browse/SPARK-26013 > Project: Spark > Issue Type: Improvement > Components: Build, SparkR >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > R tools 3.5.1 was released a few months ago. Spark currently uses the version pinned at > https://github.com/apache/spark/blob/master/dev/appveyor-install-dependencies.ps1#L119 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26013) Upgrade R tools version to 3.5.1 in AppVeyor build
[ https://issues.apache.org/jira/browse/SPARK-26013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683221#comment-16683221 ] Apache Spark commented on SPARK-26013: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/23011 > Upgrade R tools version to 3.5.1 in AppVeyor build > -- > > Key: SPARK-26013 > URL: https://issues.apache.org/jira/browse/SPARK-26013 > Project: Spark > Issue Type: Improvement > Components: Build, SparkR >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Priority: Minor > > R tools 3.5.1 was released a few months ago. Spark currently uses the version pinned at > https://github.com/apache/spark/blob/master/dev/appveyor-install-dependencies.ps1#L119 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26012) Dynamic partition will fail when both '' and null values are taken as dynamic partition values simultaneously.
[ https://issues.apache.org/jira/browse/SPARK-26012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] eaton updated SPARK-26012: -- Description: Dynamic partition will fail when both '' and null values are taken as dynamic partition values simultaneously. For example, the test bellow will fail. test("Null and '' values should not cause dynamic partition failure of string types") { withTable("t1", "t2") { spark.range(3).write.saveAsTable("t1") spark.sql("select id, cast(case when id = 1 then '' else null end as string) as p" + " from t1").write.partitionBy("p").saveAsTable("t2") checkAnswer(spark.table("t2").sort("id"), Seq(Row(0, null), Row(1, null), Row(2, null))) } } The error is: 'org.apache.hadoop.fs.FileAlreadyExistsException: File already exists'. Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: [file:/F:/learning/spark/spark_master/spark_compile/spark-warehouse/t2/_temporary/0/_temporary/attempt_2018204354_0001_m_00_0/p=__HIVE_DEFAULT_PARTITION__/part-0-96217c96-3695-4f18-b0db-4f35a9078a3d.c000.snappy.parquet|file:///F:/learning/spark/spark_master/spark_compile/spark-warehouse/t2/_temporary/0/_temporary/attempt_2018204354_0001_m_00_0/p=__HIVE_DEFAULT_PARTITION__/part-0-96217c96-3695-4f18-b0db-4f35a9078a3d.c000.snappy.parquet] at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:289) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:398) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892) at org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74) at org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:248) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:390) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:37) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151) at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.newOutputWriter(FileFormatDataWriter.scala:236) at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:260) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:239) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:245) ... 10 more 20:43:55.460 WARN org.apache.spark.sql.execution.datasources.FileFormatWriterSuite: was: Dynamic partition will fail when both '' and null values are taken as dynamic partition values simultaneously. 
For example, the test bellow will fail before this PR: test("Null and '' values should not cause dynamic partition failure of string types") { withTable("t1", "t2") { spark.range(3).write.saveAsTable("t1") spark.sql("select id, cast(case when id = 1 then '' else null end as string) as p" + " from t1").write.partitionBy("p").saveAsTable("t2") checkAnswer(spark.table("t2").sort("id"), Seq(Row(0, null), Row(1, null), Row(2, null))) } } The error is: 'org.apache.hadoop.fs.FileAlreadyExistsException: File already exists'. Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: file:/F:/learning/spark/spark_master/spark_compile/spark-warehouse/t2/_temporary/0/_temporary/attempt_2018204354_0001_m_00_0/p=__HIVE_DEFAULT_PARTITION__/part-0-96217c96-3695-4f18-b0db-4f35a9078a3d.c000.snappy.parquet at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:289) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:398) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
[jira] [Created] (SPARK-26013) Upgrade R tools version to 3.5.1 in AppVeyor build
Hyukjin Kwon created SPARK-26013: Summary: Upgrade R tools version to 3.5.1 in AppVeyor build Key: SPARK-26013 URL: https://issues.apache.org/jira/browse/SPARK-26013 Project: Spark Issue Type: Improvement Components: Build, SparkR Affects Versions: 3.0.0 Reporter: Hyukjin Kwon R tools 3.5.1 was released a few months ago. Spark currently uses the version pinned at https://github.com/apache/spark/blob/master/dev/appveyor-install-dependencies.ps1#L119 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26012) Dynamic partition will fail when both '' and null values are taken as dynamic partition values simultaneously.
[ https://issues.apache.org/jira/browse/SPARK-26012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683217#comment-16683217 ] Apache Spark commented on SPARK-26012: -- User 'eatoncys' has created a pull request for this issue: https://github.com/apache/spark/pull/23010 > Dynamic partition will fail when both '' and null values are taken as dynamic > partition values simultaneously. > -- > > Key: SPARK-26012 > URL: https://issues.apache.org/jira/browse/SPARK-26012 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: eaton >Priority: Major > > Dynamic partition will fail when both '' and null values are taken as dynamic > partition values simultaneously. > For example, the test bellow will fail before this PR: > test("Null and '' values should not cause dynamic partition failure of string > types") { > withTable("t1", "t2") { > spark.range(3).write.saveAsTable("t1") > spark.sql("select id, cast(case when id = 1 then '' else null end as string) > as p" + > " from t1").write.partitionBy("p").saveAsTable("t2") > checkAnswer(spark.table("t2").sort("id"), Seq(Row(0, null), Row(1, null), > Row(2, null))) > } > } > The error is: 'org.apache.hadoop.fs.FileAlreadyExistsException: File already > exists'. > > Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already > exists: > file:/F:/learning/spark/spark_master/spark_compile/spark-warehouse/t2/_temporary/0/_temporary/attempt_2018204354_0001_m_00_0/p=__HIVE_DEFAULT_PARTITION__/part-0-96217c96-3695-4f18-b0db-4f35a9078a3d.c000.snappy.parquet > at > org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:289) > at > org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:398) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892) > at > org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74) > at > org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:248) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:390) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:37) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151) > at > org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.newOutputWriter(FileFormatDataWriter.scala:236) > at > org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:260) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:239) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394) > at > 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:245) > ... 10 more > 20:43:55.460 WARN > org.apache.spark.sql.execution.datasources.FileFormatWriterSuite: -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26012) Dynamic partition will fail when both '' and null values are taken as dynamic partition values simultaneously.
[ https://issues.apache.org/jira/browse/SPARK-26012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26012: Assignee: (was: Apache Spark) > Dynamic partition will fail when both '' and null values are taken as dynamic > partition values simultaneously. > -- > > Key: SPARK-26012 > URL: https://issues.apache.org/jira/browse/SPARK-26012 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: eaton >Priority: Major > > Dynamic partition will fail when both '' and null values are taken as dynamic > partition values simultaneously. > For example, the test bellow will fail before this PR: > test("Null and '' values should not cause dynamic partition failure of string > types") { > withTable("t1", "t2") { > spark.range(3).write.saveAsTable("t1") > spark.sql("select id, cast(case when id = 1 then '' else null end as string) > as p" + > " from t1").write.partitionBy("p").saveAsTable("t2") > checkAnswer(spark.table("t2").sort("id"), Seq(Row(0, null), Row(1, null), > Row(2, null))) > } > } > The error is: 'org.apache.hadoop.fs.FileAlreadyExistsException: File already > exists'. > > Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already > exists: > file:/F:/learning/spark/spark_master/spark_compile/spark-warehouse/t2/_temporary/0/_temporary/attempt_2018204354_0001_m_00_0/p=__HIVE_DEFAULT_PARTITION__/part-0-96217c96-3695-4f18-b0db-4f35a9078a3d.c000.snappy.parquet > at > org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:289) > at > org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:398) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892) > at > org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74) > at > org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:248) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:390) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:37) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151) > at > org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.newOutputWriter(FileFormatDataWriter.scala:236) > at > org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:260) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:239) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:245) > ... 
10 more > 20:43:55.460 WARN > org.apache.spark.sql.execution.datasources.FileFormatWriterSuite: -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26012) Dynamic partition will fail when both '' and null values are taken as dynamic partition values simultaneously.
[ https://issues.apache.org/jira/browse/SPARK-26012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26012: Assignee: Apache Spark > Dynamic partition will fail when both '' and null values are taken as dynamic > partition values simultaneously. > -- > > Key: SPARK-26012 > URL: https://issues.apache.org/jira/browse/SPARK-26012 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: eaton >Assignee: Apache Spark >Priority: Major > > Dynamic partition will fail when both '' and null values are taken as dynamic > partition values simultaneously. > For example, the test bellow will fail before this PR: > test("Null and '' values should not cause dynamic partition failure of string > types") { > withTable("t1", "t2") { > spark.range(3).write.saveAsTable("t1") > spark.sql("select id, cast(case when id = 1 then '' else null end as string) > as p" + > " from t1").write.partitionBy("p").saveAsTable("t2") > checkAnswer(spark.table("t2").sort("id"), Seq(Row(0, null), Row(1, null), > Row(2, null))) > } > } > The error is: 'org.apache.hadoop.fs.FileAlreadyExistsException: File already > exists'. > > Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already > exists: > file:/F:/learning/spark/spark_master/spark_compile/spark-warehouse/t2/_temporary/0/_temporary/attempt_2018204354_0001_m_00_0/p=__HIVE_DEFAULT_PARTITION__/part-0-96217c96-3695-4f18-b0db-4f35a9078a3d.c000.snappy.parquet > at > org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:289) > at > org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328) > at > org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:398) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461) > at > org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) > at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892) > at > org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74) > at > org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:248) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:390) > at > org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:37) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151) > at > org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.newOutputWriter(FileFormatDataWriter.scala:236) > at > org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:260) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:239) > at > org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:245) > ... 
10 more > 20:43:55.460 WARN > org.apache.spark.sql.execution.datasources.FileFormatWriterSuite: -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26012) Dynamic partition will fail when both '' and null values are taken as dynamic partition values simultaneously.
eaton created SPARK-26012: - Summary: Dynamic partition will fail when both '' and null values are taken as dynamic partition values simultaneously. Key: SPARK-26012 URL: https://issues.apache.org/jira/browse/SPARK-26012 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: eaton Dynamic partition will fail when both '' and null values are taken as dynamic partition values simultaneously. For example, the test bellow will fail before this PR: test("Null and '' values should not cause dynamic partition failure of string types") { withTable("t1", "t2") { spark.range(3).write.saveAsTable("t1") spark.sql("select id, cast(case when id = 1 then '' else null end as string) as p" + " from t1").write.partitionBy("p").saveAsTable("t2") checkAnswer(spark.table("t2").sort("id"), Seq(Row(0, null), Row(1, null), Row(2, null))) } } The error is: 'org.apache.hadoop.fs.FileAlreadyExistsException: File already exists'. Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: File already exists: file:/F:/learning/spark/spark_master/spark_compile/spark-warehouse/t2/_temporary/0/_temporary/attempt_2018204354_0001_m_00_0/p=__HIVE_DEFAULT_PARTITION__/part-0-96217c96-3695-4f18-b0db-4f35a9078a3d.c000.snappy.parquet at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:289) at org.apache.hadoop.fs.RawLocalFileSystem.create(RawLocalFileSystem.java:328) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.(ChecksumFileSystem.java:398) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:461) at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892) at org.apache.parquet.hadoop.util.HadoopOutputFile.create(HadoopOutputFile.java:74) at org.apache.parquet.hadoop.ParquetFileWriter.(ParquetFileWriter.java:248) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:390) at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:349) at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.(ParquetOutputWriter.scala:37) at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:151) at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.newOutputWriter(FileFormatDataWriter.scala:236) at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:260) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:239) at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1394) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:245) ... 10 more 20:43:55.460 WARN org.apache.spark.sql.execution.datasources.FileFormatWriterSuite: -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
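The collision described above happens because an empty string and null are both written under the Hive default partition directory (the p=__HIVE_DEFAULT_PARTITION__ path in the stack trace), so two writers can target the same output file. Below is a minimal sketch of that mapping; it is illustrative only, not Spark's actual partition-path escaping code.

{code:scala}
// Both null and "" resolve to the same partition directory name, which is why the
// FileAlreadyExistsException above mentions p=__HIVE_DEFAULT_PARTITION__.
val defaultPartitionName = "__HIVE_DEFAULT_PARTITION__"

def partitionDirValue(value: String): String =
  if (value == null || value.isEmpty) defaultPartitionName else value

assert(partitionDirValue(null) == partitionDirValue(""))  // the two values collide
{code}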
[jira] [Assigned] (SPARK-26011) pyspark app with "spark.jars.packages" config does not work
[ https://issues.apache.org/jira/browse/SPARK-26011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26011: Assignee: (was: Apache Spark) > pyspark app with "spark.jars.packages" config does not work > --- > > Key: SPARK-26011 > URL: https://issues.apache.org/jira/browse/SPARK-26011 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.2, 2.4.0 >Reporter: shanyu zhao >Priority: Major > > Command "pyspark --packages" works as expected, but if submitting a livy > pyspark job with "spark.jars.packages" config, the downloaded packages are > not added to python's sys.path therefore the package is not available to use. > For example, this command works: > pyspark --packages Azure:mmlspark:0.14 > However, using Jupyter notebook with sparkmagic kernel to open a pyspark > session failed: > %%configure -f \{"conf": {spark.jars.packages": "Azure:mmlspark:0.14"}} > import mmlspark > The root cause is that SparkSubmit determines pyspark app by the suffix of > primary resource but Livy uses "spark-internal" as the primary resource when > calling spark-submit, therefore args.isPython is set to false in > SparkSubmit.scala. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26011) pyspark app with "spark.jars.packages" config does not work
[ https://issues.apache.org/jira/browse/SPARK-26011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26011: Assignee: Apache Spark > pyspark app with "spark.jars.packages" config does not work > --- > > Key: SPARK-26011 > URL: https://issues.apache.org/jira/browse/SPARK-26011 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.2, 2.4.0 >Reporter: shanyu zhao >Assignee: Apache Spark >Priority: Major > > Command "pyspark --packages" works as expected, but if submitting a livy > pyspark job with "spark.jars.packages" config, the downloaded packages are > not added to python's sys.path therefore the package is not available to use. > For example, this command works: > pyspark --packages Azure:mmlspark:0.14 > However, using Jupyter notebook with sparkmagic kernel to open a pyspark > session failed: > %%configure -f \{"conf": {spark.jars.packages": "Azure:mmlspark:0.14"}} > import mmlspark > The root cause is that SparkSubmit determines pyspark app by the suffix of > primary resource but Livy uses "spark-internal" as the primary resource when > calling spark-submit, therefore args.isPython is set to false in > SparkSubmit.scala. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26011) pyspark app with "spark.jars.packages" config does not work
[ https://issues.apache.org/jira/browse/SPARK-26011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683154#comment-16683154 ] Apache Spark commented on SPARK-26011: -- User 'shanyu' has created a pull request for this issue: https://github.com/apache/spark/pull/23009 > pyspark app with "spark.jars.packages" config does not work > --- > > Key: SPARK-26011 > URL: https://issues.apache.org/jira/browse/SPARK-26011 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.2, 2.4.0 >Reporter: shanyu zhao >Priority: Major > > Command "pyspark --packages" works as expected, but if submitting a livy > pyspark job with "spark.jars.packages" config, the downloaded packages are > not added to python's sys.path therefore the package is not available to use. > For example, this command works: > pyspark --packages Azure:mmlspark:0.14 > However, using Jupyter notebook with sparkmagic kernel to open a pyspark > session failed: > %%configure -f \{"conf": {spark.jars.packages": "Azure:mmlspark:0.14"}} > import mmlspark > The root cause is that SparkSubmit determines pyspark app by the suffix of > primary resource but Livy uses "spark-internal" as the primary resource when > calling spark-submit, therefore args.isPython is set to false in > SparkSubmit.scala. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26011) pyspark app with "spark.jars.packages" config does not work
[ https://issues.apache.org/jira/browse/SPARK-26011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shanyu zhao updated SPARK-26011: Description: Command "pyspark --packages" works as expected, but if submitting a livy pyspark job with "spark.jars.packages" config, the downloaded packages are not added to python's sys.path therefore the package is not available to use. For example, this command works: pyspark --packages Azure:mmlspark:0.14 However, using Jupyter notebook with sparkmagic kernel to open a pyspark session failed: %%configure -f \{"conf": {spark.jars.packages": "Azure:mmlspark:0.14"}} import mmlspark The root cause is that SparkSubmit determines pyspark app by the suffix of primary resource but Livy uses "spark-internal" as the primary resource when calling spark-submit, therefore args.isPython is set to false in SparkSubmit.scala. was: Command "pyspark --packages" works as expected, but if submitting a livy pyspark job with "spark.jars.packages" config, the downloaded packages are not added to python's sys.path therefore the package is not available to use. For example, this command works: pyspark --packages Azure:mmlspark:0.14 However, using Jupyter notebook with sparkmagic kernel to open a pyspark session failed: %%configure -f \{"conf": {spark.jars.packages": "Azure:mmlspark:0.14"}} import mmlspark The root cause is that SparkSubmit determines pyspark app by the suffix of primary resource but Livy uses "spark-internal" as the primary resource when calling spark-submit, therefore args.isPython is fails in SparkSubmit.scala. > pyspark app with "spark.jars.packages" config does not work > --- > > Key: SPARK-26011 > URL: https://issues.apache.org/jira/browse/SPARK-26011 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.2, 2.4.0 >Reporter: shanyu zhao >Priority: Major > > Command "pyspark --packages" works as expected, but if submitting a livy > pyspark job with "spark.jars.packages" config, the downloaded packages are > not added to python's sys.path therefore the package is not available to use. > For example, this command works: > pyspark --packages Azure:mmlspark:0.14 > However, using Jupyter notebook with sparkmagic kernel to open a pyspark > session failed: > %%configure -f \{"conf": {spark.jars.packages": "Azure:mmlspark:0.14"}} > import mmlspark > The root cause is that SparkSubmit determines pyspark app by the suffix of > primary resource but Livy uses "spark-internal" as the primary resource when > calling spark-submit, therefore args.isPython is set to false in > SparkSubmit.scala. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26011) pyspark app with "spark.jars.packages" config does not work
shanyu zhao created SPARK-26011: --- Summary: pyspark app with "spark.jars.packages" config does not work Key: SPARK-26011 URL: https://issues.apache.org/jira/browse/SPARK-26011 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 2.4.0, 2.3.2 Reporter: shanyu zhao Command "pyspark --packages" works as expected, but if submitting a livy pyspark job with "spark.jars.packages" config, the downloaded packages are not added to python's sys.path therefore the package is not available to use. For example, this command works: pyspark --packages Azure:mmlspark:0.14 However, using Jupyter notebook with sparkmagic kernel to open a pyspark session failed: %%configure -f \{"conf": {spark.jars.packages": "Azure:mmlspark:0.14"}} import mmlspark The root cause is that SparkSubmit determines pyspark app by the suffix of primary resource but Livy uses "spark-internal" as the primary resource when calling spark-submit, therefore args.isPython is fails in SparkSubmit.scala. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
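A sketch of the suffix-based detection described in the root cause; the function name and checks are illustrative, not the actual SparkSubmit code. Because Livy passes "spark-internal" as the primary resource, a purely suffix-based test never classifies the app as Python, so the downloaded packages are not wired into the Python path.

{code:scala}
// Illustrative only: detecting a PySpark app from the primary resource name.
def looksLikePythonApp(primaryResource: String): Boolean =
  primaryResource.endsWith(".py") || primaryResource == "pyspark-shell"

assert(looksLikePythonApp("job.py"))
assert(!looksLikePythonApp("spark-internal"))  // the Livy case that is misclassified
{code}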
[jira] [Commented] (SPARK-22674) PySpark breaks serialization of namedtuple subclasses
[ https://issues.apache.org/jira/browse/SPARK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16683033#comment-16683033 ] Apache Spark commented on SPARK-22674: -- User 'superbobry' has created a pull request for this issue: https://github.com/apache/spark/pull/23008 > PySpark breaks serialization of namedtuple subclasses > - > > Key: SPARK-22674 > URL: https://issues.apache.org/jira/browse/SPARK-22674 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.0, 2.3.0 >Reporter: Jonas Amrich >Priority: Major > > Pyspark monkey patches the namedtuple class to make it serializable, however > this breaks serialization of its subclasses. With current implementation, any > subclass will be serialized (and deserialized) as it's parent namedtuple. > Consider this code, which will fail with {{AttributeError: 'Point' object has > no attribute 'sum'}}: > {code} > from collections import namedtuple > Point = namedtuple("Point", "x y") > class PointSubclass(Point): > def sum(self): > return self.x + self.y > rdd = spark.sparkContext.parallelize([[PointSubclass(1, 1)]]) > rdd.collect()[0][0].sum() > {code} > Moreover, as PySpark hijacks all namedtuples in the main module, importing > pyspark breaks serialization of namedtuple subclasses even in code which is > not related to spark / distributed execution. I don't see any clean solution > to this; a possible workaround may be to limit serialization hack only to > direct namedtuple subclasses like in > https://github.com/JonasAmrich/spark/commit/f3efecee28243380ecf6657fe54e1a165c1b7204 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26010) SparkR vignette fails on CRAN on Java 11
[ https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682959#comment-16682959 ] Apache Spark commented on SPARK-26010: -- User 'felixcheung' has created a pull request for this issue: https://github.com/apache/spark/pull/23007 > SparkR vignette fails on CRAN on Java 11 > > > Key: SPARK-26010 > URL: https://issues.apache.org/jira/browse/SPARK-26010 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0, 3.0.0 >Reporter: Felix Cheung >Priority: Major > > follow up to SPARK-25572 > but for vignettes > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26010) SparkR vignette fails on CRAN on Java 11
[ https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26010: Assignee: Apache Spark > SparkR vignette fails on CRAN on Java 11 > > > Key: SPARK-26010 > URL: https://issues.apache.org/jira/browse/SPARK-26010 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0, 3.0.0 >Reporter: Felix Cheung >Assignee: Apache Spark >Priority: Major > > follow up to SPARK-25572 > but for vignettes > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26010) SparkR vignette fails on CRAN on Java 11
[ https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682958#comment-16682958 ] Apache Spark commented on SPARK-26010: -- User 'felixcheung' has created a pull request for this issue: https://github.com/apache/spark/pull/23007 > SparkR vignette fails on CRAN on Java 11 > > > Key: SPARK-26010 > URL: https://issues.apache.org/jira/browse/SPARK-26010 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0, 3.0.0 >Reporter: Felix Cheung >Priority: Major > > follow up to SPARK-25572 > but for vignettes > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26010) SparkR vignette fails on CRAN on Java 11
[ https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26010: Assignee: (was: Apache Spark) > SparkR vignette fails on CRAN on Java 11 > > > Key: SPARK-26010 > URL: https://issues.apache.org/jira/browse/SPARK-26010 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0, 3.0.0 >Reporter: Felix Cheung >Priority: Major > > follow up to SPARK-25572 > but for vignettes > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26010) SparkR vignette fails on Java 11
Felix Cheung created SPARK-26010: Summary: SparkR vignette fails on Java 11 Key: SPARK-26010 URL: https://issues.apache.org/jira/browse/SPARK-26010 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 2.4.0, 3.0.0 Reporter: Felix Cheung follow up to SPARK-25572 but for vignettes -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26010) SparkR vignette fails on CRAN on Java 11
[ https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-26010: - Summary: SparkR vignette fails on CRAN on Java 11 (was: SparkR vignette fails on Java 11) > SparkR vignette fails on CRAN on Java 11 > > > Key: SPARK-26010 > URL: https://issues.apache.org/jira/browse/SPARK-26010 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0, 3.0.0 >Reporter: Felix Cheung >Priority: Major > > follow up to SPARK-25572 > but for vignettes > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24421) Accessing sun.misc.Cleaner in JDK11
[ https://issues.apache.org/jira/browse/SPARK-24421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682951#comment-16682951 ] Alan edited comment on SPARK-24421 at 11/11/18 6:13 PM: Sean: In the proposed release note text you say that access to the internal JDK classes "is no longer possible in Java 9 and later, because of the new module encapsulation system". I don't think this is quite right as the main issue you ran into is that the JDK's internal cleaner mechanism was refactored and moved from sun.misc to jdk.internal.ref. Also the comment about using add-opens may need update too as java.lang remains open to code on the class path in JDK 9/10/11. I suspect the the comments in this issue about add-opens meant to say jdk.internal.ref instead (although you just don't want to go there as directly using anything in that package may break at any time). was (Author: bateman): Sean - In the proposed release note text you say that access to the internal JDK classes "is no longer possible in Java 9 and later, because of the new module encapsulation system". I don't think this is quite right as the main issue you ran into is that the JDK's internal cleaner mechanism was refactored and moved from sun.misc to jdk.internal.ref. Also the comment about using `--add-opens` may need update too as java.lang remains open to code on the class path in JDK 9/10/11. I suspect the the comments in this issue about `--add-opens` meant to say jdk.internal.ref instead (although you just don't want to go there as directly using anything in that package may break at any time). > Accessing sun.misc.Cleaner in JDK11 > --- > > Key: SPARK-24421 > URL: https://issues.apache.org/jira/browse/SPARK-24421 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: DB Tsai >Priority: Major > Labels: release-notes > > Many internal APIs such as unsafe are encapsulated in JDK9+, see > http://openjdk.java.net/jeps/260 for detail. > To use Unsafe, we need to add *jdk.unsupported* to our code’s module > declaration: > {code:java} > module java9unsafe { > requires jdk.unsupported; > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24421) Accessing sun.misc.Cleaner in JDK11
[ https://issues.apache.org/jira/browse/SPARK-24421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682951#comment-16682951 ] Alan commented on SPARK-24421: -- Sean - In the proposed release note text you say that access to the internal JDK classes "is no longer possible in Java 9 and later, because of the new module encapsulation system". I don't think this is quite right as the main issue you ran into is that the JDK's internal cleaner mechanism was refactored and moved from sun.misc to jdk.internal.ref. Also the comment about using `--add-opens` may need update too as java.lang remains open to code on the class path in JDK 9/10/11. I suspect the the comments in this issue about `--add-opens` meant to say jdk.internal.ref instead (although you just don't want to go there as directly using anything in that package may break at any time). > Accessing sun.misc.Cleaner in JDK11 > --- > > Key: SPARK-24421 > URL: https://issues.apache.org/jira/browse/SPARK-24421 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: DB Tsai >Priority: Major > Labels: release-notes > > Many internal APIs such as unsafe are encapsulated in JDK9+, see > http://openjdk.java.net/jeps/260 for detail. > To use Unsafe, we need to add *jdk.unsupported* to our code’s module > declaration: > {code:java} > module java9unsafe { > requires jdk.unsupported; > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26009) Unable to fetch jar from remote repo while running spark-submit on kubernetes
Bala Bharath Reddy Resapu created SPARK-26009: - Summary: Unable to fetch jar from remote repo while running spark-submit on kubernetes Key: SPARK-26009 URL: https://issues.apache.org/jira/browse/SPARK-26009 Project: Spark Issue Type: Question Components: Kubernetes Affects Versions: 2.3.2 Reporter: Bala Bharath Reddy Resapu I am trying to run spark on kubernetes with a docker image. My requirement is to download the jar from the external repo while running spark-submit. I am able to download the jar using wget in the container but it doesn't work when inputting in the spark-submit command. I am not packaging the jar with docker image. It works fine when I input the jar file inside the docker image. ./bin/spark-submit \ --master k8s://https://ip:port \ --deploy-mode cluster \ --name test3 \ --class hello \ --conf spark.kubernetes.container.image.pullSecrets=abcd \ --conf spark.kubernetes.container.image=spark:h2.0 \ [https://devops.com/artifactory/local/testing/testing_2.11/h|https://bala.bharath.reddy.resapu%40ibm.com:akcp5bcbktykg2ti28sju4gtebsqwkg2mqkaf9w6g5rdbo3iwrwx7qb1m5dokgd54hdru2...@na.artifactory.swg-devops.com/artifactory/txo-cedp-garage-artifacts-sbt-local/testing/testing_2.11/arithmetic.jar]ello.jar -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25914) Separate projection from grouping and aggregate in logical Aggregate
[ https://issues.apache.org/jira/browse/SPARK-25914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-25914: Target Version/s: 3.0.0 > Separate projection from grouping and aggregate in logical Aggregate > > > Key: SPARK-25914 > URL: https://issues.apache.org/jira/browse/SPARK-25914 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maryann Xue >Assignee: Dilip Biswal >Priority: Major > > Currently the Spark SQL logical Aggregate has two expression fields: > {{groupingExpressions}} and {{aggregateExpressions}}, in which > {{aggregateExpressions}} is actually the result expressions, or in other > words, the project list in the SELECT clause. > > This would cause an exception while processing the following query: > {code:java} > SELECT concat('x', concat(a, 's')) > FROM testData2 > GROUP BY concat(a, 's'){code} > After optimization, the query becomes: > {code:java} > SELECT concat('x', a, 's') > FROM testData2 > GROUP BY concat(a, 's'){code} > The optimization rule {{CombineConcats}} optimizes the expressions by > flattening "concat" and causes the query to fail since the expression > {{concat('x', a, 's')}} in the SELECT clause is neither referencing a > grouping expression nor a aggregate expression. > > The problem is that we try to mix two operations in one operator, and worse, > in one field: the group-and-aggregate operation and the project operation. > There are two ways to solve this problem: > 1. Break the two operations into two logical operators, which means a > group-by query can usually be mapped into a Project-over-Aggregate pattern. > 2. Break the two operations into multiple fields in the Aggregate operator, > the same way we do for physical aggregate classes (e.g., > {{HashAggregateExec}}, or {{SortAggregateExec}}). Thus, > {{groupingExpressions}} would still be the expressions from the GROUP BY > clause (as before), but {{aggregateExpressions}} would contain aggregate > functions only, and {{resultExpressions}} would be the project list in the > SELECT clause holding references to either {{groupingExpressions}} or > {{aggregateExpressions}}. > > I would say option 1 is even clearer, but it would be more likely to break > the pattern matching in existing optimization rules and thus require more > changes in the compiler. So we'd probably wanna go with option 2. That said, > I suggest we achieve this goal through two iterative steps: > > Phase 1: Keep the current fields of logical Aggregate as > {{groupingExpressions}} and {{aggregateExpressions}}, but change the > semantics of {{aggregateExpressions}} by replacing the grouping expressions > with corresponding references to expressions in {{groupingExpressions}}. The > aggregate expressions in {{aggregateExpressions}} will remain the same. > > Phase 2: Add {{resultExpressions}} for the project list, and keep only > aggregate expressions in {{aggregateExpressions}}. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
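A sketch of what option 2's operator shape could look like, mirroring the field split used by the physical aggregate classes; the class name and exact field types are illustrative assumptions, not Spark's actual logical Aggregate.

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression}
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Illustrative shape only: grouping, aggregate functions, and the SELECT-list
// projection kept in separate fields, as option 2 proposes.
case class AggregateSketch(
    groupingExpressions: Seq[Expression],            // GROUP BY clause, as today
    aggregateExpressions: Seq[AggregateExpression],  // aggregate functions only
    resultExpressions: Seq[NamedExpression],         // SELECT list, referencing the two above
    child: LogicalPlan)
{code}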
[jira] [Updated] (SPARK-24421) Accessing sun.misc.Cleaner in JDK11
[ https://issues.apache.org/jira/browse/SPARK-24421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-24421: -- Docs Text: Provisional release notes text: Spark 3 attempts to avoid the JVM's default limit on total size of memory allocated by direct buffers, for user convenience, by accessing some internal JDK classes directly. This is no longer possible in Java 9 and later, because of the new module encapsulation system. For many usages of Spark, this will not matter, as the default {{MaxDirectMemorySize}} may be more than sufficient for all direct buffer allocation. If it isn't, it can be made to work again by allowing the access explicitly with the JVM argument {{--add-opens java.base/java.lang=ALL-UNNAMED}}. Of course this can also be resolved by explicitly setting {{-XX:MaxDirectMemorySize=}} to a sufficiently large value. > Accessing sun.misc.Cleaner in JDK11 > --- > > Key: SPARK-24421 > URL: https://issues.apache.org/jira/browse/SPARK-24421 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: DB Tsai >Priority: Major > Labels: release-notes > > Many internal APIs such as unsafe are encapsulated in JDK9+, see > http://openjdk.java.net/jeps/260 for details. > To use Unsafe, we need to add *jdk.unsupported* to our code’s module > declaration: > {code:java} > module java9unsafe { > requires jdk.unsupported; > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
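For completeness, the flags above can also be forwarded through Spark configuration; a minimal sketch, using a 2g limit purely as an example value (note that in client mode the driver JVM is already running, so the driver-side flag must instead be passed on the command line, e.g. via --driver-java-options):
{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch: forward the JVM flags from the release note via extraJavaOptions.
val jvmFlags = "--add-opens java.base/java.lang=ALL-UNNAMED -XX:MaxDirectMemorySize=2g"

val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", jvmFlags)
  .set("spark.executor.extraJavaOptions", jvmFlags)

val spark = SparkSession.builder().config(conf).appName("jdk11-flags-example").getOrCreate()
{code}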
[jira] [Commented] (SPARK-24421) Accessing sun.misc.Cleaner in JDK11
[ https://issues.apache.org/jira/browse/SPARK-24421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682915#comment-16682915 ] Assaf Mendelson commented on SPARK-24421: - Would it be possible to add {{Add-Opens java.base/java.lang=ALL-UNNAMED}} to the manifest file to avoid the need to do so when running the jar? > Accessing sun.misc.Cleaner in JDK11 > --- > > Key: SPARK-24421 > URL: https://issues.apache.org/jira/browse/SPARK-24421 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.0.0 >Reporter: DB Tsai >Priority: Major > Labels: release-notes > > Many internal APIs such as unsafe are encapsulated in JDK9+, see > http://openjdk.java.net/jeps/260 for details. > To use Unsafe, we need to add *jdk.unsupported* to our code’s module > declaration: > {code:java} > module java9unsafe { > requires jdk.unsupported; > } > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
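A rough sketch of how that manifest attribute could be added, assuming an sbt build (adjust for Maven or Gradle); note that the JVM only honours {{Add-Opens}} in the manifest of an executable jar launched with {{java -jar}}.
{code:scala}
// build.sbt sketch (assumption: sbt is the build tool).
// Adds the Add-Opens attribute to META-INF/MANIFEST.MF of the packaged jar.
import java.util.jar.Attributes

packageOptions in (Compile, packageBin) += Package.ManifestAttributes(
  new Attributes.Name("Add-Opens") -> "java.base/java.lang=ALL-UNNAMED"
)
{code}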
[jira] [Resolved] (SPARK-19714) Clarify Bucketizer handling of invalid input
[ https://issues.apache.org/jira/browse/SPARK-19714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-19714. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23003 [https://github.com/apache/spark/pull/23003] > Clarify Bucketizer handling of invalid input > > > Key: SPARK-19714 > URL: https://issues.apache.org/jira/browse/SPARK-19714 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.1.0 >Reporter: Bill Chambers >Assignee: Wojciech Szymanski >Priority: Minor > Fix For: 3.0.0 > > > {code} > val contDF = spark.range(500).selectExpr("cast(id as double) as id") > import org.apache.spark.ml.feature.Bucketizer > val splits = Array(5.0, 10.0, 250.0, 500.0) > val bucketer = new Bucketizer() > .setSplits(splits) > .setInputCol("id") > .setHandleInvalid("skip") > bucketer.transform(contDF).show() > {code} > You would expect that this would handle the invalid buckets. However, it fails: > {code} > Caused by: org.apache.spark.SparkException: Feature value 0.0 out of > Bucketizer bounds [5.0, 500.0]. Check your features, or loosen the > lower/upper bound constraints. > {code} > It seems strange that handleInvalid doesn't actually handle invalid inputs. > Thoughts anyone? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
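For reference, the usual way to make the snippet above run without the bounds error is to let the splits cover the whole value range; a small sketch of that variant, assuming a spark-shell style {{spark}} session and the column names from the report:
{code:scala}
import org.apache.spark.ml.feature.Bucketizer

// Open-ended splits so every value of "id" lands in some bucket; handleInvalid
// then only has to deal with NaN/null feature values.
val contDF = spark.range(500).selectExpr("cast(id as double) as id")
val splits = Array(Double.NegativeInfinity, 5.0, 10.0, 250.0, 500.0, Double.PositiveInfinity)
val bucketer = new Bucketizer()
  .setSplits(splits)
  .setInputCol("id")
  .setOutputCol("bucket")
  .setHandleInvalid("skip")
bucketer.transform(contDF).show()
{code}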
[jira] [Created] (SPARK-26008) Structured Streaming Manual clock for simulation
Tom Bar Yacov created SPARK-26008: - Summary: Structured Streaming Manual clock for simulation Key: SPARK-26008 URL: https://issues.apache.org/jira/browse/SPARK-26008 Project: Spark Issue Type: Question Components: Structured Streaming Affects Versions: 2.4.0, 2.3.2, 2.3.1, 2.3.0 Reporter: Tom Bar Yacov Structured Streaming's internal {{StreamTest}} class allows testing incremental logic and verifying outputs across multiple triggers. It supports changing the internal Spark clock to get a fully deterministic simulation of the incremental state and APIs. This is not possible outside tests, since {{DataStreamWriter}} hides the triggerClock parameter and is final. This can be very useful not only in unit tests but also for a real running query: for example, when all the historical Kafka data is persisted to HDFS with its Kafka timestamps and you want to "replay" the data and simulate the streaming application's output as if it were running live on this data, including the incremental output between triggers. Today I can simulate multiple triggers and incremental logic for some of the APIs, but for APIs that depend on the execution clock, such as {{mapGroupsWithState}} with an execution-time-based timeout, I did not find a way to do this. The question is: would it be possible to support a solution similar to StreamTest's, i.e. allow passing an external manual clock as a parameter to DataStreamWriter and give the user external control over this clock? And what failures could occur when running with a manual clock in real cluster mode? Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
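For context, the StreamTest mechanism referred to above looks roughly like the sketch below; it only compiles inside Spark's own SQL test sources where the StreamTest trait is in scope, the fixture names ({{inputData}}, {{mappedWithStateDF}}) are hypothetical, and the helper signatures vary between Spark versions.
{code:scala}
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.util.ManualClock

val clock = new ManualClock()

// Drive the query with a manual clock so each trigger fires deterministically.
testStream(mappedWithStateDF)(
  StartStream(Trigger.ProcessingTime("1 second"), triggerClock = clock),
  AddData(inputData, "a"),
  AdvanceManualClock(1000),   // advance the clock by one trigger interval
  CheckNewAnswer(("a", 1))
)
{code}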
[jira] [Commented] (SPARK-25914) Separate projection from grouping and aggregate in logical Aggregate
[ https://issues.apache.org/jira/browse/SPARK-25914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682890#comment-16682890 ] Hyukjin Kwon commented on SPARK-25914: -- Please avoid to set a target version which is usually reserved by committers. > Separate projection from grouping and aggregate in logical Aggregate > > > Key: SPARK-25914 > URL: https://issues.apache.org/jira/browse/SPARK-25914 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maryann Xue >Assignee: Dilip Biswal >Priority: Major > > Currently the Spark SQL logical Aggregate has two expression fields: > {{groupingExpressions}} and {{aggregateExpressions}}, in which > {{aggregateExpressions}} is actually the result expressions, or in other > words, the project list in the SELECT clause. > > This would cause an exception while processing the following query: > {code:java} > SELECT concat('x', concat(a, 's')) > FROM testData2 > GROUP BY concat(a, 's'){code} > After optimization, the query becomes: > {code:java} > SELECT concat('x', a, 's') > FROM testData2 > GROUP BY concat(a, 's'){code} > The optimization rule {{CombineConcats}} optimizes the expressions by > flattening "concat" and causes the query to fail since the expression > {{concat('x', a, 's')}} in the SELECT clause is neither referencing a > grouping expression nor a aggregate expression. > > The problem is that we try to mix two operations in one operator, and worse, > in one field: the group-and-aggregate operation and the project operation. > There are two ways to solve this problem: > 1. Break the two operations into two logical operators, which means a > group-by query can usually be mapped into a Project-over-Aggregate pattern. > 2. Break the two operations into multiple fields in the Aggregate operator, > the same way we do for physical aggregate classes (e.g., > {{HashAggregateExec}}, or {{SortAggregateExec}}). Thus, > {{groupingExpressions}} would still be the expressions from the GROUP BY > clause (as before), but {{aggregateExpressions}} would contain aggregate > functions only, and {{resultExpressions}} would be the project list in the > SELECT clause holding references to either {{groupingExpressions}} or > {{aggregateExpressions}}. > > I would say option 1 is even clearer, but it would be more likely to break > the pattern matching in existing optimization rules and thus require more > changes in the compiler. So we'd probably wanna go with option 2. That said, > I suggest we achieve this goal through two iterative steps: > > Phase 1: Keep the current fields of logical Aggregate as > {{groupingExpressions}} and {{aggregateExpressions}}, but change the > semantics of {{aggregateExpressions}} by replacing the grouping expressions > with corresponding references to expressions in {{groupingExpressions}}. The > aggregate expressions in {{aggregateExpressions}} will remain the same. > > Phase 2: Add {{resultExpressions}} for the project list, and keep only > aggregate expressions in {{aggregateExpressions}}. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25914) Separate projection from grouping and aggregate in logical Aggregate
[ https://issues.apache.org/jira/browse/SPARK-25914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-25914: - Target Version/s: (was: 3.0.0) > Separate projection from grouping and aggregate in logical Aggregate > > > Key: SPARK-25914 > URL: https://issues.apache.org/jira/browse/SPARK-25914 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maryann Xue >Assignee: Dilip Biswal >Priority: Major > > Currently the Spark SQL logical Aggregate has two expression fields: > {{groupingExpressions}} and {{aggregateExpressions}}, in which > {{aggregateExpressions}} is actually the result expressions, or in other > words, the project list in the SELECT clause. > > This would cause an exception while processing the following query: > {code:java} > SELECT concat('x', concat(a, 's')) > FROM testData2 > GROUP BY concat(a, 's'){code} > After optimization, the query becomes: > {code:java} > SELECT concat('x', a, 's') > FROM testData2 > GROUP BY concat(a, 's'){code} > The optimization rule {{CombineConcats}} optimizes the expressions by > flattening "concat" and causes the query to fail since the expression > {{concat('x', a, 's')}} in the SELECT clause is neither referencing a > grouping expression nor a aggregate expression. > > The problem is that we try to mix two operations in one operator, and worse, > in one field: the group-and-aggregate operation and the project operation. > There are two ways to solve this problem: > 1. Break the two operations into two logical operators, which means a > group-by query can usually be mapped into a Project-over-Aggregate pattern. > 2. Break the two operations into multiple fields in the Aggregate operator, > the same way we do for physical aggregate classes (e.g., > {{HashAggregateExec}}, or {{SortAggregateExec}}). Thus, > {{groupingExpressions}} would still be the expressions from the GROUP BY > clause (as before), but {{aggregateExpressions}} would contain aggregate > functions only, and {{resultExpressions}} would be the project list in the > SELECT clause holding references to either {{groupingExpressions}} or > {{aggregateExpressions}}. > > I would say option 1 is even clearer, but it would be more likely to break > the pattern matching in existing optimization rules and thus require more > changes in the compiler. So we'd probably wanna go with option 2. That said, > I suggest we achieve this goal through two iterative steps: > > Phase 1: Keep the current fields of logical Aggregate as > {{groupingExpressions}} and {{aggregateExpressions}}, but change the > semantics of {{aggregateExpressions}} by replacing the grouping expressions > with corresponding references to expressions in {{groupingExpressions}}. The > aggregate expressions in {{aggregateExpressions}} will remain the same. > > Phase 2: Add {{resultExpressions}} for the project list, and keep only > aggregate expressions in {{aggregateExpressions}}. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25972) Missed JSON options in streaming.py
[ https://issues.apache.org/jira/browse/SPARK-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-25972: Assignee: Maxim Gekk > Missed JSON options in streaming.py > > > Key: SPARK-25972 > URL: https://issues.apache.org/jira/browse/SPARK-25972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Trivial > > streaming.py misses JSON options compared to readwrite.py: > - dropFieldIfAllNull > - encoding -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25972) Missed JSON options in streaming.py
[ https://issues.apache.org/jira/browse/SPARK-25972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-25972. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22973 [https://github.com/apache/spark/pull/22973] > Missed JSON options in streaming.py > > > Key: SPARK-25972 > URL: https://issues.apache.org/jira/browse/SPARK-25972 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Trivial > Fix For: 3.0.0 > > > streaming.py misses JSON options compared to readwrite.py: > - dropFieldIfAllNull > - encoding -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
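For context, the two options are already available on the Scala/Java reader; a short sketch with a hypothetical input path follows (the JIRA itself only concerns exposing them in PySpark's streaming.py), assuming a spark-shell style {{spark}} session.
{code:scala}
// /tmp/json-input is a placeholder path, not a real dataset.
val parsed = spark.read
  .option("dropFieldIfAllNull", "true")  // ignore fields that are null or empty in every record during schema inference
  .option("encoding", "UTF-8")           // charset of the input JSON files
  .json("/tmp/json-input")
{code}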
[jira] [Assigned] (SPARK-26007) DataFrameReader.csv() should respect to spark.sql.columnNameOfCorruptRecord
[ https://issues.apache.org/jira/browse/SPARK-26007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26007: Assignee: Apache Spark > DataFrameReader.csv() should respect to spark.sql.columnNameOfCorruptRecord > --- > > Key: SPARK-26007 > URL: https://issues.apache.org/jira/browse/SPARK-26007 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Minor > > The csv() method of DataFrameReader doesn't take into account the SQL config > spark.sql.columnNameOfCorruptRecord while creating an instance of CSVOptions: > https://github.com/apache/spark/blob/2d085c13b7f715dbff23dd1f81af45ff903d1a79/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L491-L494 > This should be fixed by passing > sparkSession.sessionState.conf.columnNameOfCorruptRecord as a constructor > parameter to CSVOptions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26007) DataFrameReader.csv() should respect to spark.sql.columnNameOfCorruptRecord
[ https://issues.apache.org/jira/browse/SPARK-26007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682855#comment-16682855 ] Apache Spark commented on SPARK-26007: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/23006 > DataFrameReader.csv() should respect to spark.sql.columnNameOfCorruptRecord > --- > > Key: SPARK-26007 > URL: https://issues.apache.org/jira/browse/SPARK-26007 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Minor > > The csv() method of DataFrameReader doesn't take into account the SQL config > spark.sql.columnNameOfCorruptRecord while creating an instance of CSVOptions: > https://github.com/apache/spark/blob/2d085c13b7f715dbff23dd1f81af45ff903d1a79/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L491-L494 > This should be fixed by passing > sparkSession.sessionState.conf.columnNameOfCorruptRecord as a constructor > parameter to CSVOptions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26007) DataFrameReader.csv() should respect to spark.sql.columnNameOfCorruptRecord
[ https://issues.apache.org/jira/browse/SPARK-26007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26007: Assignee: (was: Apache Spark) > DataFrameReader.csv() should respect to spark.sql.columnNameOfCorruptRecord > --- > > Key: SPARK-26007 > URL: https://issues.apache.org/jira/browse/SPARK-26007 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Minor > > The csv() method of DataFrameReader doesn't take into account the SQL config > spark.sql.columnNameOfCorruptRecord while creating an instance of CSVOptions: > https://github.com/apache/spark/blob/2d085c13b7f715dbff23dd1f81af45ff903d1a79/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L491-L494 > This should be fixed by passing > sparkSession.sessionState.conf.columnNameOfCorruptRecord as a constructor > parameter to CSVOptions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26007) DataFrameReader.csv() should respect to spark.sql.columnNameOfCorruptRecord
[ https://issues.apache.org/jira/browse/SPARK-26007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682856#comment-16682856 ] Apache Spark commented on SPARK-26007: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/23006 > DataFrameReader.csv() should respect to spark.sql.columnNameOfCorruptRecord > --- > > Key: SPARK-26007 > URL: https://issues.apache.org/jira/browse/SPARK-26007 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Priority: Minor > > The csv() method of DataFrameReader doesn't take into account the SQL config > spark.sql.columnNameOfCorruptRecord while creating an instance of CSVOptions: > https://github.com/apache/spark/blob/2d085c13b7f715dbff23dd1f81af45ff903d1a79/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L491-L494 > This should be fixed by passing > sparkSession.sessionState.conf.columnNameOfCorruptRecord as a constructor > parameter to CSVOptions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26007) DataFrameReader.csv() should respect to spark.sql.columnNameOfCorruptRecord
Maxim Gekk created SPARK-26007: -- Summary: DataFrameReader.csv() should respect to spark.sql.columnNameOfCorruptRecord Key: SPARK-26007 URL: https://issues.apache.org/jira/browse/SPARK-26007 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: Maxim Gekk The csv() method of DataFrameReader doesn't take into account the SQL config spark.sql.columnNameOfCorruptRecord while creating an instance of CSVOptions: https://github.com/apache/spark/blob/2d085c13b7f715dbff23dd1f81af45ff903d1a79/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L491-L494 This should be fixed by passing sparkSession.sessionState.conf.columnNameOfCorruptRecord as a constructor parameter to CSVOptions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
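For context, a minimal usage sketch of the behaviour being fixed; the column and value names are illustrative, a spark-shell style {{spark}} session is assumed, and until the fix lands the corrupt-record column name can be passed as a per-read option as a workaround.
{code:scala}
import spark.implicits._

// Illustrative corrupt-record column name.
spark.conf.set("spark.sql.columnNameOfCorruptRecord", "_bad_record")

val csvLines = Seq("1", "not-a-number").toDS()

val df = spark.read
  .schema("a INT, _bad_record STRING")
  .option("mode", "PERMISSIVE")
  // Workaround: pass the corrupt-record column name explicitly to the reader.
  .option("columnNameOfCorruptRecord", "_bad_record")
  .csv(csvLines)

df.show()
{code}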
[jira] [Created] (SPARK-26006) mllib Prefixspan
idan Levi created SPARK-26006: - Summary: mllib Prefixspan Key: SPARK-26006 URL: https://issues.apache.org/jira/browse/SPARK-26006 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 2.3.0 Environment: Unit test running on Windows Reporter: idan Levi MLlib's PrefixSpan run method leaves a cached RDD in the cache: val dataInternalRepr = toDatabaseInternalRepr(data, itemToInt) .persist(StorageLevel.MEMORY_AND_DISK) After run() completes, this RDD remains in the cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
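The persisted RDD is internal to PrefixSpan, so callers cannot unpersist it directly; a hedged workaround sketch follows, assuming a spark-shell style {{spark}} session, with the caveat that it unpersists every cached RDD in the SparkContext, including any the application cached itself.
{code:scala}
import org.apache.spark.mllib.fpm.PrefixSpan
import org.apache.spark.rdd.RDD

// Tiny example sequence database: each sequence is an array of itemsets.
val sequences: RDD[Array[Array[Int]]] = spark.sparkContext.parallelize(Seq(
  Array(Array(1, 2), Array(3)),
  Array(Array(1), Array(3, 2), Array(1, 2)),
  Array(Array(1, 2), Array(5)),
  Array(Array(6))
), 2)

val model = new PrefixSpan()
  .setMinSupport(0.5)
  .setMaxPatternLength(5)
  .run(sequences)

// Workaround for the leak described above: drop whatever run() left cached.
// Caveat: this also unpersists RDDs the caller cached intentionally.
spark.sparkContext.getPersistentRDDs.values.foreach(_.unpersist())
{code}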
[jira] [Commented] (SPARK-26005) Upgrade ANTRL to 4.7.1
[ https://issues.apache.org/jira/browse/SPARK-26005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682800#comment-16682800 ] Apache Spark commented on SPARK-26005: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/23005 > Upgrade ANTRL to 4.7.1 > -- > > Key: SPARK-26005 > URL: https://issues.apache.org/jira/browse/SPARK-26005 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26005) Upgrade ANTRL to 4.7.1
[ https://issues.apache.org/jira/browse/SPARK-26005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26005: Assignee: Apache Spark (was: Xiao Li) > Upgrade ANTRL to 4.7.1 > -- > > Key: SPARK-26005 > URL: https://issues.apache.org/jira/browse/SPARK-26005 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-26005) Upgrade ANTRL to 4.7.1
[ https://issues.apache.org/jira/browse/SPARK-26005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26005: Assignee: Xiao Li (was: Apache Spark) > Upgrade ANTRL to 4.7.1 > -- > > Key: SPARK-26005 > URL: https://issues.apache.org/jira/browse/SPARK-26005 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26005) Upgrade ANTRL to 4.7.1
[ https://issues.apache.org/jira/browse/SPARK-26005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682799#comment-16682799 ] Apache Spark commented on SPARK-26005: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/23005 > Upgrade ANTRL to 4.7.1 > -- > > Key: SPARK-26005 > URL: https://issues.apache.org/jira/browse/SPARK-26005 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Xiao Li >Assignee: Xiao Li >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-26005) Upgrade ANTRL to 4.7.1
Xiao Li created SPARK-26005: --- Summary: Upgrade ANTRL to 4.7.1 Key: SPARK-26005 URL: https://issues.apache.org/jira/browse/SPARK-26005 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Xiao Li Assignee: Xiao Li -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org