[jira] [Assigned] (SPARK-21513) SQL to_json should support all column types

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21513:


Assignee: (was: Apache Spark)

> SQL to_json should support all column types
> ---
>
> Key: SPARK-21513
> URL: https://issues.apache.org/jira/browse/SPARK-21513
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Aaron Davidson
>  Labels: Starter
>
> The built-in SQL UDF "to_json" currently supports serializing StructType 
> columns, as well as arrays of StructType columns. If you attempt to use it on 
> a different type, for example a map, you get an error like this:
> {code}
> AnalysisException: cannot resolve 'structstojson(`tags`)' due to data type 
> mismatch: Input type map must be a struct or array of 
> structs.;;
> {code}
> This limitation seems arbitrary; if I were to go through the effort of 
> enclosing my map in a struct, it would be serializable. The same applies to 
> any other non-struct type.
> Therefore the desired improvement is to allow to_json to operate directly on 
> any column type. The associated code is 
> [here|https://github.com/apache/spark/blob/86174ea89b39a300caaba6baffac70f3dc702788/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L653].
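> A quick illustration of the limitation and of the struct workaround (a hedged 
> sketch for spark-shell on 2.2; the column names are illustrative):
> {code}
> import spark.implicits._
> import org.apache.spark.sql.functions._
>
> val df = Seq((1, Map("k" -> "v"))).toDF("id", "tags")
>
> // Fails with the AnalysisException above: the map column is not a struct.
> // df.select(to_json($"tags"))
>
> // Wrapping the map in a struct first makes it serializable.
> df.select(to_json(struct($"tags"))).show(false)
> {code}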






[jira] [Commented] (SPARK-21513) SQL to_json should support all column types

2017-08-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143450#comment-16143450
 ] 

Apache Spark commented on SPARK-21513:
--

User 'goldmedal' has created a pull request for this issue:
https://github.com/apache/spark/pull/18875

> SQL to_json should support all column types
> ---
>
> Key: SPARK-21513
> URL: https://issues.apache.org/jira/browse/SPARK-21513
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Aaron Davidson
>  Labels: Starter
>
> The built-in SQL UDF "to_json" currently supports serializing StructType 
> columns, as well as arrays of StructType columns. If you attempt to use it on 
> a different type, for example a map, you get an error like this:
> {code}
> AnalysisException: cannot resolve 'structstojson(`tags`)' due to data type 
> mismatch: Input type map must be a struct or array of 
> structs.;;
> {code}
> This limitation seems arbitrary; if I were to go through the effort of 
> enclosing my map in a struct, it would be serializable. The same applies to 
> any other non-struct type.
> Therefore the desired improvement is to allow to_json to operate directly on 
> any column type. The associated code is 
> [here|https://github.com/apache/spark/blob/86174ea89b39a300caaba6baffac70f3dc702788/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L653].






[jira] [Assigned] (SPARK-21513) SQL to_json should support all column types

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21513:


Assignee: Apache Spark

> SQL to_json should support all column types
> ---
>
> Key: SPARK-21513
> URL: https://issues.apache.org/jira/browse/SPARK-21513
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Aaron Davidson
>Assignee: Apache Spark
>  Labels: Starter
>
> The built-in SQL UDF "to_json" currently supports serializing StructType 
> columns, as well as arrays of StructType columns. If you attempt to use it on 
> a different type, for example a map, you get an error like this:
> {code}
> AnalysisException: cannot resolve 'structstojson(`tags`)' due to data type 
> mismatch: Input type map must be a struct or array of 
> structs.;;
> {code}
> This limitation seems arbitrary; if I were to go through the effort of 
> enclosing my map in a struct, it would be serializable. The same applies to 
> any other non-struct type.
> Therefore the desired improvement is to allow to_json to operate directly on 
> any column type. The associated code is 
> [here|https://github.com/apache/spark/blob/86174ea89b39a300caaba6baffac70f3dc702788/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L653].






[jira] [Updated] (SPARK-21818) MultivariateOnlineSummarizer.variance generate negative result

2017-08-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-21818:
--
Fix Version/s: 2.2.1

> MultivariateOnlineSummarizer.variance generate negative result
> --
>
> Key: SPARK-21818
> URL: https://issues.apache.org/jira/browse/SPARK-21818
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
> Fix For: 2.2.1, 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Because of numerical error, MultivariateOnlineSummarizer.variance can 
> generate a negative variance.
> This is a serious bug: many algorithms in MLlib compute the stddev as 
> sqrt(variance), so a negative variance becomes NaN and crashes the whole 
> algorithm.
> We can reproduce the bug with the following code (imports added for 
> completeness):
> {code}
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.stat.MultivariateOnlineSummarizer
>
> val summarizer1 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.7)
> val summarizer2 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer3 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.5)
> val summarizer4 = (new MultivariateOnlineSummarizer)
>   .add(Vectors.dense(3.0), 0.4)
> val summarizer = summarizer1
>   .merge(summarizer2)
>   .merge(summarizer3)
>   .merge(summarizer4)
> println(summarizer.variance(0))
> {code}
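> One generic guard against this class of numerical error (a hedged sketch, not 
> necessarily the fix that was merged for this ticket) is to clamp the variance 
> at zero before deriving the standard deviation:
> {code}
> // A tiny negative variance is numerical noise; clamping it keeps sqrt from
> // producing NaN downstream.
> val stddev = math.sqrt(math.max(0.0, summarizer.variance(0)))
> {code}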






[jira] [Assigned] (SPARK-21513) SQL to_json should support all column types

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21513:


Assignee: (was: Apache Spark)

> SQL to_json should support all column types
> ---
>
> Key: SPARK-21513
> URL: https://issues.apache.org/jira/browse/SPARK-21513
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Aaron Davidson
>  Labels: Starter
>
> The built-in SQL UDF "to_json" currently supports serializing StructType 
> columns, as well as arrays of StructType columns. If you attempt to use it on 
> a different type, for example a map, you get an error like this:
> {code}
> AnalysisException: cannot resolve 'structstojson(`tags`)' due to data type 
> mismatch: Input type map must be a struct or array of 
> structs.;;
> {code}
> This limitation seems arbitrary; if I were to go through the effort of 
> enclosing my map in a struct, it would be serializable. The same applies to 
> any other non-struct type.
> Therefore the desired improvement is to allow to_json to operate directly on 
> any column type. The associated code is 
> [here|https://github.com/apache/spark/blob/86174ea89b39a300caaba6baffac70f3dc702788/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L653].






[jira] [Assigned] (SPARK-21513) SQL to_json should support all column types

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21513:


Assignee: Apache Spark

> SQL to_json should support all column types
> ---
>
> Key: SPARK-21513
> URL: https://issues.apache.org/jira/browse/SPARK-21513
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Aaron Davidson
>Assignee: Apache Spark
>  Labels: Starter
>
> The built-in SQL UDF "to_json" currently supports serializing StructType 
> columns, as well as arrays of StructType columns. If you attempt to use it on 
> a different type, for example a map, you get an error like this:
> {code}
> AnalysisException: cannot resolve 'structstojson(`tags`)' due to data type 
> mismatch: Input type map must be a struct or array of 
> structs.;;
> {code}
> This limitation seems arbitrary; if I were to go through the effort of 
> enclosing my map in a struct, it would be serializable. The same applies to 
> any other non-struct type.
> Therefore the desired improvement is to allow to_json to operate directly on 
> any column type. The associated code is 
> [here|https://github.com/apache/spark/blob/86174ea89b39a300caaba6baffac70f3dc702788/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L653].






[jira] [Created] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread DjvuLee (JIRA)
DjvuLee created SPARK-21849:
---

 Summary: Make the serializer function more robust
 Key: SPARK-21849
 URL: https://issues.apache.org/jira/browse/SPARK-21849
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: DjvuLee
Priority: Minor


Make sure the `close` function is called in the `serialize` function.
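
A minimal sketch of the hardening the summary asks for, assuming the goal is a 
try/finally around the serialization stream (the helper below is illustrative, 
not the actual Spark code):

{code}
import java.io.OutputStream
import scala.reflect.ClassTag
import org.apache.spark.serializer.SerializerInstance

// Hedged sketch: close the stream even when writeObject throws.
def serializeTo[T: ClassTag](si: SerializerInstance, out: OutputStream, value: T): Unit = {
  val stream = si.serializeStream(out)
  try {
    stream.writeObject(value)
  } finally {
    stream.close()  // runs on both success and failure
  }
}
{code}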






[jira] [Updated] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread DjvuLee (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DjvuLee updated SPARK-21849:

Issue Type: Improvement  (was: Bug)

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Priority: Minor
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Commented] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143462#comment-16143462
 ] 

Apache Spark commented on SPARK-21849:
--

User 'djvulee' has created a pull request for this issue:
https://github.com/apache/spark/pull/19067

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Priority: Minor
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Assigned] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21849:


Assignee: Apache Spark

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Assignee: Apache Spark
>Priority: Minor
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Assigned] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21849:


Assignee: (was: Apache Spark)

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Priority: Minor
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Assigned] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21849:


Assignee: (was: Apache Spark)

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Priority: Minor
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Updated] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-21849:
--
Priority: Trivial  (was: Minor)

Too trivial for a JIRA

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Priority: Trivial
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Assigned] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21849:


Assignee: Apache Spark

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Assignee: Apache Spark
>Priority: Minor
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Assigned] (SPARK-21568) ConsoleProgressBar should only be enabled in shells

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21568:


Assignee: (was: Apache Spark)

> ConsoleProgressBar should only be enabled in shells
> ---
>
> Key: SPARK-21568
> URL: https://issues.apache.org/jira/browse/SPARK-21568
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Marcelo Vanzin
>Priority: Minor
>
> This is the current logic that enables the progress bar:
> {code}
> _progressBar =
>   if (_conf.getBoolean("spark.ui.showConsoleProgress", true) && !log.isInfoEnabled) {
>     Some(new ConsoleProgressBar(this))
>   } else {
>     None
>   }
> {code}
> That is based on the logging level; it just happens to align with the default 
> configuration for shells (WARN) and normal apps (INFO).
> But if someone changes the default logging config for their app, this may 
> break; they may silence logs by setting the default level to WARN or ERROR, 
> and a normal application will see a lot of log spam from the progress bar 
> (which is especially bad when output is redirected to a file, as is usually 
> done when running in cluster mode).
> While it's possible to disable the progress bar separately, this behavior is 
> not really expected.
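> Until that changes, the explicit workaround (a known configuration flag) is 
> to turn the bar off regardless of logging level:
> {code}
> // Disables the console progress bar explicitly, independent of log level.
> val conf = new org.apache.spark.SparkConf()
>   .set("spark.ui.showConsoleProgress", "false")
> {code}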






[jira] [Assigned] (SPARK-21568) ConsoleProgressBar should only be enabled in shells

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21568:


Assignee: Apache Spark

> ConsoleProgressBar should only be enabled in shells
> ---
>
> Key: SPARK-21568
> URL: https://issues.apache.org/jira/browse/SPARK-21568
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Marcelo Vanzin
>Assignee: Apache Spark
>Priority: Minor
>
> This is the current logic that enables the progress bar:
> {code}
> _progressBar =
>   if (_conf.getBoolean("spark.ui.showConsoleProgress", true) && !log.isInfoEnabled) {
>     Some(new ConsoleProgressBar(this))
>   } else {
>     None
>   }
> {code}
> That is based on the logging level; it just happens to align with the default 
> configuration for shells (WARN) and normal apps (INFO).
> But if someone changes the default logging config for their app, this may 
> break; they may silence logs by setting the default level to WARN or ERROR, 
> and a normal application will see a lot of log spam from the progress bar 
> (which is especially bad when output is redirected to a file, as is usually 
> done when running in cluster mode).
> While it's possible to disable the progress bar separately, this behavior is 
> not really expected.






[jira] [Assigned] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21849:


Assignee: Apache Spark

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Assignee: Apache Spark
>Priority: Trivial
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Assigned] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21849:


Assignee: (was: Apache Spark)

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Priority: Trivial
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Assigned] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21849:


Assignee: (was: Apache Spark)

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Priority: Trivial
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Assigned] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21849:


Assignee: Apache Spark

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Assignee: Apache Spark
>Priority: Trivial
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Assigned] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21849:


Assignee: (was: Apache Spark)

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Priority: Trivial
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Assigned] (SPARK-21848) Create trait to identify user-defined functions

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21848:


Assignee: Apache Spark

> Create trait to identify user-defined functions
> ---
>
> Key: SPARK-21848
> URL: https://issues.apache.org/jira/browse/SPARK-21848
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>
> Create a trait to make it easier to identify which expressions are 
> user-defined functions.
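> A hedged sketch of what such a marker trait could look like (the actual trait 
> added, and the expressions that extend it, may differ):
> {code}
> // Marker trait: any expression mixing this in is identifiable as
> // user-defined without pattern-matching on concrete classes.
> trait UserDefinedExpression
>
> // Identification then reduces to a type test during plan traversal, e.g.:
> // plan.expressions.exists(_.isInstanceOf[UserDefinedExpression])
> {code}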






[jira] [Assigned] (SPARK-21848) Create trait to identify user-defined functions

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21848:


Assignee: (was: Apache Spark)

> Create trait to identify user-defined functions
> ---
>
> Key: SPARK-21848
> URL: https://issues.apache.org/jira/browse/SPARK-21848
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Gengliang Wang
>Priority: Minor
>
> Create a trait to make it easier to identify which expressions are 
> user-defined functions.






[jira] [Commented] (SPARK-21190) SPIP: Vectorized UDFs in Python

2017-08-28 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143491#comment-16143491
 ] 

Wenchen Fan commented on SPARK-21190:
-

hmmm, your proposal has a weird usage: users need to pass an argument to a 
0-parameter UDF, and transferring this constant column batch is also a waste.

Defining the 0-parameter UDF with a size parameter is unintuitive, but I can't 
think of a better idea...

> SPIP: Vectorized UDFs in Python
> ---
>
> Key: SPARK-21190
> URL: https://issues.apache.org/jira/browse/SPARK-21190
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SQL
>Affects Versions: 2.2.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>  Labels: SPIP
> Attachments: SPIPVectorizedUDFsforPython (1).pdf
>
>
> *Background and Motivation*
> Python is one of the most popular programming languages among Spark users. 
> Spark currently exposes a row-at-a-time interface for defining and executing 
> user-defined functions (UDFs). This introduces high overhead in serialization 
> and deserialization, and also makes it difficult to leverage Python libraries 
> (e.g. numpy, Pandas) that are written in native code.
>  
> This proposal advocates introducing new APIs to support vectorized UDFs in 
> Python, in which a block of data is transferred over to Python in some 
> columnar format for execution.
>  
>  
> *Target Personas*
> Data scientists, data engineers, library developers.
>  
> *Goals*
> - Support vectorized UDFs that apply on chunks of the data frame
> - Low system overhead: Substantially reduce serialization and deserialization 
> overhead when compared with row-at-a-time interface
> - UDF performance: Enable users to leverage native libraries in Python (e.g. 
> numpy, Pandas) for data manipulation in these UDFs
>  
> *Non-Goals*
> The following are explicitly out of scope for the current SPIP, and should be 
> done in future SPIPs. Nonetheless, it would be good to consider these future 
> use cases during API design, so we can achieve some consistency when rolling 
> out new APIs.
>  
> - Define block oriented UDFs in other languages (that are not Python).
> - Define aggregate UDFs
> - Tight integration with machine learning frameworks
>  
> *Proposed API Changes*
> The following sketches some possibilities. I haven’t spent a lot of time 
> thinking about the API (wrote it down in 5 mins) and I am not attached to 
> this design at all. The main purpose of the SPIP is to get feedback on use 
> cases and see how they can impact API design.
>  
> A few things to consider are:
>  
> 1. Python is dynamically typed, whereas DataFrames/SQL requires static, 
> analysis time typing. This means users would need to specify the return type 
> of their UDFs.
>  
> 2. Ratio of input rows to output rows. We propose initially we require number 
> of output rows to be the same as the number of input rows. In the future, we 
> can consider relaxing this constraint with support for vectorized aggregate 
> UDFs.
> 3. How do we handle null values, since Pandas doesn't have the concept of 
> nulls?
>  
> Proposed API sketch (using examples):
>  
> Use case 1. A function that defines all the columns of a DataFrame (similar 
> to a “map” function):
>  
> {code}
> @spark_udf(some way to describe the return schema)
> def my_func_on_entire_df(input):
>   """ Some user-defined function.
>  
>   :param input: A Pandas DataFrame with two columns, a and b.
>   :return: :class: A Pandas data frame.
>   """
> input['c'] = input['a'] + input['b']
> input['d'] = input['a'] - input['b']
>   return input
>  
> spark.range(1000).selectExpr("id a", "id / 2 b")
>   .mapBatches(my_func_on_entire_df)
> {code}
>  
> Use case 2. A function that defines only one column (similar to existing 
> UDFs):
>  
> {code}
> @spark_udf(some way to describe the return schema)
> def my_func_that_returns_one_column(input):
>   """ Some user-defined function.
>  
>   :param input: A Pandas DataFrame with two columns, a and b.
>   :return: :class: A numpy array
>   """
> return input['a'] + input['b']
>  
> my_func = udf(my_func_that_returns_one_column)
>  
> df = spark.range(1000).selectExpr("id a", "id / 2 b")
> df.withColumn("c", my_func(df.a, df.b))
> {code}
>  
>  
>  
> *Optional Design Sketch*
> I’m more concerned about getting proper feedback for API design. The 
> implementation should be pretty straightforward and is not a huge concern at 
> this point. We can leverage the same implementation for faster toPandas 
> (using Arrow).
>  
>  
> *Optional Rejected Designs*
> See above.
>  
>  
>  
>  




[jira] [Assigned] (SPARK-21849) Make the serializer function more robust

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21849:


Assignee: Apache Spark

> Make the serializer function more robust
> 
>
> Key: SPARK-21849
> URL: https://issues.apache.org/jira/browse/SPARK-21849
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: DjvuLee
>Assignee: Apache Spark
>Priority: Trivial
>
> Make sure the `close` function is called in the `serialize` function.






[jira] [Updated] (SPARK-21842) Support Kerberos ticket renewal and creation in Mesos

2017-08-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-21842:
--
Fix Version/s: (was: 2.3.0)

> Support Kerberos ticket renewal and creation in Mesos 
> --
>
> Key: SPARK-21842
> URL: https://issues.apache.org/jira/browse/SPARK-21842
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 2.3.0
>Reporter: Arthur Rand
>
> We at Mesosphere have written Kerberos support for Spark on Mesos. The code 
> to use Kerberos on a Mesos cluster has been added to Apache Spark 
> (SPARK-16742). This ticket is to complete the implementation and allow for 
> ticket renewal and creation, specifically for long-running and streaming jobs.
> Mesosphere design doc (needs revision, WIP): 
> https://docs.google.com/document/d/1xyzICg7SIaugCEcB4w1vBWp24UDkyJ1Pyt2jtnREFqc/edit#heading=h.tdnq7wilqrj6






[jira] [Updated] (SPARK-21808) Add R interface of binarizer

2017-08-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-21808:
--
Target Version/s:   (was: 2.2.0)
Priority: Minor  (was: Major)

> Add R interface of binarizer
> 
>
> Key: SPARK-21808
> URL: https://issues.apache.org/jira/browse/SPARK-21808
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Jiaming Shu
>Priority: Minor
>  Labels: features
>
> Add BinarizerWrapper.scala in org.apache.spark.ml.r.
> Add mllib_feature.R and test_mllib_feature.R in R.
> Update DESCRIPTION and NAMESPACE to collate 'mllib_feature.R' and export 
> 'spark.binarizer'.






[jira] [Commented] (SPARK-21691) Accessing canonicalized plan for query with limit throws exception

2017-08-28 Thread Bjoern Toldbod (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143511#comment-16143511
 ] 

Bjoern Toldbod commented on SPARK-21691:


I have worked around the issue by not accessing the canonicalized logical plan 
and instead using the logical plan itself.

Our application works with dynamic (user-provided) queries. 
We need to know which tables are referenced by a given query, and we inspect the 
execution plans in order to determine this.

I don't know of any alternative to inspecting the plans (other than writing my 
own SQL parser).
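
Roughly, the workaround looks like this (simplified sketch; `userQuery` stands 
for the user-provided SQL string):

{code}
import org.apache.spark.sql.catalyst.analysis.UnresolvedRelation

// Walk the plain (non-canonicalized) logical plan, which avoids this bug,
// and collect the names of the tables the query references.
val plan = session.sql(userQuery).queryExecution.logical
val tables = plan.collect { case r: UnresolvedRelation => r.tableIdentifier.unquotedString }
{code}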

> Accessing canonicalized plan for query with limit throws exception
> --
>
> Key: SPARK-21691
> URL: https://issues.apache.org/jira/browse/SPARK-21691
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Bjoern Toldbod
>
> Accessing the logical, canonicalized plan fails for queries with limits.
> The following demonstrates the issue:
> {code:java}
> val session = SparkSession.builder.master("local").getOrCreate()
> // This works
> session.sql("select * from (values 0, 
> 1)").queryExecution.logical.canonicalized
> // This fails
> session.sql("select * from (values 0, 1) limit 
> 1").queryExecution.logical.canonicalized
> {code}
> The message in the thrown exception is somewhat confusing (or at least not 
> directly related to the limit):
> "Invalid call to toAttribute on unresolved object, tree: *"






[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join 
> operations. During a structural-integrity check, I found that 
> {{RewritePredicateSubquery}} can produce unresolved query plans due to 
> conflicting attributes. We should not let {{RewritePredicateSubquery}} 
> produce unresolved plans.






[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join 
> operations. During a structural-integrity check, I found that 
> {{RewritePredicateSubquery}} can produce unresolved query plans due to 
> conflicting attributes. We should not let {{RewritePredicateSubquery}} 
> produce unresolved plans.






[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join 
> operations. During a structural-integrity check, I found that 
> {{RewritePredicateSubquery}} can produce unresolved query plans due to 
> conflicting attributes. We should not let {{RewritePredicateSubquery}} 
> produce unresolved plans.






[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join 
> operations. During a structural-integrity check, I found that 
> {{RewritePredicateSubquery}} can produce unresolved query plans due to 
> conflicting attributes. We should not let {{RewritePredicateSubquery}} 
> produce unresolved plans.






[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join 
> operations. During a structural-integrity check, I found that 
> {{RewritePredicateSubquery}} can produce unresolved query plans due to 
> conflicting attributes. We should not let {{RewritePredicateSubquery}} 
> produce unresolved plans.






[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join 
> operations. During a structural-integrity check, I found that 
> {{RewritePredicateSubquery}} can produce unresolved query plans due to 
> conflicting attributes. We should not let {{RewritePredicateSubquery}} 
> produce unresolved plans.






[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: Apache Spark

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join 
> operations. During a structural-integrity check, I found that 
> {{RewritePredicateSubquery}} can produce unresolved query plans due to 
> conflicting attributes. We should not let {{RewritePredicateSubquery}} 
> produce unresolved plans.






[jira] [Assigned] (SPARK-21835) RewritePredicateSubquery should not produce unresolved query plans

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21835:


Assignee: (was: Apache Spark)

> RewritePredicateSubquery should not produce unresolved query plans
> --
>
> Key: SPARK-21835
> URL: https://issues.apache.org/jira/browse/SPARK-21835
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> {{RewritePredicateSubquery}} rewrites correlated subqueries into join 
> operations. During a structural-integrity check, I found that 
> {{RewritePredicateSubquery}} can produce unresolved query plans due to 
> conflicting attributes. We should not let {{RewritePredicateSubquery}} 
> produce unresolved plans.






[jira] [Commented] (SPARK-21428) CliSessionState never be recognized because of IsolatedClientLoader

2017-08-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143557#comment-16143557
 ] 

Apache Spark commented on SPARK-21428:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/19068

> CliSessionState never be recognized because of IsolatedClientLoader
> ---
>
> Key: SPARK-21428
> URL: https://issues.apache.org/jira/browse/SPARK-21428
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.3, 2.0.2, 2.1.1, 2.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 2.3.0
>
>
> When using bin/spark-sql with the builtin Hive jars, we expect to reuse the 
> instance of CliSessionState.
> {quote}
> // In `SparkSQLCLIDriver`, we have already started a 
> `CliSessionState`,
> // which contains information like configurations from command line. 
> Later
> // we call `SparkSQLEnv.init()` there, which would run into this part 
> again.
> // so we should keep `conf` and reuse the existing instance of 
> `CliSessionState`.
> {quote}
> In fact this never happens, since SessionState.get() at 
> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L138
>  always returns null because of IsolatedClientLoader.
> SessionState.start is called many times, and each call creates a new 
> `hive.exec.scratchdir`; see the following case:
> {code:java}
> spark git:(master) bin/spark-sql --conf spark.sql.hive.metastore.jars=builtin
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 17/07/16 23:29:04 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 17/07/16 23:29:04 INFO HiveMetaStore: 0: Opening raw store with implemenation 
> class:org.apache.hadoop.hive.metastore.ObjectStore
> 17/07/16 23:29:04 INFO ObjectStore: ObjectStore, initialize called
> 17/07/16 23:29:04 INFO Persistence: Property 
> hive.metastore.integral.jdo.pushdown unknown - will be ignored
> 17/07/16 23:29:04 INFO Persistence: Property datanucleus.cache.level2 unknown 
> - will be ignored
> 17/07/16 23:29:05 INFO ObjectStore: Setting MetaStore object pin classes with 
> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
> 17/07/16 23:29:06 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as 
> "embedded-only" so does not have its own datastore table.
> 17/07/16 23:29:06 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" 
> so does not have its own datastore table.
> 17/07/16 23:29:07 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as 
> "embedded-only" so does not have its own datastore table.
> 17/07/16 23:29:07 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" 
> so does not have its own datastore table.
> 17/07/16 23:29:07 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is 
> DERBY
> 17/07/16 23:29:07 INFO ObjectStore: Initialized ObjectStore
> 17/07/16 23:29:07 WARN ObjectStore: Version information not found in 
> metastore. hive.metastore.schema.verification is not enabled so recording the 
> schema version 1.2.0
> 17/07/16 23:29:07 WARN ObjectStore: Failed to get database default, returning 
> NoSuchObjectException
> 17/07/16 23:29:08 INFO HiveMetaStore: Added admin role in metastore
> 17/07/16 23:29:08 INFO HiveMetaStore: Added public role in metastore
> 17/07/16 23:29:08 INFO HiveMetaStore: No user is added in admin role, since 
> config is empty
> 17/07/16 23:29:08 INFO HiveMetaStore: 0: get_all_databases
> 17/07/16 23:29:08 INFO audit: ugi=Kent ip=unknown-ip-addr  
> cmd=get_all_databases
> 17/07/16 23:29:08 INFO HiveMetaStore: 0: get_functions: db=default pat=*
> 17/07/16 23:29:08 INFO audit: ugi=Kent ip=unknown-ip-addr  
> cmd=get_functions: db=default pat=*
> 17/07/16 23:29:08 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as 
> "embedded-only" so does not have its own datastore table.
> 17/07/16 23:29:08 INFO SessionState: Created local directory: 
> /var/folders/k2/04p4k4ws73l6711h_mz2_tq0gn/T/a2c40e42-08e2-4023-8464-3432ed690184_resources
> 17/07/16 23:29:08 INFO SessionState: Created HDFS directory: 
> /tmp/hive/Kent/a2c40e42-08e2-4023-8464-3432ed690184
> 17/07/16 23:29:08 INFO SessionState: Created local directory: 
> /var/folders/k2/04p4k4ws73l6711h_mz2_tq0gn/T/Kent/a2c40e42-08e2-4023-8464-3432ed690184
> 1

[jira] [Commented] (SPARK-21428) CliSessionState never be recognized because of IsolatedClientLoader

2017-08-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143559#comment-16143559
 ] 

Apache Spark commented on SPARK-21428:
--

User 'yaooqinn' has created a pull request for this issue:
https://github.com/apache/spark/pull/19068

> CliSessionState never be recognized because of IsolatedClientLoader
> ---
>
> Key: SPARK-21428
> URL: https://issues.apache.org/jira/browse/SPARK-21428
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.3, 2.0.2, 2.1.1, 2.2.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 2.3.0
>
>
> When using bin/spark-sql with the builtin Hive jars, we expect to reuse the 
> instance of CliSessionState.
> {quote}
> // In `SparkSQLCLIDriver`, we have already started a 
> `CliSessionState`,
> // which contains information like configurations from command line. 
> Later
> // we call `SparkSQLEnv.init()` there, which would run into this part 
> again.
> // so we should keep `conf` and reuse the existing instance of 
> `CliSessionState`.
> {quote}
> In fact this never happens, since SessionState.get() at 
> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L138
>  always returns null because of IsolatedClientLoader.
> SessionState.start is called many times, and each call creates a new 
> `hive.exec.scratchdir`; see the following case:
> {code:java}
> spark git:(master) bin/spark-sql --conf spark.sql.hive.metastore.jars=builtin
> Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
> 17/07/16 23:29:04 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 17/07/16 23:29:04 INFO HiveMetaStore: 0: Opening raw store with implemenation 
> class:org.apache.hadoop.hive.metastore.ObjectStore
> 17/07/16 23:29:04 INFO ObjectStore: ObjectStore, initialize called
> 17/07/16 23:29:04 INFO Persistence: Property 
> hive.metastore.integral.jdo.pushdown unknown - will be ignored
> 17/07/16 23:29:04 INFO Persistence: Property datanucleus.cache.level2 unknown 
> - will be ignored
> 17/07/16 23:29:05 INFO ObjectStore: Setting MetaStore object pin classes with 
> hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
> 17/07/16 23:29:06 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as 
> "embedded-only" so does not have its own datastore table.
> 17/07/16 23:29:06 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" 
> so does not have its own datastore table.
> 17/07/16 23:29:07 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as 
> "embedded-only" so does not have its own datastore table.
> 17/07/16 23:29:07 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" 
> so does not have its own datastore table.
> 17/07/16 23:29:07 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is 
> DERBY
> 17/07/16 23:29:07 INFO ObjectStore: Initialized ObjectStore
> 17/07/16 23:29:07 WARN ObjectStore: Version information not found in 
> metastore. hive.metastore.schema.verification is not enabled so recording the 
> schema version 1.2.0
> 17/07/16 23:29:07 WARN ObjectStore: Failed to get database default, returning 
> NoSuchObjectException
> 17/07/16 23:29:08 INFO HiveMetaStore: Added admin role in metastore
> 17/07/16 23:29:08 INFO HiveMetaStore: Added public role in metastore
> 17/07/16 23:29:08 INFO HiveMetaStore: No user is added in admin role, since 
> config is empty
> 17/07/16 23:29:08 INFO HiveMetaStore: 0: get_all_databases
> 17/07/16 23:29:08 INFO audit: ugi=Kent ip=unknown-ip-addr  
> cmd=get_all_databases
> 17/07/16 23:29:08 INFO HiveMetaStore: 0: get_functions: db=default pat=*
> 17/07/16 23:29:08 INFO audit: ugi=Kent ip=unknown-ip-addr  
> cmd=get_functions: db=default pat=*
> 17/07/16 23:29:08 INFO Datastore: The class 
> "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as 
> "embedded-only" so does not have its own datastore table.
> 17/07/16 23:29:08 INFO SessionState: Created local directory: 
> /var/folders/k2/04p4k4ws73l6711h_mz2_tq0gn/T/a2c40e42-08e2-4023-8464-3432ed690184_resources
> 17/07/16 23:29:08 INFO SessionState: Created HDFS directory: 
> /tmp/hive/Kent/a2c40e42-08e2-4023-8464-3432ed690184
> 17/07/16 23:29:08 INFO SessionState: Created local directory: 
> /var/folders/k2/04p4k4ws73l6711h_mz2_tq0gn/T/Kent/a2c40e42-08e2-4023-8464-3432ed690184
> 1

[jira] [Commented] (SPARK-19388) Reading an empty folder as parquet causes an Analysis Exception

2017-08-28 Thread Shivam Dalmia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143607#comment-16143607
 ] 

Shivam Dalmia commented on SPARK-19388:
---

I am also facing this issue; any reason why it was closed so quickly?

> Reading an empty folder as parquet causes an Analysis Exception
> ---
>
> Key: SPARK-19388
> URL: https://issues.apache.org/jira/browse/SPARK-19388
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Franklyn Dsouza
>Priority: Minor
>
> Reading an empty folder as parquet used to return an empty dataframe up 
> until 2.0.
> Now this causes an AnalysisException, like so:
> {code}
> In [1]: df = sc.sql.read.parquet("empty_dir/")
> ---
> AnalysisException Traceback (most recent call last)
> > 1 df = sqlCtx.read.parquet("empty_dir/")
> spark/99f3dfa6151e312379a7381b7e65637df0429941/python/pyspark/sql/readwriter.pyc
>  in parquet(self, *paths)
> 272 [('name', 'string'), ('year', 'int'), ('month', 'int'), 
> ('day', 'int')]
> 273 """
> --> 274 return 
> self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))
> 275
> 276 @ignore_unicode_prefix
> spark/99f3dfa6151e312379a7381b7e65637df0429941/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py
>  in __call__(self, *args)
>1131 answer = self.gateway_client.send_command(command)
>1132 return_value = get_return_value(
> -> 1133 answer, self.gateway_client, self.target_id, self.name)
>1134
>1135 for temp_arg in temp_args:
> spark/99f3dfa6151e312379a7381b7e65637df0429941/python/pyspark/sql/utils.pyc 
> in deco(*a, **kw)
>  67  
> e.java_exception.getStackTrace()))
>  68 if s.startswith('org.apache.spark.sql.AnalysisException: 
> '):
> ---> 69 raise AnalysisException(s.split(': ', 1)[1], 
> stackTrace)
>  70 if s.startswith('org.apache.spark.sql.catalyst.analysis'):
>  71 raise AnalysisException(s.split(': ', 1)[1], 
> stackTrace)
> AnalysisException: u'Unable to infer schema for Parquet. It must be specified 
> manually.;'
> {code}
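> A common workaround, assuming the expected schema is known up front, is to 
> supply it explicitly so schema inference is skipped and an empty directory 
> yields an empty DataFrame (hedged sketch; the schema is illustrative):
> {code}
> import org.apache.spark.sql.types._
>
> val schema = StructType(Seq(
>   StructField("name", StringType),
>   StructField("year", IntegerType)))
> val df = spark.read.schema(schema).parquet("empty_dir/")
> {code}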






[jira] [Commented] (SPARK-13041) Add a driver history ui link and a mesos sandbox link on the dispatcher's ui page for each driver

2017-08-28 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143611#comment-16143611
 ] 

Stavros Kontopoulos commented on SPARK-13041:
-

[~sowen] I haven't addressed the history URI part; should I re-open this issue 
or change the title and open a new one?

> Add a driver history ui link and a mesos sandbox link on the dispatcher's ui 
> page for each driver
> -
>
> Key: SPARK-13041
> URL: https://issues.apache.org/jira/browse/SPARK-13041
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.3.0
>
>
> It would be convenient to have the driver's history uri from the history 
> server and the driver's mesos sandbox uri on the dispatcher's ui.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21729) Generic test for ProbabilisticClassifier to ensure consistent output columns

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21729:


Assignee: Apache Spark

> Generic test for ProbabilisticClassifier to ensure consistent output columns
> 
>
> Key: SPARK-21729
> URL: https://issues.apache.org/jira/browse/SPARK-21729
> Project: Spark
>  Issue Type: Test
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>
> One challenge with the ProbabilisticClassifier abstraction is that it 
> introduces different code paths for predictions depending on which output 
> columns are turned on or off: probability, rawPrediction, prediction.  We ran 
> into a bug in MLOR with this.
> This task is for adding a generic test usable in all test suites for 
> ProbabilisticClassifier types which does the following:
> * Take a dataset + Estimator
> * Fit the Estimator
> * Test prediction using the model with all combinations of output columns 
> turned on/off.
> * Make sure the output column values match, presumably by comparing vs. the 
> case with all 3 output columns turned on
> CC [~WeichenXu123] since this came up in 
> https://github.com/apache/spark/pull/17373
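
A rough sketch of what such a check could look like, using LogisticRegression 
as a stand-in for an arbitrary ProbabilisticClassifier and relying on the ML 
convention that setting an output column name to the empty string disables 
that column; this is an illustration, not the actual test that was 
contributed:

{code}
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.DataFrame

// Compare predictions from a model with all output columns enabled against
// a copy of the same model with the probability column turned off.
def checkPredictionConsistency(model: LogisticRegressionModel, test: DataFrame): Unit = {
  // Reference run: prediction, rawPrediction, and probability all enabled.
  val expected = model.transform(test).select("prediction").collect().map(_.getDouble(0))

  // Same model, probability column disabled ("" turns an output column off).
  val noProb = model.copy(ParamMap.empty).setProbabilityCol("")
  val actual = noProb.transform(test).select("prediction").collect().map(_.getDouble(0))

  expected.zip(actual).foreach { case (e, a) =>
    assert(e == a, s"prediction changed when probability column was disabled: $e vs $a")
  }
}
{code}

A complete version would loop over every on/off combination of the three 
output columns and compare all surviving column values, as the description 
suggests.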



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21729) Generic test for ProbabilisticClassifier to ensure consistent output columns

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21729:


Assignee: (was: Apache Spark)

> Generic test for ProbabilisticClassifier to ensure consistent output columns
> 
>
> Key: SPARK-21729
> URL: https://issues.apache.org/jira/browse/SPARK-21729
> Project: Spark
>  Issue Type: Test
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Joseph K. Bradley
>
> One challenge with the ProbabilisticClassifier abstraction is that it 
> introduces different code paths for predictions depending on which output 
> columns are turned on or off: probability, rawPrediction, prediction.  We ran 
> into a bug in MLOR with this.
> This task is for adding a generic test usable in all test suites for 
> ProbabilisticClassifier types which does the following:
> * Take a dataset + Estimator
> * Fit the Estimator
> * Test prediction using the model with all combinations of output columns 
> turned on/off.
> * Make sure the output column values match, presumably by comparing vs. the 
> case with all 3 output columns turned on
> CC [~WeichenXu123] since this came up in 
> https://github.com/apache/spark/pull/17373



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13041) Add a driver history ui link and a mesos sandbox link on the dispatcher's ui page for each driver

2017-08-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143620#comment-16143620
 ] 

Sean Owen commented on SPARK-13041:
---

Oh, hm, usually you'd address it all in one go. I didn't realize that wasn't 
the case. At this point I'd edit this to reflect what was changed, and follow 
up in a new issue.

> Add a driver history ui link and a mesos sandbox link on the dispatcher's ui 
> page for each driver
> -
>
> Key: SPARK-13041
> URL: https://issues.apache.org/jira/browse/SPARK-13041
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.3.0
>
>
> It would be convenient to have the driver's history uri from the history 
> server and the driver's mesos sandbox uri on the dispatcher's ui.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2017-08-28 Thread Shivam Dalmia (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143621#comment-16143621
 ] 

Shivam Dalmia commented on SPARK-15393:
---

Is this issue fixed in Spark?
I am still facing it while reading empty parquet files; I am using Spark 2.2.0.
I have also raised a bug against PARQUET for the same problem:
https://issues.apache.org/jira/browse/PARQUET-1080?filter=-2

> Writing empty Dataframes doesn't save any _metadata files
> -
>
> Key: SPARK-15393
> URL: https://issues.apache.org/jira/browse/SPARK-15393
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Jurriaan Pruis
>Priority: Critical
>
> Writing empty dataframes is broken on latest master.
> It omits the metadata and sometimes throws the following exception (when 
> saving as parquet):
> {code}
> 8-May-2016 22:37:14 WARNING: 
> org.apache.parquet.hadoop.ParquetOutputCommitter: could not write summary 
> file for file:/some/test/file
> java.lang.NullPointerException
> at 
> org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:456)
> at 
> org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:420)
> at 
> org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
> at 
> org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
> at 
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:220)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:144)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:115)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:115)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:115)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
> at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:417)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:252)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:234)
> at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:626)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> It only saves an _SUCCESS file (which is also incorrect behaviour, because it 
> raised an exception).
> This means that loading it again will result in the following error:
> {code}
> Unable to infer schema for ParquetFormat at /some/test/file. It must be 
> specified manually;'
> {code}
> It looks like this problem was introduced in 
> https://github.com/apache/spark/pull/12855 (SPARK-10216).
> After reverti
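
A minimal Scala repro sketch of the write path described above; the output 
path is hypothetical and spark is assumed to be an existing SparkSession:

{code}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// An empty DataFrame with a one-column schema.
val schema = StructType(Seq(StructField("a", IntegerType)))
val empty = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

// On the affected builds this write produced only a _SUCCESS file and logged
// the NullPointerException from the Parquet summary-file writer.
empty.write.parquet("/some/test/file")
{code}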

[jira] [Assigned] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21801:


Assignee: (was: Apache Spark)

> SparkR unit test randomly fail on trees
> ---
>
> Key: SPARK-21801
> URL: https://issues.apache.org/jira/browse/SPARK-21801
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Priority: Critical
>
> SparkR unit tests sometimes fail at random with an error such as:
> ```
> 1. Error: spark.randomForest (@test_mllib_tree.R#236) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_87ea3065aeb2 should have at least two distinct values.
> ```
> or
> ```
> 1. Error: spark.decisionTree (@test_mllib_tree.R#353) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_d6a0b492cfa1 should have at least two distinct values.
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143624#comment-16143624
 ] 

Apache Spark commented on SPARK-21801:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/19018

> SparkR unit test randomly fail on trees
> ---
>
> Key: SPARK-21801
> URL: https://issues.apache.org/jira/browse/SPARK-21801
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Priority: Critical
>
> SparkR unit tests sometimes fail at random with an error such as:
> ```
> 1. Error: spark.randomForest (@test_mllib_tree.R#236) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_87ea3065aeb2 should have at least two distinct values.
> ```
> or
> ```
> 1. Error: spark.decisionTree (@test_mllib_tree.R#353) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_d6a0b492cfa1 should have at least two distinct values.
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21801:


Assignee: Apache Spark

> SparkR unit test randomly fail on trees
> ---
>
> Key: SPARK-21801
> URL: https://issues.apache.org/jira/browse/SPARK-21801
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Assignee: Apache Spark
>Priority: Critical
>
> SparkR unit tests sometimes fail at random with an error such as:
> ```
> 1. Error: spark.randomForest (@test_mllib_tree.R#236) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_87ea3065aeb2 should have at least two distinct values.
> ```
> or
> ```
> 1. Error: spark.decisionTree (@test_mllib_tree.R#353) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_d6a0b492cfa1 should have at least two distinct values.
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143626#comment-16143626
 ] 

Apache Spark commented on SPARK-21801:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/19018

> SparkR unit test randomly fail on trees
> ---
>
> Key: SPARK-21801
> URL: https://issues.apache.org/jira/browse/SPARK-21801
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Priority: Critical
>
> SparkR unit tests sometimes fail at random with an error such as:
> ```
> 1. Error: spark.randomForest (@test_mllib_tree.R#236) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_87ea3065aeb2 should have at least two distinct values.
> ```
> or
> ```
> 1. Error: spark.decisionTree (@test_mllib_tree.R#353) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_d6a0b492cfa1 should have at least two distinct values.
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10216) Avoid creating empty files during overwrite into Hive table with group by query

2017-08-28 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-10216:
-
Fix Version/s: 2.3.0

> Avoid creating empty files during overwrite into Hive table with group by 
> query
> ---
>
> Key: SPARK-10216
> URL: https://issues.apache.org/jira/browse/SPARK-10216
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Keuntae Park
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.3.0
>
>
> The exchange for a GROUP BY query produces at least the number of partitions 
> specified in 'spark.sql.shuffle.partitions'.
> Hence, even when the number of distinct group-by keys is small, 
> an INSERT INTO with a GROUP BY query tries to create at least 200 files (the 
> default value of 'spark.sql.shuffle.partitions'), 
> which results in lots of empty files.
> I think this is undesirable because subsequent queries on the resulting 
> table will also read zero-size partitions and launch unnecessary tasks that 
> do nothing.
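
Until the improvement lands, a common mitigation is to shrink the shuffle 
partition count for such inserts. A sketch, assuming an existing SparkSession 
named spark and hypothetical tables src and target:

{code}
// Lowering the shuffle partition count keeps a small GROUP BY result from
// fanning out into hundreds of mostly empty output files.
spark.conf.set("spark.sql.shuffle.partitions", "8")
spark.sql("INSERT OVERWRITE TABLE target SELECT key, count(*) AS cnt FROM src GROUP BY key")
{code}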



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2017-08-28 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143637#comment-16143637
 ] 

Hyukjin Kwon commented on SPARK-15393:
--

This issue was caused by a fix that introduced a regression. That fix was 
reverted, and then another, more reasonable fix was merged into Spark - 
https://github.com/apache/spark/pull/18654 - targeting 2.3.0. So, I am pretty 
sure you are facing a different issue.

I think you'd better open another JIRA with a good description and the steps 
you took.

> Writing empty Dataframes doesn't save any _metadata files
> -
>
> Key: SPARK-15393
> URL: https://issues.apache.org/jira/browse/SPARK-15393
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Jurriaan Pruis
>Priority: Critical
>
> Writing empty dataframes is broken on latest master.
> It omits the metadata and sometimes throws the following exception (when 
> saving as parquet):
> {code}
> 8-May-2016 22:37:14 WARNING: 
> org.apache.parquet.hadoop.ParquetOutputCommitter: could not write summary 
> file for file:/some/test/file
> java.lang.NullPointerException
> at 
> org.apache.parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:456)
> at 
> org.apache.parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:420)
> at 
> org.apache.parquet.hadoop.ParquetOutputCommitter.writeMetaDataFile(ParquetOutputCommitter.java:58)
> at 
> org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48)
> at 
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:220)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelation.scala:144)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:115)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1.apply(InsertIntoHadoopFsRelation.scala:115)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:115)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
> at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
> at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
> at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
> at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:85)
> at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:417)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:252)
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:234)
> at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:626)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:280)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> It only saves an _SUCCESS file (which is also incorrect behaviour, because it 
> raised an exception).
> This means that loading it again will result in the following error:
> {code}
> Unable to infer schema for ParquetFormat at /some/test/file. It must be 
> specified manually;'
> {code}

[jira] [Commented] (SPARK-13041) Add a driver history ui link and a mesos sandbox link on the dispatcher's ui page for each driver

2017-08-28 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143648#comment-16143648
 ] 

Stavros Kontopoulos commented on SPARK-13041:
-

Ok, sure; sorry, I missed that it was a long-pending thing.

> Add a driver history ui link and a mesos sandbox link on the dispatcher's ui 
> page for each driver
> -
>
> Key: SPARK-13041
> URL: https://issues.apache.org/jira/browse/SPARK-13041
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.3.0
>
>
> It would be convenient to have the driver's history uri from the history 
> server and the driver's mesos sandbox uri on the dispatcher's ui.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21746) nondeterministic expressions incorrectly for filter predicates

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21746:


Assignee: Apache Spark

> nondeterministic expressions incorrectly for filter predicates
> --
>
> Key: SPARK-21746
> URL: https://issues.apache.org/jira/browse/SPARK-21746
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: caoxuewen
>Assignee: Apache Spark
>
> Currently, when a filter predicate evaluated through InterpretedPredicate 
> contains a nondeterministic expression, evaluation throws an exception. 
> This PR solves the problem by adding an initialize method to 
> InterpretedPredicate.
> java.lang.IllegalArgumentException:
> java.lang.IllegalArgumentException: requirement failed: Nondeterministic 
> expression org.apache.spark.sql.catalyst.expressions.Rand should be 
> initialized before eval.
>   at scala.Predef$.require(Predef.scala:224)
>   at 
> org.apache.spark.sql.catalyst.expressions.Nondeterministic$class.eval(Expression.scala:291)
>   at 
> org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:34)
>   at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:415)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.eval(predicates.scala:38)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$$anonfun$prunePartitionsByFilter$1.apply(ExternalCatalogUtils.scala:158)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$$anonfun$prunePartitionsByFilter$1.apply(ExternalCatalogUtils.scala:157)
>   at scala.collection.immutable.Stream.filter(Stream.scala:519)
>   at scala.collection.immutable.Stream.filter(Stream.scala:202)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:157)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1129)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1119)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1119)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925)
>   at 
> org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:60)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:27)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:26)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21746) nondeterministic expressions incorrectly for filter predicates

2017-08-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143656#comment-16143656
 ] 

Apache Spark commented on SPARK-21746:
--

User 'heary-cao' has created a pull request for this issue:
https://github.com/apache/spark/pull/18961

> nondeterministic expressions incorrectly for filter predicates
> --
>
> Key: SPARK-21746
> URL: https://issues.apache.org/jira/browse/SPARK-21746
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: caoxuewen
>
> Currently, when a filter predicate evaluated through InterpretedPredicate 
> contains a nondeterministic expression, evaluation throws an exception. 
> This PR solves the problem by adding an initialize method to 
> InterpretedPredicate.
> java.lang.IllegalArgumentException:
> java.lang.IllegalArgumentException: requirement failed: Nondeterministic 
> expression org.apache.spark.sql.catalyst.expressions.Rand should be 
> initialized before eval.
>   at scala.Predef$.require(Predef.scala:224)
>   at 
> org.apache.spark.sql.catalyst.expressions.Nondeterministic$class.eval(Expression.scala:291)
>   at 
> org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:34)
>   at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:415)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.eval(predicates.scala:38)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$$anonfun$prunePartitionsByFilter$1.apply(ExternalCatalogUtils.scala:158)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$$anonfun$prunePartitionsByFilter$1.apply(ExternalCatalogUtils.scala:157)
>   at scala.collection.immutable.Stream.filter(Stream.scala:519)
>   at scala.collection.immutable.Stream.filter(Stream.scala:202)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:157)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1129)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1119)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1119)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925)
>   at 
> org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:60)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:27)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:26)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21746) nondeterministic expressions incorrectly for filter predicates

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21746:


Assignee: (was: Apache Spark)

> nondeterministic expressions incorrectly for filter predicates
> --
>
> Key: SPARK-21746
> URL: https://issues.apache.org/jira/browse/SPARK-21746
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: caoxuewen
>
> Currently, when a filter predicate evaluated through InterpretedPredicate 
> contains a nondeterministic expression, evaluation throws an exception. 
> This PR solves the problem by adding an initialize method to 
> InterpretedPredicate.
> java.lang.IllegalArgumentException:
> java.lang.IllegalArgumentException: requirement failed: Nondeterministic 
> expression org.apache.spark.sql.catalyst.expressions.Rand should be 
> initialized before eval.
>   at scala.Predef$.require(Predef.scala:224)
>   at 
> org.apache.spark.sql.catalyst.expressions.Nondeterministic$class.eval(Expression.scala:291)
>   at 
> org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:34)
>   at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:415)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.eval(predicates.scala:38)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$$anonfun$prunePartitionsByFilter$1.apply(ExternalCatalogUtils.scala:158)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$$anonfun$prunePartitionsByFilter$1.apply(ExternalCatalogUtils.scala:157)
>   at scala.collection.immutable.Stream.filter(Stream.scala:519)
>   at scala.collection.immutable.Stream.filter(Stream.scala:202)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:157)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1129)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1119)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1119)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925)
>   at 
> org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:60)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:27)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:26)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21746) nondeterministic expressions incorrectly for filter predicates

2017-08-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143655#comment-16143655
 ] 

Apache Spark commented on SPARK-21746:
--

User 'heary-cao' has created a pull request for this issue:
https://github.com/apache/spark/pull/18961

> nondeterministic expressions incorrectly for filter predicates
> --
>
> Key: SPARK-21746
> URL: https://issues.apache.org/jira/browse/SPARK-21746
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: caoxuewen
>
> Currently, when a filter predicate evaluated through InterpretedPredicate 
> contains a nondeterministic expression, evaluation throws an exception. 
> This PR solves the problem by adding an initialize method to 
> InterpretedPredicate.
> java.lang.IllegalArgumentException:
> java.lang.IllegalArgumentException: requirement failed: Nondeterministic 
> expression org.apache.spark.sql.catalyst.expressions.Rand should be 
> initialized before eval.
>   at scala.Predef$.require(Predef.scala:224)
>   at 
> org.apache.spark.sql.catalyst.expressions.Nondeterministic$class.eval(Expression.scala:291)
>   at 
> org.apache.spark.sql.catalyst.expressions.RDG.eval(randomExpressions.scala:34)
>   at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:415)
>   at 
> org.apache.spark.sql.catalyst.expressions.InterpretedPredicate.eval(predicates.scala:38)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$$anonfun$prunePartitionsByFilter$1.apply(ExternalCatalogUtils.scala:158)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$$anonfun$prunePartitionsByFilter$1.apply(ExternalCatalogUtils.scala:157)
>   at scala.collection.immutable.Stream.filter(Stream.scala:519)
>   at scala.collection.immutable.Stream.filter(Stream.scala:202)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogUtils$.prunePartitionsByFilter(ExternalCatalogUtils.scala:157)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1129)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1119)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1119)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:925)
>   at 
> org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:60)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:27)
>   at 
> org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$.apply(PruneFileSourcePartitions.scala:26)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13041) Add a driver history ui link and a mesos sandbox link on the dispatcher's ui page for each driver

2017-08-28 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143648#comment-16143648
 ] 

Stavros Kontopoulos edited comment on SPARK-13041 at 8/28/17 11:16 AM:
---

I think the history column is already there, from what I see in the code.


was (Author: skonto):
Ok, sure; sorry, I missed that it was a long-pending thing.

> Add a driver history ui link and a mesos sandbox link on the dispatcher's ui 
> page for each driver
> -
>
> Key: SPARK-13041
> URL: https://issues.apache.org/jira/browse/SPARK-13041
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.3.0
>
>
> It would be convenient to have the driver's history uri from the history 
> server and the driver's mesos sandbox uri on the dispatcher's ui.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13041) Add a driver history ui link and a mesos sandbox link on the dispatcher's ui page for each driver

2017-08-28 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143648#comment-16143648
 ] 

Stavros Kontopoulos edited comment on SPARK-13041 at 8/28/17 11:17 AM:
---

[~sowen] 
False alarm, this is fixed here: 
https://issues.apache.org/jira/browse/SPARK-16809


was (Author: skonto):
I think the history column is already there, from what I see in the code.

> Add a driver history ui link and a mesos sandbox link on the dispatcher's ui 
> page for each driver
> -
>
> Key: SPARK-13041
> URL: https://issues.apache.org/jira/browse/SPARK-13041
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.3.0
>
>
> It would be convenient to have the driver's history uri from the history 
> server and the driver's mesos sandbox uri on the dispatcher's ui.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13041) Add a driver history ui link and a mesos sandbox link on the dispatcher's ui page for each driver

2017-08-28 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143648#comment-16143648
 ] 

Stavros Kontopoulos edited comment on SPARK-13041 at 8/28/17 11:18 AM:
---

[~sowen] False alarm, this is fixed here: 
https://issues.apache.org/jira/browse/SPARK-16809


was (Author: skonto):
[~sowen] 
False alarm, this is fixed here: 
https://issues.apache.org/jira/browse/SPARK-16809

> Add a driver history ui link and a mesos sandbox link on the dispatcher's ui 
> page for each driver
> -
>
> Key: SPARK-13041
> URL: https://issues.apache.org/jira/browse/SPARK-13041
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.3.0
>
>
> It would be convenient to have the driver's history uri from the history 
> server and the driver's mesos sandbox uri on the dispatcher's ui.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-13041) Add a driver history ui link and a mesos sandbox link on the dispatcher's ui page for each driver

2017-08-28 Thread Stavros Kontopoulos (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143648#comment-16143648
 ] 

Stavros Kontopoulos edited comment on SPARK-13041 at 8/28/17 11:25 AM:
---

[~sowen] False alarm, this is fixed here: 
https://issues.apache.org/jira/browse/SPARK-16809 (related task)


was (Author: skonto):
[~sowen] False alarm, this is fixed here: 
https://issues.apache.org/jira/browse/SPARK-16809

> Add a driver history ui link and a mesos sandbox link on the dispatcher's ui 
> page for each driver
> -
>
> Key: SPARK-13041
> URL: https://issues.apache.org/jira/browse/SPARK-13041
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.3.0
>
>
> It would be convenient to have the driver's history uri from the history 
> server and the driver's mesos sandbox uri on the dispatcher's ui.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21838) "Completed Applications" links not always working in cluster with spark.ui.reverseProxy=true

2017-08-28 Thread Ingo Schuster (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143718#comment-16143718
 ] 

Ingo Schuster commented on SPARK-21838:
---

You are right, on the latest master it behaves differently. Instead of the 
Spark cluster home page, you just get an empty (white) page.
This is true for all URLs that the proxy cannot serve: it used to serve the 
cluster home page, and now you get an empty page.

This new behaviour prevents the problem I reported above. It would probably 
be nicer to display a message saying that there is no content to display, 
along with a link back to the cluster home.
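
As a rough illustration of that suggestion, here is a minimal sketch of a 
redirect-based fallback using the plain servlet API; the class name, 
constructor parameter, and wiring are hypothetical and not Spark's actual 
proxy code:

{code}
import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}

// Hypothetical fallback: when the proxied application no longer exists,
// redirect (HTTP 302) to the master UI home page instead of serving its
// content under the stale /proxy/... URL. After the redirect the browser's
// location is the master home page, so relative links resolve correctly.
class ProxyFallbackServlet(masterUiBase: String) extends HttpServlet {
  override protected def doGet(req: HttpServletRequest,
                               resp: HttpServletResponse): Unit = {
    resp.sendRedirect(masterUiBase)
  }
}
{code}

Alternatively, the handler could return a small page saying there is no 
content to display, with a link back to the cluster home, instead of 
redirecting.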

> "Completed Applications" links not always working in cluster with 
> spark.ui.reverseProxy=true
> 
>
> Key: SPARK-21838
> URL: https://issues.apache.org/jira/browse/SPARK-21838
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.1.0, 2.2.0
> Environment: Spark Cluster with reverse proxy enabled:
> spark.ui.reverseProxyUrl=http://127.0.1.1:8080/
> spark.ui.reverseProxy=true
>Reporter: Ingo Schuster
>
> 1. Using a Spark cluster with reverse proxy enabled, the web UI is at: 
> http://127.0.0.1:8080/
> 2. Starting an application and entering the application-specific web UI (by 
> clicking on the application name): 
> http://127.0.0.1:8080/proxy/app-20170825151733-0001/jobs/
> 3. If you click on any link (e.g. on "Executors") +after the application 
> has terminated+, the Spark master UI will be served again, however this is 
> done under the URL of the executors page:  
> http://127.0.0.1:8080/proxy/app-20170825151733-0001/executors/
> 4. When you now click on the link to the just-completed application, nothing 
> happens (you stay on the master web UI home page): 
> http://127.0.0.1:8080/proxy/app-20170825151733-0001/executors/app?appId=app-20170825151733-0001
> The problem is that in step 3 we cannot serve the application's executors 
> page since the application has already terminated. Falling back to the 
> master web UI home page is ok, but it should be done with an HTTP redirect 
> so that the relative URLs on the master home page are built with the correct 
> base URL.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: (was: Apache Spark)

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>
> This issue aims to provide `spark.sql.orc.compression.codec`, analogous to 
> `spark.sql.parquet.compression.codec`.
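
A hypothetical usage sketch of the proposed option, mirroring the existing 
Parquet setting; `spark` and `df` are an assumed SparkSession and DataFrame, 
and the codec names are illustrative (the ORC option did not yet exist as of 
this writing):

{code}
// Proposed: pick the ORC codec via SQL config, analogous to Parquet.
spark.conf.set("spark.sql.orc.compression.codec", "snappy")
df.write.orc("/tmp/orc-out")

// Existing Parquet counterpart, for comparison.
spark.conf.set("spark.sql.parquet.compression.codec", "gzip")
df.write.parquet("/tmp/parquet-out")
{code}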



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21839) Support SQL config for ORC compression

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21839:


Assignee: Apache Spark

> Support SQL config for ORC compression 
> ---
>
> Key: SPARK-21839
> URL: https://issues.apache.org/jira/browse/SPARK-21839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>
> This issue aims to provide `spark.sql.orc.compression.codec`, analogous to 
> `spark.sql.parquet.compression.codec`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21801:


Assignee: (was: Apache Spark)

> SparkR unit test randomly fail on trees
> ---
>
> Key: SPARK-21801
> URL: https://issues.apache.org/jira/browse/SPARK-21801
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Priority: Critical
>
> The SparkR unit tests sometimes fail randomly with an error such as:
> ```
> 1. Error: spark.randomForest (@test_mllib_tree.R#236) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_87ea3065aeb2 should have at least two distinct values.
> ```
> or
> ```
> 1. Error: spark.decisionTree (@test_mllib_tree.R#353) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_d6a0b492cfa1 should have at least two distinct values.
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21801) SparkR unit test randomly fail on trees

2017-08-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21801:


Assignee: Apache Spark

> SparkR unit test randomly fail on trees
> ---
>
> Key: SPARK-21801
> URL: https://issues.apache.org/jira/browse/SPARK-21801
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Tests
>Affects Versions: 2.2.0
>Reporter: Weichen Xu
>Assignee: Apache Spark
>Priority: Critical
>
> The SparkR unit tests sometimes fail randomly with an error such as:
> ```
> 1. Error: spark.randomForest (@test_mllib_tree.R#236) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_87ea3065aeb2 should have at least two distinct values.
> ```
> or
> ```
> 1. Error: spark.decisionTree (@test_mllib_tree.R#353) 
> --
> java.lang.IllegalArgumentException: requirement failed: The input column 
> stridx_d6a0b492cfa1 should have at least two distinct values.
> ```



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19547) KafkaUtil throw 'No current assignment for partition' Exception

2017-08-28 Thread Tao Tian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143760#comment-16143760
 ] 

Tao Tian commented on SPARK-19547:
--

I have encountered this problem too, using Kafka 0.10.0 and 
spark-streaming_2.11. The error log is "Caused by: 
org.apache.kafka.common.errors.RecordTooLargeException: There are some messages 
at [Partition=Offset]: {db4.CLAC.result-0=61638808} whose size is larger than 
the fetch size 1048576 and hence cannot be ever returned. Increase the fetch 
size on the client (using max.partition.fetch.bytes), or decrease the maximum 
message size the broker will allow (using message.max.bytes).
"  
Setting "max.partition.fetch.bytes = 10485720" for Kafka fixed the problem.
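
For reference, a sketch of how that setting can be passed through the 
consumer params used with createDirectStream; the values mirror the 
reporter's snippet (note that bootstrap.servers should list Kafka brokers, 
not ZooKeeper), and the exact byte limit should be sized above your largest 
expected message:

{code}
import org.apache.kafka.common.serialization.StringDeserializer

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "server110:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example",
  // Raise the per-partition fetch limit above the largest expected message;
  // otherwise oversized records can never be returned and the stream stalls.
  "max.partition.fetch.bytes" -> (10485720: java.lang.Integer)
)
{code}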

> KafkaUtil throw 'No current assignment for partition' Exception
> ---
>
> Key: SPARK-19547
> URL: https://issues.apache.org/jira/browse/SPARK-19547
> Project: Spark
>  Issue Type: Question
>  Components: DStreams
>Affects Versions: 1.6.1
>Reporter: wuchang
>
> Below is my Scala code to create the Spark Kafka stream:
> val kafkaParams = Map[String, Object](
>   "bootstrap.servers" -> "server110:2181,server110:9092",
>   "zookeeper" -> "server110:2181",
>   "key.deserializer" -> classOf[StringDeserializer],
>   "value.deserializer" -> classOf[StringDeserializer],
>   "group.id" -> "example",
>   "auto.offset.reset" -> "latest",
>   "enable.auto.commit" -> (false: java.lang.Boolean)
> )
> val topics = Array("ABTest")
> val stream = KafkaUtils.createDirectStream[String, String](
>   ssc,
>   PreferConsistent,
>   Subscribe[String, String](topics, kafkaParams)
> )
> But after running for 10 hours, it throws exceptions:
> 2017-02-10 10:56:20,000 INFO  [JobGenerator] internals.ConsumerCoordinator: 
> Revoking previously assigned partitions [ABTest-0, ABTest-1] for group example
> 2017-02-10 10:56:20,000 INFO  [JobGenerator] internals.AbstractCoordinator: 
> (Re-)joining group example
> 2017-02-10 10:56:20,011 INFO  [JobGenerator] internals.AbstractCoordinator: 
> (Re-)joining group example
> 2017-02-10 10:56:40,057 INFO  [JobGenerator] internals.AbstractCoordinator: 
> Successfully joined group example with generation 5
> 2017-02-10 10:56:40,058 INFO  [JobGenerator] internals.ConsumerCoordinator: 
> Setting newly assigned partitions [ABTest-1] for group example
> 2017-02-10 10:56:40,080 ERROR [JobScheduler] scheduler.JobScheduler: Error 
> generating jobs for time 148669538 ms
> java.lang.IllegalStateException: No current assignment for partition ABTest-0
> at 
> org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:231)
> at 
> org.apache.kafka.clients.consumer.internals.SubscriptionState.needOffsetReset(SubscriptionState.java:295)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.seekToEnd(KafkaConsumer.java:1169)
> at 
> org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:179)
> at 
> org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:196)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341)
> at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340)
> at 
> org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333)
> at scala.Option.orElse(Option.scala:289)
> at 
> org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:48)
> at 
> org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:117)
> at 
> org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:116)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(Resi
