[jira] [Resolved] (SPARK-20537) OffHeapColumnVector reallocation may not copy existing data

2017-05-01 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-20537.
-
   Resolution: Fixed
 Assignee: Kazuaki Ishizaki
Fix Version/s: 2.3.0
   2.2.1

> OffHeapColumnVector reallocation may not copy existing data
> ---
>
> Key: SPARK-20537
> URL: https://issues.apache.org/jira/browse/SPARK-20537
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Kazuaki Ishizaki
>Assignee: Kazuaki Ishizaki
> Fix For: 2.2.1, 2.3.0
>
>
> As SPARK-20474 revealed, reallocation in {{OnHeapColumnVector}} may copy only 
> part of the original storage.
> {{OffHeapColumnVector}} reallocation has the same problem: it copies data to 
> the new storage only up to {{elementsAppended}}. This counter is updated only 
> by the {{ColumnVector.appendX}} APIs, while the more commonly used 
> {{ColumnVector.putX}} APIs do not update it, so values written with putX can 
> be lost on reallocation.
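A minimal Scala sketch of the failure pattern, using a simplified stand-in class rather than the real {{OffHeapColumnVector}} ({{ToyColumnVector}} and its methods are illustrative only):

{code}
// Simplified stand-in for OffHeapColumnVector: reallocation copies only
// `elementsAppended` values, which putInt never increments.
class ToyColumnVector(initialCapacity: Int) {
  private var data = new Array[Int](initialCapacity)
  private var elementsAppended = 0

  def putInt(rowId: Int, value: Int): Unit = data(rowId) = value   // does not touch elementsAppended
  def appendInt(value: Int): Unit = { data(elementsAppended) = value; elementsAppended += 1 }
  def getInt(rowId: Int): Int = data(rowId)

  def reserve(newCapacity: Int): Unit = {
    val newData = new Array[Int](newCapacity)
    // Bug pattern: only the appended prefix is copied, so rows written via putInt are dropped.
    System.arraycopy(data, 0, newData, 0, elementsAppended)
    data = newData
  }
}

val v = new ToyColumnVector(4)
(0 until 4).foreach(i => v.putInt(i, i + 1))  // written via put; elementsAppended stays 0
v.reserve(8)
assert(v.getInt(0) == 0)  // the value 1 written before reallocation is gone
{code}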



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20526) Load doesn't work in PCAModel

2017-05-01 Thread 颜发才

[ 
https://issues.apache.org/jira/browse/SPARK-20526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992367#comment-15992367
 ] 

Yan Facai (颜发才) commented on SPARK-20526:
-

Can you give some sample code?

> Load doesn't work in PCAModel 
> --
>
> Key: SPARK-20526
> URL: https://issues.apache.org/jira/browse/SPARK-20526
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib
>Affects Versions: 2.1.0
> Environment: Windows
>Reporter: Hayri Volkan Agun
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> An error occurs while loading a PCAModel: a saved model fails to load.






[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2017-05-01 Thread arpit bhatnagar (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992363#comment-15992363
 ] 

arpit  bhatnagar commented on SPARK-12216:
--

Is there any workaround for this issue, or has it been solved? I am hitting the 
same issue and am stuck.

17/05/02 11:07:13 ERROR ShutdownHookManager: Exception while deleting Spark 
temp dir: 
C:\Users\arpitbh\AppData\Local\Temp\spark-07d9637a-2eb8-4a32-8490-01e106a80d6b
java.io.IOException: Failed to delete: 
C:\Users\arpitbh\AppData\Local\Temp\spark-07d9637a-2eb8-4a32-8490-01e106a80d6b
at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:65)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:62)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at 
org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:62)
at 
org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1951)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
at scala.util.Try$.apply(Try.scala:192)
at 
org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
at 
org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
at 
org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)



Please help

> Spark failed to delete temp directory 
> --
>
> Key: SPARK-12216
> URL: https://issues.apache.org/jira/browse/SPARK-12216
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
> Environment: windows 7 64 bit
> Spark 1.52
> Java 1.8.0.65
> PATH includes:
> C:\Users\Stefan\spark-1.5.2-bin-hadoop2.6\bin
> C:\ProgramData\Oracle\Java\javapath
> C:\Users\Stefan\scala\bin
> SYSTEM variables set are:
> JAVA_HOME=C:\Program Files\Java\jre1.8.0_65
> HADOOP_HOME=C:\Users\Stefan\hadoop-2.6.0\bin
> (where the bin\winutils resides)
> both \tmp and \tmp\hive have permissions
> drwxrwxrwx as detected by winutils ls
>Reporter: stefan
>Priority: Minor
>
> The mailing list archives have no obvious solution to this:
> scala> :q
> Stopping spark context.
> 15/12/08 16:24:22 ERROR ShutdownHookManager: Exception while deleting Spark 
> temp dir: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> java.io.IOException: Failed to delete: 
> C:\Users\Stefan\AppData\Local\Temp\spark-18f2a418-e02f-458b-8325-60642868fdff
> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:884)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:63)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1$$anonfun$apply$mcV$sp$3.apply(ShutdownHookManager.scala:60)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at 
> org.apache.spark.util.ShutdownHookManager$$anonfun$1.apply$mcV$sp(ShutdownHookManager.scala:60)
> at 
> org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:264)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:234)
> at 
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)
> at 
> org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:234)
> at 
> 

[jira] [Commented] (SPARK-20047) Constrained Logistic Regression

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992330#comment-15992330
 ] 

Apache Spark commented on SPARK-20047:
--

User 'yanboliang' has created a pull request for this issue:
https://github.com/apache/spark/pull/17829

> Constrained Logistic Regression
> ---
>
> Key: SPARK-20047
> URL: https://issues.apache.org/jira/browse/SPARK-20047
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Affects Versions: 2.2.0
>Reporter: DB Tsai
>Assignee: Yanbo Liang
> Fix For: 2.2.1
>
>
> For certain applications, such as stacked regressions, it is important to put 
> non-negative constraints on the regression coefficients. Also, if the ranges 
> of the coefficients are known, it makes sense to constrain the coefficient 
> search space.
> Fitting generalized constrained regression models subject to Cβ ≤ b, where C ∈ 
> R^\{m×p\} and b ∈ R^\{m\} are predefined matrices and vectors that place a set 
> of m linear constraints on the coefficients, is very challenging, as discussed 
> extensively in the literature.
> However, for box constraints on the coefficients, the optimization is well 
> solved. With gradient descent, one can use projected gradient descent in the 
> primal by zeroing the negative weights at each step. For LBFGS, an extended 
> version of it, LBFGS-B, can handle large-scale box-constrained optimization 
> efficiently. Unfortunately, for OWLQN there is no efficient way to optimize 
> with box constraints.
> As a result, in this work we only implement constrained LR with box 
> constraints and without L1 regularization.
> Note that since we standardize the data in the training phase, the 
> coefficients seen in the optimization subroutine are in the scaled space; as 
> a result, we need to convert the box constraints into the scaled space.
> Users will be able to set the lower / upper bounds of each coefficient and 
> intercept.
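A minimal Scala sketch of how such box constraints can be expressed, assuming the bound setters added for this feature on {{LogisticRegression}} ({{setLowerBoundsOnCoefficients}} and friends); the {{training}} DataFrame is a placeholder:

{code}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.{Matrices, Vectors}

// Binomial LR over 3 features: keep all coefficients in [0, 10] and the
// intercept in [-1, 1]; box constraints are used without L1 regularization.
val lr = new LogisticRegression()
  .setLowerBoundsOnCoefficients(Matrices.dense(1, 3, Array(0.0, 0.0, 0.0)))
  .setUpperBoundsOnCoefficients(Matrices.dense(1, 3, Array(10.0, 10.0, 10.0)))
  .setLowerBoundsOnIntercepts(Vectors.dense(-1.0))
  .setUpperBoundsOnIntercepts(Vectors.dense(1.0))
  .setElasticNetParam(0.0)

// val model = lr.fit(training)  // `training` is a DataFrame with label/features columns
{code}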






[jira] [Commented] (SPARK-20490) Add eqNullSafe, not and ! to SparkR

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992328#comment-15992328
 ] 

Apache Spark commented on SPARK-20490:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/17828

> Add eqNullSafe, not and ! to SparkR
> ---
>
> Key: SPARK-20490
> URL: https://issues.apache.org/jira/browse/SPARK-20490
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
> Fix For: 2.3.0
>
>
> Add {{o.a.s.sql.functions.not}}, {{o.a.s.sql.Column.!}} and 
> {{o.a.s.sql.Column.eqNullSafe}}.
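For reference, a short Scala sketch of the behavior of the functions being wrapped (run in spark-shell, which provides the {{spark}} session; the sample data is made up):

{code}
import org.apache.spark.sql.functions.not
import spark.implicits._

val df = Seq((Some(1), Some(1)), (Some(1), None), (None, None)).toDF("a", "b")

// eqNullSafe / <=> treats two nulls as equal, unlike === which yields null;
// Column.! and functions.not both negate a boolean Column.
df.select($"a" <=> $"b", $"a" === $"b", !($"a" > 0), not($"a" > 0)).show()
{code}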






[jira] [Resolved] (SPARK-20532) SparkR should provide grouping and grouping_id

2017-05-01 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-20532.
--
  Resolution: Fixed
Assignee: Maciej Szymkiewicz
   Fix Version/s: 2.3.0
Target Version/s: 2.3.0

> SparkR should provide grouping and grouping_id
> --
>
> Key: SPARK-20532
> URL: https://issues.apache.org/jira/browse/SPARK-20532
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Minor
> Fix For: 2.3.0
>
>
> SparkR should provide wrappers for {{o.a.s.sql.functions.grouping}} and 
> {{o.a.s.sql.functions.grouping_id}}
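A short Scala sketch of the functions to be wrapped (run in spark-shell, which provides the {{spark}} session; the sample data is made up):

{code}
import org.apache.spark.sql.functions.{grouping, grouping_id, sum}
import spark.implicits._

val df = Seq(("a", "x", 1), ("a", "y", 2), ("b", "x", 3)).toDF("k1", "k2", "v")

// grouping(col) is 1 in cube/rollup rows where `col` has been aggregated away;
// grouping_id() packs the grouping indicators of all grouping columns into one number.
df.cube($"k1", $"k2")
  .agg(grouping($"k1"), grouping($"k2"), grouping_id(), sum($"v"))
  .show()
{code}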






[jira] [Resolved] (SPARK-20192) SparkR 2.2.0 migration guide, release note

2017-05-01 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-20192.
--
  Resolution: Fixed
   Fix Version/s: 2.3.0
  2.2.0
Target Version/s: 2.2.0, 2.3.0  (was: 2.2.0)

> SparkR 2.2.0 migration guide, release note
> --
>
> Key: SPARK-20192
> URL: https://issues.apache.org/jira/browse/SPARK-20192
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
> Fix For: 2.2.0, 2.3.0
>
>
> From looking at the changes since 2.1.0, the following should be documented in 
> the migration guide / release notes for the 2.2.0 release, as they are behavior 
> changes:
> https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870
> https://github.com/apache/spark/pull/17483 (createExternalTable)






[jira] [Commented] (SPARK-20192) SparkR 2.2.0 migration guide, release note

2017-05-01 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992306#comment-15992306
 ] 

Felix Cheung commented on SPARK-20192:
--

See SPARK-20513 for the release note instead.

> SparkR 2.2.0 migration guide, release note
> --
>
> Key: SPARK-20192
> URL: https://issues.apache.org/jira/browse/SPARK-20192
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
> Fix For: 2.2.0, 2.3.0
>
>
> From looking at the changes since 2.1.0, the following should be documented in 
> the migration guide / release notes for the 2.2.0 release, as they are behavior 
> changes:
> https://github.com/apache/spark/commit/422aa67d1bb84f913b06e6d94615adb6557e2870
> https://github.com/apache/spark/pull/17483 (createExternalTable)






[jira] [Assigned] (SPARK-20552) Add isNotDistinctFrom/isDistinctFrom for column APIs in Scala/Java and Python

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20552:


Assignee: (was: Apache Spark)

>  Add isNotDistinctFrom/isDistinctFrom for column APIs in Scala/Java and Python
> --
>
> Key: SPARK-20552
> URL: https://issues.apache.org/jira/browse/SPARK-20552
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> After SPARK-20463, we are able to use {{IS [NOT] DISTINCT FROM}} in Spark SQL.
> It looks like we should add {{isNotDistinctFrom}} (as an alias for {{eqNullSafe}}) 
> and {{isDistinctFrom}} (as a negated {{eqNullSafe}}) to the Column APIs in both 
> Scala/Java and Python.






[jira] [Assigned] (SPARK-20552) Add isNotDistinctFrom/isDistinctFrom for column APIs in Scala/Java and Python

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20552:


Assignee: Apache Spark

>  Add isNotDistinctFrom/isDistinctFrom for column APIs in Scala/Java and Python
> --
>
> Key: SPARK-20552
> URL: https://issues.apache.org/jira/browse/SPARK-20552
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Minor
>
> After SPARK-20463, we are able to use {{IS [NOT] DISTINCT FROM}} in Spark SQL.
> It looks like we should add {{isNotDistinctFrom}} (as an alias for {{eqNullSafe}}) 
> and {{isDistinctFrom}} (as a negated {{eqNullSafe}}) to the Column APIs in both 
> Scala/Java and Python.






[jira] [Commented] (SPARK-20552) Add isNotDistinctFrom/isDistinctFrom for column APIs in Scala/Java and Python

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992295#comment-15992295
 ] 

Apache Spark commented on SPARK-20552:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/17827

>  Add isNotDistinctFrom/isDistinctFrom for column APIs in Scala/Java and Python
> --
>
> Key: SPARK-20552
> URL: https://issues.apache.org/jira/browse/SPARK-20552
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.2.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> After SPARK-20463, we are able to use {{IS [NOT] DISTINCT FROM}} in Spark SQL.
> It looks like we should add {{isNotDistinctFrom}} (as an alias for {{eqNullSafe}}) 
> and {{isDistinctFrom}} (as a negated {{eqNullSafe}}) to the Column APIs in both 
> Scala/Java and Python.






[jira] [Commented] (SPARK-4899) Support Mesos features: roles and checkpoints

2017-05-01 Thread Kamal Gurala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992255#comment-15992255
 ] 

Kamal Gurala commented on SPARK-4899:
-

Can one of the admins verify this patch?


> Support Mesos features: roles and checkpoints
> -
>
> Key: SPARK-4899
> URL: https://issues.apache.org/jira/browse/SPARK-4899
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 1.2.0
>Reporter: Andrew Ash
>
> Inspired by https://github.com/apache/spark/pull/60
> Mesos has two features that would be nice for Spark to take advantage of:
> 1. Roles -- a way to specify ACLs and priorities for users
> 2. Checkpoints -- a way to restart a failed Mesos slave without losing all 
> the work that was happening on the box
> Some of these may require a Mesos upgrade past our current 0.18.1






[jira] [Issue Comment Deleted] (SPARK-4899) Support Mesos features: roles and checkpoints

2017-05-01 Thread Kamal Gurala (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Gurala updated SPARK-4899:

Comment: was deleted

(was: Can one of the admins verify this patch?
)

> Support Mesos features: roles and checkpoints
> -
>
> Key: SPARK-4899
> URL: https://issues.apache.org/jira/browse/SPARK-4899
> Project: Spark
>  Issue Type: New Feature
>  Components: Mesos
>Affects Versions: 1.2.0
>Reporter: Andrew Ash
>
> Inspired by https://github.com/apache/spark/pull/60
> Mesos has two features that would be nice for Spark to take advantage of:
> 1. Roles -- a way to specify ACLs and priorities for users
> 2. Checkpoints -- a way to restart a failed Mesos slave without losing all 
> the work that was happening on the box
> Some of these may require a Mesos upgrade past our current 0.18.1






[jira] [Issue Comment Deleted] (SPARK-20419) Support for Mesos Maintenance primitives

2017-05-01 Thread Kamal Gurala (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Gurala updated SPARK-20419:
-
Comment: was deleted

(was: Can one of the admins verify this patch?)

> Support for Mesos Maintenance primitives
> 
>
> Key: SPARK-20419
> URL: https://issues.apache.org/jira/browse/SPARK-20419
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Scheduler
>Affects Versions: 2.1.0
>Reporter: Kamal Gurala
>Priority: Minor
>
> With Mesos 0.25.0, maintenance primitives have been added.
> https://mesos.apache.org/documentation/latest/maintenance/
> Based on the documentation, it appears frameworks can be maintenance-aware.
> Spark should be able to respect the Mesos concepts of maintenance modes, 
> inverse offers, and unavailability, and not schedule tasks on resources that 
> will go under maintenance.






[jira] [Commented] (SPARK-20419) Support for Mesos Maintenance primitives

2017-05-01 Thread Kamal Gurala (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15992229#comment-15992229
 ] 

Kamal Gurala commented on SPARK-20419:
--

Can one of the admins verify this patch?

> Support for Mesos Maintenance primitives
> 
>
> Key: SPARK-20419
> URL: https://issues.apache.org/jira/browse/SPARK-20419
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos, Scheduler
>Affects Versions: 2.1.0
>Reporter: Kamal Gurala
>Priority: Minor
>
> With Mesos 0.25.0, maintenance primitives have been added.
> https://mesos.apache.org/documentation/latest/maintenance/
> Based on the documentation, it appears frameworks can be maintenance-aware.
> Spark should be able to respect the Mesos concepts of maintenance modes, 
> inverse offers, and unavailability, and not schedule tasks on resources that 
> will go under maintenance.






[jira] [Updated] (SPARK-17827) StatisticsColumnSuite failures on big endian platforms

2017-05-01 Thread Zhenhua Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhenhua Wang updated SPARK-17827:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-16026

> StatisticsColumnSuite failures on big endian platforms
> --
>
> Key: SPARK-17827
> URL: https://issues.apache.org/jira/browse/SPARK-17827
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.1.0
> Environment: big endian
>Reporter: Pete Robbins
>Assignee: Pete Robbins
>  Labels: big-endian
> Fix For: 2.1.0
>
>
> https://issues.apache.org/jira/browse/SPARK-17073
> introduces new tests/functions that fail on big-endian platforms.
> Failing tests:
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> string column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> binary column
>  org.apache.spark.sql.StatisticsColumnSuite.column-level statistics for 
> columns with different types
>  org.apache.spark.sql.hive.StatisticsSuite.generate column-level statistics 
> and load them from hive metastore
> All of them fail in checkColStat, e.g.:
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.StatisticsTest$.checkColStat(StatisticsTest.scala:92)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:43)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1$$anonfun$apply$mcV$sp$1.apply(StatisticsTest.scala:40)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> org.apache.spark.sql.StatisticsTest$$anonfun$checkColStats$1.apply$mcV$sp(StatisticsTest.scala:40)
>   at 
> org.apache.spark.sql.test.SQLTestUtils$class.withTable(SQLTestUtils.scala:168)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.withTable(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsTest$class.checkColStats(StatisticsTest.scala:33)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite.checkColStats(StatisticsColumnSuite.scala:30)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply$mcV$sp(StatisticsColumnSuite.scala:171)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)
>   at 
> org.apache.spark.sql.StatisticsColumnSuite$$anonfun$7.apply(StatisticsColumnSuite.scala:160)






[jira] [Commented] (SPARK-20548) Flaky Test: ReplSuite.newProductSeqEncoder with REPL defined class

2017-05-01 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991976#comment-15991976
 ] 

Herman van Hovell commented on SPARK-20548:
---

I jumped the gun on this one; the test has been disabled. This has not been 
fixed yet.

> Flaky Test:  ReplSuite.newProductSeqEncoder with REPL defined class
> ---
>
> Key: SPARK-20548
> URL: https://issues.apache.org/jira/browse/SPARK-20548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
> Fix For: 2.2.0
>
>
> {{newProductSeqEncoder with REPL defined class}} in {{ReplSuite}} has been 
> failing non-deterministically over the last few days: 
> https://spark-tests.appspot.com/failed-tests
> https://spark.test.databricks.com/job/spark-master-test-sbt-hadoop-2.7/176/testReport/junit/org.apache.spark.repl/ReplSuite/newProductSeqEncoder_with_REPL_defined_class/history/






[jira] [Reopened] (SPARK-20548) Flaky Test: ReplSuite.newProductSeqEncoder with REPL defined class

2017-05-01 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell reopened SPARK-20548:
---

> Flaky Test:  ReplSuite.newProductSeqEncoder with REPL defined class
> ---
>
> Key: SPARK-20548
> URL: https://issues.apache.org/jira/browse/SPARK-20548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
> Fix For: 2.2.0
>
>
> {{newProductSeqEncoder with REPL defined class}} in {{ReplSuite}} has been 
> failing non-deterministically over the last few days: 
> https://spark-tests.appspot.com/failed-tests
> https://spark.test.databricks.com/job/spark-master-test-sbt-hadoop-2.7/176/testReport/junit/org.apache.spark.repl/ReplSuite/newProductSeqEncoder_with_REPL_defined_class/history/






[jira] [Created] (SPARK-20552) Add isNotDistinctFrom/isDistinctFrom for column APIs in Scala/Java and Python

2017-05-01 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-20552:


 Summary:  Add isNotDistinctFrom/isDistinctFrom for column APIs in 
Scala/Java and Python
 Key: SPARK-20552
 URL: https://issues.apache.org/jira/browse/SPARK-20552
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 2.2.0
Reporter: Hyukjin Kwon
Priority: Minor


After SPARK-20463, we are able to use {{IS [NOT] DISTINCT FROM}} in Spark SQL.

It looks like we should add {{isNotDistinctFrom}} (as an alias for {{eqNullSafe}}) 
and {{isDistinctFrom}} (as a negated {{eqNullSafe}}) to the Column APIs in both 
Scala/Java and Python.
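These semantics are already expressible with {{eqNullSafe}} / {{<=>}}; a short Scala sketch (run in spark-shell, which provides the {{spark}} session; the proposed {{isNotDistinctFrom}} / {{isDistinctFrom}} names do not exist yet):

{code}
import spark.implicits._

val df = Seq((Some(10), Some(20)), (Some(30), Some(30)), (None, None)).toDF("c1", "c2")

// c1 IS NOT DISTINCT FROM c2  ==  c1 <=> c2 (null-safe equality)
df.filter($"c1" <=> $"c2").show()
// c1 IS DISTINCT FROM c2  ==  negation of the above
df.filter(!($"c1" <=> $"c2")).show()
{code}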







[jira] [Commented] (SPARK-15343) NoClassDefFoundError when initializing Spark with YARN

2017-05-01 Thread Jo Desmet (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991945#comment-15991945
 ] 

Jo Desmet commented on SPARK-15343:
---

So the issue is fixed by moving to at least Hadoop YARN version 2.8.0, as per 
[YARN-5271|https://issues.apache.org/jira/browse/YARN-5271].

> NoClassDefFoundError when initializing Spark with YARN
> --
>
> Key: SPARK-15343
> URL: https://issues.apache.org/jira/browse/SPARK-15343
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.0.0
>Reporter: Maciej Bryński
>Priority: Critical
>
> I'm trying to connect Spark 2.0 (compiled from branch-2.0) with Hadoop.
> Spark compiled with:
> {code}
> ./dev/make-distribution.sh -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver 
> -Dhadoop.version=2.6.0 -DskipTests
> {code}
> I'm getting the following error:
> {code}
> mbrynski@jupyter:~/spark$ bin/pyspark
> Python 3.4.0 (default, Apr 11 2014, 13:05:11)
> [GCC 4.8.2] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" 
> with specified deploy mode instead.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel).
> 16/05/16 11:54:41 WARN SparkConf: The configuration key 'spark.yarn.jar' has 
> been deprecated as of Spark 2.0 and may be removed in the future. Please use 
> the new key 'spark.yarn.jars' instead.
> 16/05/16 11:54:41 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> 16/05/16 11:54:42 WARN AbstractHandler: No Server set for 
> org.spark_project.jetty.server.handler.ErrorHandler@f7989f6
> 16/05/16 11:54:43 WARN DomainSocketFactory: The short-circuit local reads 
> feature cannot be used because libhadoop cannot be loaded.
> Traceback (most recent call last):
>   File "/home/mbrynski/spark/python/pyspark/shell.py", line 38, in 
> sc = SparkContext()
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 115, in __init__
> conf, jsc, profiler_cls)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 172, in _do_init
> self._jsc = jsc or self._initialize_context(self._conf._jconf)
>   File "/home/mbrynski/spark/python/pyspark/context.py", line 235, in 
> _initialize_context
> return self._jvm.JavaSparkContext(jconf)
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", 
> line 1183, in __call__
>   File 
> "/home/mbrynski/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 
> 312, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> None.org.apache.spark.api.java.JavaSparkContext.
> : java.lang.NoClassDefFoundError: 
> com/sun/jersey/api/client/config/ClientConfig
> at 
> org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:45)
> at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:163)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:150)
> at 
> org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
> at 
> org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:148)
> at org.apache.spark.SparkContext.(SparkContext.scala:502)
> at 
> org.apache.spark.api.java.JavaSparkContext.(JavaSparkContext.scala:58)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:236)
> at 
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
> at 
> py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
> at py4j.GatewayConnection.run(GatewayConnection.java:211)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: 
> com.sun.jersey.api.client.config.ClientConfig
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
>   

[jira] [Commented] (SPARK-20551) ImportError adding custom class from jar in pyspark

2017-05-01 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991929#comment-15991929
 ] 

Hyukjin Kwon commented on SPARK-20551:
--

It sounds like you are trying to import Java classes from a jar directly in 
Python, which does not work without going through Py4J. I tested importing a 
custom Python package and it works fine locally. This sounds more like a 
question than a bug to me.

> ImportError adding custom class from jar in pyspark
> ---
>
> Key: SPARK-20551
> URL: https://issues.apache.org/jira/browse/SPARK-20551
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Shell
>Affects Versions: 2.1.0
>Reporter: Sergio Monteiro
>
> the following imports are failing in PySpark, even when I set --jars or 
> --driver-class-path:
> import net.ripe.hadoop.pcap.io.PcapInputFormat
> import net.ripe.hadoop.pcap.io.CombinePcapInputFormat
> import net.ripe.hadoop.pcap.packet.Packet
> Using Python version 2.7.12 (default, Nov 19 2016 06:48:10)
> SparkSession available as 'spark'.
> >>> import net.ripe.hadoop.pcap.io.PcapInputFormat
> Traceback (most recent call last):
>   File "", line 1, in 
> ImportError: No module named net.ripe.hadoop.pcap.io.PcapInputFormat
> >>> import net.ripe.hadoop.pcap.io.CombinePcapInputFormat
> Traceback (most recent call last):
>   File "", line 1, in 
> ImportError: No module named net.ripe.hadoop.pcap.io.CombinePcapInputFormat
> >>> import net.ripe.hadoop.pcap.packet.Packet
> Traceback (most recent call last):
>   File "", line 1, in 
> ImportError: No module named net.ripe.hadoop.pcap.packet.Packet
> >>>
> The same works great in spark-shell. 






[jira] [Resolved] (SPARK-20548) Flaky Test: ReplSuite.newProductSeqEncoder with REPL defined class

2017-05-01 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-20548.
---
   Resolution: Fixed
 Assignee: Sameer Agarwal
Fix Version/s: 2.2.0

> Flaky Test:  ReplSuite.newProductSeqEncoder with REPL defined class
> ---
>
> Key: SPARK-20548
> URL: https://issues.apache.org/jira/browse/SPARK-20548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Sameer Agarwal
>Assignee: Sameer Agarwal
> Fix For: 2.2.0
>
>
> {{newProductSeqEncoder with REPL defined class}} in {{ReplSuite}} has been 
> failing non-deterministically over the last few days: 
> https://spark-tests.appspot.com/failed-tests
> https://spark.test.databricks.com/job/spark-master-test-sbt-hadoop-2.7/176/testReport/junit/org.apache.spark.repl/ReplSuite/newProductSeqEncoder_with_REPL_defined_class/history/






[jira] [Created] (SPARK-20551) ImportError adding custom class from jar in pyspark

2017-05-01 Thread Sergio Monteiro (JIRA)
Sergio Monteiro created SPARK-20551:
---

 Summary: ImportError adding custom class from jar in pyspark
 Key: SPARK-20551
 URL: https://issues.apache.org/jira/browse/SPARK-20551
 Project: Spark
  Issue Type: Bug
  Components: PySpark, Spark Shell
Affects Versions: 2.1.0
Reporter: Sergio Monteiro


the following imports are failing in PySpark, even when I set --jars or 
--driver-class-path:

import net.ripe.hadoop.pcap.io.PcapInputFormat
import net.ripe.hadoop.pcap.io.CombinePcapInputFormat
import net.ripe.hadoop.pcap.packet.Packet

Using Python version 2.7.12 (default, Nov 19 2016 06:48:10)
SparkSession available as 'spark'.
>>> import net.ripe.hadoop.pcap.io.PcapInputFormat
Traceback (most recent call last):
  File "", line 1, in 
ImportError: No module named net.ripe.hadoop.pcap.io.PcapInputFormat
>>> import net.ripe.hadoop.pcap.io.CombinePcapInputFormat
Traceback (most recent call last):
  File "", line 1, in 
ImportError: No module named net.ripe.hadoop.pcap.io.CombinePcapInputFormat
>>> import net.ripe.hadoop.pcap.packet.Packet
Traceback (most recent call last):
  File "", line 1, in 
ImportError: No module named net.ripe.hadoop.pcap.packet.Packet
>>>

The same works great in spark-shell. 






[jira] [Assigned] (SPARK-20463) Add support for IS [NOT] DISTINCT FROM to SPARK SQL

2017-05-01 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-20463:
---

Assignee: Michael Styles

> Add support for IS [NOT] DISTINCT FROM to SPARK SQL
> ---
>
> Key: SPARK-20463
> URL: https://issues.apache.org/jira/browse/SPARK-20463
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Michael Styles
>Assignee: Michael Styles
> Fix For: 2.3.0
>
>
> Add support for the SQL standard distinct predicate to SPARK SQL.
> {noformat}
>  IS [NOT] DISTINCT FROM 
> {noformat}
> {noformat}
> data = [(10, 20), (30, 30), (40, None), (None, None)]
> df = sc.parallelize(data).toDF(["c1", "c2"])
> df.createTempView("df")
> spark.sql("select c1, c2 from df where c1 is not distinct from c2").collect()
> [Row(c1=30, c2=30), Row(c1=None, c2=None)]
> {noformat}






[jira] [Resolved] (SPARK-20463) Add support for IS [NOT] DISTINCT FROM to SPARK SQL

2017-05-01 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20463.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

> Add support for IS [NOT] DISTINCT FROM to SPARK SQL
> ---
>
> Key: SPARK-20463
> URL: https://issues.apache.org/jira/browse/SPARK-20463
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Michael Styles
>Assignee: Michael Styles
> Fix For: 2.3.0
>
>
> Add support for the SQL standard distinct predicate to SPARK SQL.
> {noformat}
>  IS [NOT] DISTINCT FROM 
> {noformat}
> {noformat}
> data = [(10, 20), (30, 30), (40, None), (None, None)]
> df = sc.parallelize(data).toDF(["c1", "c2"])
> df.createTempView("df")
> spark.sql("select c1, c2 from df where c1 is not distinct from c2").collect()
> [Row(c1=30, c2=30), Row(c1=None, c2=None)]
> {noformat}






[jira] [Resolved] (SPARK-20459) JdbcUtils throws IllegalStateException: Cause already initialized after getting SQLException

2017-05-01 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20459.
-
   Resolution: Fixed
Fix Version/s: 2.3.0
   2.2.1

> JdbcUtils throws IllegalStateException: Cause already initialized after 
> getting SQLException
> 
>
> Key: SPARK-20459
> URL: https://issues.apache.org/jira/browse/SPARK-20459
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.0.2, 2.1.0
>Reporter: Jessie Yu
>Priority: Minor
> Fix For: 2.2.1, 2.3.0
>
>
> While testing some failure scenarios, JdbcUtils throws an IllegalStateException 
> instead of the expected SQLException:
> {code}
> scala> 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(prodtbl,url3,"DB2.D_ITEM_INFO",prop1)
>  
> 17/04/03 17:19:35 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)  
>   
> java.lang.IllegalStateException: Cause already initialized
>   
> .at java.lang.Throwable.setCause(Throwable.java:365)  
>   
> .at java.lang.Throwable.initCause(Throwable.java:341) 
>   
> .at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:241)
> .at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:300)
> .at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:299)
> .at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902)
> .at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902)
> .at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
>  
> .at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
>   
> .at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   
> .at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   
> .at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
> .at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153
> .at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628
> .at java.lang.Thread.run(Thread.java:785) 
>   
> {code}
> The code in JdbcUtils.savePartition has 
> {code}
> } catch {
>   case e: SQLException =>
> val cause = e.getNextException
> if (cause != null && e.getCause != cause) {
>   if (e.getCause == null) {
> e.initCause(cause)
>   } else {
> e.addSuppressed(cause)
>   }
> }
> {code}
> According to the Throwable Javadoc, {{initCause()}} throws an 
> {{IllegalStateException}} "if this throwable was created with 
> Throwable(Throwable) or Throwable(String,Throwable), or this method has 
> already been called on this throwable". The code does check that {{getCause()}} 
> is {{null}} before calling {{initCause()}}. However, {{getCause()}} "returns the 
> cause of this throwable or null if the cause is nonexistent or unknown", so it 
> can return {{null}} even when a cause was already set at construction time (for 
> example, an explicit {{null}} cause), in which case {{initCause()}} still throws 
> {{IllegalStateException}}.
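A minimal, self-contained Scala illustration of that corner case (not Spark code):

{code}
// An exception constructed with an explicit (here null) cause: getCause returns
// null, so a null check on getCause passes, yet initCause still throws.
val e = new java.sql.SQLException("batch failed", null: Throwable)
assert(e.getCause == null)
e.initCause(new RuntimeException("next exception"))  // throws java.lang.IllegalStateException: Can't overwrite cause
{code}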






[jira] [Assigned] (SPARK-20459) JdbcUtils throws IllegalStateException: Cause already initialized after getting SQLException

2017-05-01 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-20459:
---

Assignee: Sean Owen

> JdbcUtils throws IllegalStateException: Cause already initialized after 
> getting SQLException
> 
>
> Key: SPARK-20459
> URL: https://issues.apache.org/jira/browse/SPARK-20459
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1, 2.0.2, 2.1.0
>Reporter: Jessie Yu
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 2.2.1, 2.3.0
>
>
> While testing some failure scenarios, JdbcUtils throws an IllegalStateException 
> instead of the expected SQLException:
> {code}
> scala> 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils.saveTable(prodtbl,url3,"DB2.D_ITEM_INFO",prop1)
>  
> 17/04/03 17:19:35 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)  
>   
> java.lang.IllegalStateException: Cause already initialized
>   
> .at java.lang.Throwable.setCause(Throwable.java:365)  
>   
> .at java.lang.Throwable.initCause(Throwable.java:341) 
>   
> .at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.savePartition(JdbcUtils.scala:241)
> .at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:300)
> .at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$saveTable$1.apply(JdbcUtils.scala:299)
> .at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902)
> .at 
> org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:902)
> .at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
>  
> .at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1899)
>   
> .at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   
> .at org.apache.spark.scheduler.Task.run(Task.scala:86)
>   
> .at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) 
> .at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1153
> .at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628
> .at java.lang.Thread.run(Thread.java:785) 
>   
> {code}
> The code in JdbcUtils.savePartition has 
> {code}
> } catch {
>   case e: SQLException =>
> val cause = e.getNextException
> if (cause != null && e.getCause != cause) {
>   if (e.getCause == null) {
> e.initCause(cause)
>   } else {
> e.addSuppressed(cause)
>   }
> }
> {code}
> According to the Throwable Javadoc, {{initCause()}} throws an 
> {{IllegalStateException}} "if this throwable was created with 
> Throwable(Throwable) or Throwable(String,Throwable), or this method has 
> already been called on this throwable". The code does check that {{getCause()}} 
> is {{null}} before calling {{initCause()}}. However, {{getCause()}} "returns the 
> cause of this throwable or null if the cause is nonexistent or unknown", so it 
> can return {{null}} even when a cause was already set at construction time (for 
> example, an explicit {{null}} cause), in which case {{initCause()}} still throws 
> {{IllegalStateException}}.






[jira] [Assigned] (SPARK-20549) java.io.CharConversionException: Invalid UTF-32 in JsonToStructs

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20549:


Assignee: Apache Spark

> java.io.CharConversionException: Invalid UTF-32 in JsonToStructs
> 
>
> Key: SPARK-20549
> URL: https://issues.apache.org/jira/browse/SPARK-20549
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Burak Yavuz
>Assignee: Apache Spark
>
> The same fix for SPARK-16548 needs to be applied for JsonToStructs






[jira] [Commented] (SPARK-20549) java.io.CharConversionException: Invalid UTF-32 in JsonToStructs

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991809#comment-15991809
 ] 

Apache Spark commented on SPARK-20549:
--

User 'brkyvz' has created a pull request for this issue:
https://github.com/apache/spark/pull/17826

> java.io.CharConversionException: Invalid UTF-32 in JsonToStructs
> 
>
> Key: SPARK-20549
> URL: https://issues.apache.org/jira/browse/SPARK-20549
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Burak Yavuz
>
> The same fix for SPARK-16548 needs to be applied for JsonToStructs






[jira] [Assigned] (SPARK-20549) java.io.CharConversionException: Invalid UTF-32 in JsonToStructs

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20549:


Assignee: (was: Apache Spark)

> java.io.CharConversionException: Invalid UTF-32 in JsonToStructs
> 
>
> Key: SPARK-20549
> URL: https://issues.apache.org/jira/browse/SPARK-20549
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Burak Yavuz
>
> The same fix for SPARK-16548 needs to be applied for JsonToStructs






[jira] [Updated] (SPARK-20547) ExecutorClassLoader's findClass may not work correctly when a task is cancelled.

2017-05-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-20547:
-
Component/s: (was: Spark Core)
 Spark Shell

> ExecutorClassLoader's findClass may not work correctly when a task is 
> cancelled.
> 
>
> Key: SPARK-20547
> URL: https://issues.apache.org/jira/browse/SPARK-20547
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>
> ExecutorClassLoader's findClass may throw a transient exception. For example, 
> when a task is cancelled while ExecutorClassLoader is running, you may see an 
> InterruptedException or IOException even though the class could have been 
> loaded. The failed result of findClass is then cached by the JVM, and later, 
> when the same class is loaded again (even though it may still be loadable), it 
> just throws NoClassDefFoundError.
> We should make ExecutorClassLoader retry on transient exceptions.
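A rough Scala sketch of the retry idea (illustrative only, not the actual Spark patch; {{RetryingClassLoader}} and {{fetchClassBytes}} are made-up names):

{code}
import java.io.IOException

// Retry findClass on transient failures so the JVM does not end up caching a
// spurious failure for a class that is actually loadable.
abstract class RetryingClassLoader(parent: ClassLoader) extends ClassLoader(parent) {
  protected def fetchClassBytes(name: String): Array[Byte]  // e.g. fetched from the driver; may fail transiently

  override def findClass(name: String): Class[_] = {
    def attempt(remaining: Int): Class[_] =
      try {
        val bytes = fetchClassBytes(name)
        defineClass(name, bytes, 0, bytes.length)
      } catch {
        // Simplified: a real implementation should also preserve the interrupt status.
        case _: IOException | _: InterruptedException if remaining > 0 =>
          attempt(remaining - 1)
      }
    attempt(remaining = 3)
  }
}
{code}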






[jira] [Comment Edited] (SPARK-20547) ExecutorClassLoader's findClass may not work correctly when a task is cancelled.

2017-05-01 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991794#comment-15991794
 ] 

Shixiong Zhu edited comment on SPARK-20547 at 5/1/17 11:37 PM:
---

This is a reproducer: 
https://github.com/zsxwing/spark/commit/993b6d24d935cf489fdf643ef4266c239028d6e4


was (Author: zsxwing):
This is a reproducer: 
https://github.com/apache/spark/compare/master...zsxwing:SPARK-20547?expand=1

> ExecutorClassLoader's findClass may not work correctly when a task is 
> cancelled.
> 
>
> Key: SPARK-20547
> URL: https://issues.apache.org/jira/browse/SPARK-20547
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>
> ExecutorClassLoader's findClass may throw a transient exception. For example, 
> when a task is cancelled while ExecutorClassLoader is running, you may see an 
> InterruptedException or IOException even though the class could have been 
> loaded. The failed result of findClass is then cached by the JVM, and later, 
> when the same class is loaded again (even though it may still be loadable), it 
> just throws NoClassDefFoundError.
> We should make ExecutorClassLoader retry on transient exceptions.






[jira] [Commented] (SPARK-20547) ExecutorClassLoader's findClass may not work correctly when a task is cancelled.

2017-05-01 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991794#comment-15991794
 ] 

Shixiong Zhu commented on SPARK-20547:
--

This is a reproducer: 
https://github.com/apache/spark/compare/master...zsxwing:SPARK-20547?expand=1

> ExecutorClassLoader's findClass may not work correctly when a task is 
> cancelled.
> 
>
> Key: SPARK-20547
> URL: https://issues.apache.org/jira/browse/SPARK-20547
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>
> ExecutorClassLoader's findClass may throw a transient exception. For example, 
> when a task is cancelled while ExecutorClassLoader is running, you may see an 
> InterruptedException or IOException even though the class could have been 
> loaded. The failed result of findClass is then cached by the JVM, and later, 
> when the same class is loaded again (even though it may still be loadable), it 
> just throws NoClassDefFoundError.
> We should make ExecutorClassLoader retry on transient exceptions.






[jira] [Assigned] (SPARK-20550) R wrappers for Dataset.alias

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20550:


Assignee: (was: Apache Spark)

> R wrappers for Dataset.alias
> 
>
> Key: SPARK-20550
> URL: https://issues.apache.org/jira/browse/SPARK-20550
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> Add SparkR API for {{o.a.s.sql.Dataset.alias}}






[jira] [Assigned] (SPARK-20550) R wrappers for Dataset.alias

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20550:


Assignee: Apache Spark

> R wrappers for Dataset.alias
> 
>
> Key: SPARK-20550
> URL: https://issues.apache.org/jira/browse/SPARK-20550
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Minor
>
> Add SparkR API for {{o.a.s.sql.Dataset.alias}}






[jira] [Commented] (SPARK-20550) R wrappers for Dataset.alias

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991706#comment-15991706
 ] 

Apache Spark commented on SPARK-20550:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/17825

> R wrappers for Dataset.alias
> 
>
> Key: SPARK-20550
> URL: https://issues.apache.org/jira/browse/SPARK-20550
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> Add SparkR API for {{o.a.s.sql.Dataset.alias}}






[jira] [Commented] (SPARK-20421) Mark JobProgressListener (and related classes) as deprecated

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991703#comment-15991703
 ] 

Apache Spark commented on SPARK-20421:
--

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/17824

> Mark JobProgressListener (and related classes) as deprecated
> 
>
> Key: SPARK-20421
> URL: https://issues.apache.org/jira/browse/SPARK-20421
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 2.2.1
>
>
> This class (and others) were made {{@DeveloperApi}} as part of 
> https://github.com/apache/spark/pull/648. But as part of the work in 
> SPARK-18085, I plan to get rid of a lot of that code, so we should mark these 
> as deprecated in case anyone is still trying to use them.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20550) R wrappers for Dataset.alias

2017-05-01 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-20550:
--

 Summary: R wrappers for Dataset.alias
 Key: SPARK-20550
 URL: https://issues.apache.org/jira/browse/SPARK-20550
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Affects Versions: 2.2.0
Reporter: Maciej Szymkiewicz
Priority: Minor


Add SparkR API for {{o.a.s.sql.Dataset.alias}}
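For reference, a minimal Scala sketch of the {{Dataset.alias}} behaviour the SparkR wrapper would expose (the data and column names below are illustrative, not from the ticket):

{code}
// Dataset.alias assigns a name to a Dataset so its columns can be
// disambiguated later, e.g. in a self-join. Illustrative data only.
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "v")
val left  = df.alias("l")
val right = df.alias("r")
left.join(right, $"l.id" === $"r.id").show()
{code}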



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20549) java.io.CharConversionException: Invalid UTF-32 in JsonToStructs

2017-05-01 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-20549:
---

 Summary: java.io.CharConversionException: Invalid UTF-32 in 
JsonToStructs
 Key: SPARK-20549
 URL: https://issues.apache.org/jira/browse/SPARK-20549
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Burak Yavuz


The same fix for SPARK-16548 needs to be applied for JsonToStructs
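For context, {{JsonToStructs}} is the expression behind {{from_json}}; below is a minimal sketch of that call path (schema and data are illustrative, and this is not a confirmed reproduction of the encoding issue):

{code}
// from_json parses a string column into a struct; malformed input encodings
// are where a CharConversionException can surface. Illustrative only.
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{StructField, StructType, StringType}
import spark.implicits._

val schema = StructType(Seq(StructField("a", StringType)))
val df = Seq("""{"a": "b"}""").toDF("value")
df.select(from_json($"value", schema)).show()
{code}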



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20546) spark-class gets syntax error in posix mode

2017-05-01 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991678#comment-15991678
 ] 

Sean Owen commented on SPARK-20546:
---

Are there downsides to turning off posix mode?
It seems reasonable if it makes this case work and doesn't affect other 
behavior.

> spark-class gets syntax error in posix mode
> ---
>
> Key: SPARK-20546
> URL: https://issues.apache.org/jira/browse/SPARK-20546
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.0.2
>Reporter: Jessie Yu
>Priority: Minor
>
> spark-class gets the following error when running in posix mode:
> {code}
> spark-class: line 78: syntax error near unexpected token `<'
> spark-class: line 78: `done < <(build_command "$@")'
> {code}
> \\
> It appears to be complaining about the process substitution: 
> {code}
> CMD=()
> while IFS= read -d '' -r ARG; do
>   CMD+=("$ARG")
> done < <(build_command "$@")
> {code}
> \\
> This can be reproduced by first turning on allexport and then posix mode:
> {code}set -a -o posix {code}
> then running something like spark-shell, which calls spark-class.
> \\
> The simplest fix is probably to always turn off posix mode in spark-class 
> before the while loop.
> \\
> This was previously reported in 
> [SPARK-8417|https://issues.apache.org/jira/browse/SPARK-8417], which was 
> closed as cannot reproduce. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20548) Flaky Test: ReplSuite.newProductSeqEncoder with REPL defined class

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20548:


Assignee: Apache Spark

> Flaky Test:  ReplSuite.newProductSeqEncoder with REPL defined class
> ---
>
> Key: SPARK-20548
> URL: https://issues.apache.org/jira/browse/SPARK-20548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Sameer Agarwal
>Assignee: Apache Spark
>
> {{newProductSeqEncoder with REPL defined class}} in {{ReplSuite}} has been 
> failing non-deterministically (https://spark-tests.appspot.com/failed-tests) 
> over the last few days.
> https://spark.test.databricks.com/job/spark-master-test-sbt-hadoop-2.7/176/testReport/junit/org.apache.spark.repl/ReplSuite/newProductSeqEncoder_with_REPL_defined_class/history/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20548) Flaky Test: ReplSuite.newProductSeqEncoder with REPL defined class

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20548:


Assignee: (was: Apache Spark)

> Flaky Test:  ReplSuite.newProductSeqEncoder with REPL defined class
> ---
>
> Key: SPARK-20548
> URL: https://issues.apache.org/jira/browse/SPARK-20548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Sameer Agarwal
>
> {{newProductSeqEncoder with REPL defined class}} in {{ReplSuite}} has been 
> failing non-deterministically (https://spark-tests.appspot.com/failed-tests) 
> over the last few days.
> https://spark.test.databricks.com/job/spark-master-test-sbt-hadoop-2.7/176/testReport/junit/org.apache.spark.repl/ReplSuite/newProductSeqEncoder_with_REPL_defined_class/history/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20548) Flaky Test: ReplSuite.newProductSeqEncoder with REPL defined class

2017-05-01 Thread Sameer Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sameer Agarwal updated SPARK-20548:
---
Description: 
{{newProductSeqEncoder with REPL defined class}} in {{ReplSuite}} has been 
failing non-deterministically (https://spark-tests.appspot.com/failed-tests) 
over the last few days.

https://spark.test.databricks.com/job/spark-master-test-sbt-hadoop-2.7/176/testReport/junit/org.apache.spark.repl/ReplSuite/newProductSeqEncoder_with_REPL_defined_class/history/

  was:{{newProductSeqEncoder with REPL defined class}} in {{ReplSuite}} has 
been failing non-deterministically 
(https://spark-tests.appspot.com/failed-tests) over the last few days.


> Flaky Test:  ReplSuite.newProductSeqEncoder with REPL defined class
> ---
>
> Key: SPARK-20548
> URL: https://issues.apache.org/jira/browse/SPARK-20548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Sameer Agarwal
>
> {{newProductSeqEncoder with REPL defined class}} in {{ReplSuite}} has been 
> failing non-deterministically (https://spark-tests.appspot.com/failed-tests) 
> over the last few days.
> https://spark.test.databricks.com/job/spark-master-test-sbt-hadoop-2.7/176/testReport/junit/org.apache.spark.repl/ReplSuite/newProductSeqEncoder_with_REPL_defined_class/history/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20548) Flaky Test: ReplSuite.newProductSeqEncoder with REPL defined class

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991633#comment-15991633
 ] 

Apache Spark commented on SPARK-20548:
--

User 'sameeragarwal' has created a pull request for this issue:
https://github.com/apache/spark/pull/17823

> Flaky Test:  ReplSuite.newProductSeqEncoder with REPL defined class
> ---
>
> Key: SPARK-20548
> URL: https://issues.apache.org/jira/browse/SPARK-20548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Sameer Agarwal
>
> {{newProductSeqEncoder with REPL defined class}} in {{ReplSuite}} has been 
> failing non-deterministically (https://spark-tests.appspot.com/failed-tests) 
> over the last few days.
> https://spark.test.databricks.com/job/spark-master-test-sbt-hadoop-2.7/176/testReport/junit/org.apache.spark.repl/ReplSuite/newProductSeqEncoder_with_REPL_defined_class/history/



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20548) Flaky Test: ReplSuite.newProductSeqEncoder with REPL defined class

2017-05-01 Thread Sameer Agarwal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991626#comment-15991626
 ] 

Sameer Agarwal commented on SPARK-20548:


I'm going to disable this test in the short term

> Flaky Test:  ReplSuite.newProductSeqEncoder with REPL defined class
> ---
>
> Key: SPARK-20548
> URL: https://issues.apache.org/jira/browse/SPARK-20548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Sameer Agarwal
>
> {{newProductSeqEncoder with REPL defined class}} in {{ReplSuite}} has been 
> failing non-deterministically (https://spark-tests.appspot.com/failed-tests) 
> over the last few days.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20540) Dynamic allocation constantly requests and kills executors

2017-05-01 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-20540.

   Resolution: Fixed
 Assignee: Ryan Blue
Fix Version/s: 2.2.0
   2.1.2

> Dynamic allocation constantly requests and kills executors
> --
>
> Key: SPARK-20540
> URL: https://issues.apache.org/jira/browse/SPARK-20540
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 2.0.2, 2.1.0, 2.2.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
> Fix For: 2.1.2, 2.2.0
>
>
> We are seeing some strange behavior with dynamic allocation, where in some 
> cases the driver will get into a state where it constantly kills idle 
> executors while requesting new executors. This happens at the end of a stage 
> when all tasks are assigned and never stops even when there are no tasks to 
> run.
> From the YarnAllocator logs, it looks like the allocator is getting lots of 
> requests from the driver, even though the timeout between requests should be 
> 5s:
> {code:title=Yarn allocator logs}
> 17/04/20 19:52:05 INFO dispatcher-event-loop-49 YarnAllocator: Driver 
> requested a total number of 227 executor(s).
> 17/04/20 19:52:05 INFO dispatcher-event-loop-30 YarnAllocator: Driver 
> requested a total number of 213 executor(s).
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Will request 1 executor 
> containers, each with 2 cores and 7168 MB memory including 2048 MB overhead
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests 
> (locality no longer needed)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request 
> (host: Any, capability: memory:7168, vCores:2)
> spark://CoarseGrainedScheduler@100.74.39.143:10895,  executorHostname: 
> ip-100-74-34-230.ec2.internal
> spark://CoarseGrainedScheduler@100.74.39.143:10895,  executorHostname: 
> ip-100-74-47-57.ec2.internal
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Received 2 containers from 
> YARN, launching executors on 2 of them.
> 17/04/20 19:52:05 INFO dispatcher-event-loop-11 YarnAllocator: Driver 
> requested a total number of 195 executor(s).
> 17/04/20 19:52:05 INFO dispatcher-event-loop-55 YarnAllocator: Driver 
> requested a total number of 174 executor(s).
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Will request 2 executor 
> containers, each with 2 cores and 7168 MB memory including 2048 MB overhead
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Canceled 0 container requests 
> (locality no longer needed)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request 
> (host: Any, capability: memory:7168, vCores:2)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Submitted container request 
> (host: Any, capability: memory:7168, vCores:2)
> 17/04/20 19:52:05 INFO Reporter YarnAllocator: Received 4 containers from 
> YARN, launching executors on 4 of them.
> {code}
> I think the allocator cancels what requests it can, but is getting containers 
> that have already been requested and the executors keep growing because of 
> requests from the driver. Here are 5 seconds from the log:
> {code}
> 17/04/20 19:52:30 INFO dispatcher-event-loop-22 YarnAllocator: Driver 
> requested a total number of 185 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-48 YarnAllocator: Driver 
> requested a total number of 193 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-24 YarnAllocator: Driver 
> requested a total number of 192 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-60 YarnAllocator: Driver 
> requested a total number of 195 executor(s).
> 17/04/20 19:52:30 INFO dispatcher-event-loop-53 YarnAllocator: Driver 
> requested a total number of 205 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver 
> requested a total number of 202 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-17 YarnAllocator: Driver 
> requested a total number of 232 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-45 YarnAllocator: Driver 
> requested a total number of 243 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-19 YarnAllocator: Driver 
> requested a total number of 254 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-42 YarnAllocator: Driver 
> requested a total number of 263 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-20 YarnAllocator: Driver 
> requested a total number of 271 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-35 YarnAllocator: Driver 
> requested a total number of 280 executor(s).
> 17/04/20 19:52:31 INFO dispatcher-event-loop-61 YarnAllocator: Driver 
> requested a total number of 289 executor(s).
> 17/04/20 19:52:32 INFO dispatcher-event-loop-22 YarnAllocator: 

[jira] [Updated] (SPARK-20548) Flaky Test: ReplSuite.newProductSeqEncoder with REPL defined class

2017-05-01 Thread Sameer Agarwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sameer Agarwal updated SPARK-20548:
---
Description: {{newProductSeqEncoder with REPL defined class}} in 
{{ReplSuite}} has been failing non-deterministically 
(https://spark-tests.appspot.com/failed-tests) over the last few days.

> Flaky Test:  ReplSuite.newProductSeqEncoder with REPL defined class
> ---
>
> Key: SPARK-20548
> URL: https://issues.apache.org/jira/browse/SPARK-20548
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Sameer Agarwal
>
> {{newProductSeqEncoder with REPL defined class}} in {{ReplSuite}} has been 
> failing non-deterministically (https://spark-tests.appspot.com/failed-tests) 
> over the last few days.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20548) Flaky Test: ReplSuite.newProductSeqEncoder with REPL defined class

2017-05-01 Thread Sameer Agarwal (JIRA)
Sameer Agarwal created SPARK-20548:
--

 Summary: Flaky Test:  ReplSuite.newProductSeqEncoder with REPL 
defined class
 Key: SPARK-20548
 URL: https://issues.apache.org/jira/browse/SPARK-20548
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.2.0
Reporter: Sameer Agarwal






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20454) Improvement of ShortestPaths in Spark GraphX

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20454:


Assignee: Apache Spark

> Improvement of ShortestPaths in Spark GraphX
> 
>
> Key: SPARK-20454
> URL: https://issues.apache.org/jira/browse/SPARK-20454
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, MLlib
>Affects Versions: 2.1.0
>Reporter: Ji Dai
>Assignee: Apache Spark
>
> The output of ShortestPaths is not sufficient. ShortestPaths in Graph/lib is 
> currently a simple version that can only return the distance to the source 
> vertex. However, the shortest path with its intermediate nodes is also 
> needed, and if two or more paths hold the same shortest distance from source 
> to destination, all of these paths need to be returned. In this way, 
> ShortestPaths will be more functional and useful.
> I think I have resolved the concern above with an improved version of 
> ShortestPaths, which is also based on the "pregel" function in GraphOps.
> Can I get my code reviewed and merged?
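For reference, a small sketch of the current {{ShortestPaths}} behaviour being discussed (toy graph with illustrative vertex IDs, not the proposer's code): the result carries only a map of landmark to distance, with no intermediate vertices.

{code}
// Current API: ShortestPaths.run returns, per vertex, a Map(landmark -> distance);
// the vertices along the path are not recorded. Toy graph, illustrative IDs.
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.graphx.lib.ShortestPaths

val edges = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(1L, 3L, 1)))
val graph = Graph.fromEdges(edges, 0)
val result = ShortestPaths.run(graph, Seq(3L))
result.vertices.collect().foreach(println)  // e.g. (1, Map(3 -> 1)), (2, Map(3 -> 1)), (3, Map(3 -> 0))
{code}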



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20454) Improvement of ShortestPaths in Spark GraphX

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20454:


Assignee: (was: Apache Spark)

> Improvement of ShortestPaths in Spark GraphX
> 
>
> Key: SPARK-20454
> URL: https://issues.apache.org/jira/browse/SPARK-20454
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, MLlib
>Affects Versions: 2.1.0
>Reporter: Ji Dai
>
> The output of ShortestPaths is not sufficient. ShortestPaths in Graph/lib is 
> currently a simple version that can only return the distance to the source 
> vertex. However, the shortest path with its intermediate nodes is also 
> needed, and if two or more paths hold the same shortest distance from source 
> to destination, all of these paths need to be returned. In this way, 
> ShortestPaths will be more functional and useful.
> I think I have resolved the concern above with an improved version of 
> ShortestPaths, which is also based on the "pregel" function in GraphOps.
> Can I get my code reviewed and merged?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20454) Improvement of ShortestPaths in Spark GraphX

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991590#comment-15991590
 ] 

Apache Spark commented on SPARK-20454:
--

User 'daijidj' has created a pull request for this issue:
https://github.com/apache/spark/pull/17822

> Improvement of ShortestPaths in Spark GraphX
> 
>
> Key: SPARK-20454
> URL: https://issues.apache.org/jira/browse/SPARK-20454
> Project: Spark
>  Issue Type: Improvement
>  Components: GraphX, MLlib
>Affects Versions: 2.1.0
>Reporter: Ji Dai
>
> The output of ShortestPaths is not sufficient. ShortestPaths in Graph/lib is 
> currently a simple version that can only return the distance to the source 
> vertex. However, the shortest path with its intermediate nodes is also 
> needed, and if two or more paths hold the same shortest distance from source 
> to destination, all of these paths need to be returned. In this way, 
> ShortestPaths will be more functional and useful.
> I think I have resolved the concern above with an improved version of 
> ShortestPaths, which is also based on the "pregel" function in GraphOps.
> Can I get my code reviewed and merged?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20547) ExecutorClassLoader's findClass may not work correctly when a task is cancelled.

2017-05-01 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-20547:
-
Affects Version/s: 2.2.0
 Target Version/s: 2.2.0

> ExecutorClassLoader's findClass may not work correctly when a task is 
> cancelled.
> 
>
> Key: SPARK-20547
> URL: https://issues.apache.org/jira/browse/SPARK-20547
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>
> ExecutorClassLoader's findClass may throw some transient exception. For 
> example, when a task is cancelled, if ExecutorClassLoader is running, you may 
> see InterruptedException or IOException, even if this class can be loaded. 
> Then the result of findClass will be cached by JVM, and later when the same 
> class is being loaded (note: in this case, this class may be still loadable), 
> it will just throw NoClassDefFoundError.
> We should make ExecutorClassLoader retry on transient exceptions.
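A hypothetical sketch of the retry idea, in plain Scala (this is not Spark's actual {{ExecutorClassLoader}} code; the helper name and retry count are made up for illustration):

{code}
// Retry class loading when the failure looks transient (e.g. the fetch was
// interrupted by task cancellation), so the JVM does not cache a spurious
// failure for a class that is actually loadable.
import java.io.IOException

def findClassWithRetry(loader: ClassLoader, name: String, attempts: Int = 3): Class[_] = {
  try {
    loader.loadClass(name)
  } catch {
    case _: IOException if attempts > 1 =>
      findClassWithRetry(loader, name, attempts - 1)
  }
}
{code}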



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20547) ExecutorClassLoader's findClass may not work correctly when a task is cancelled.

2017-05-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-20547:
-
Description: 
ExecutorClassLoader's findClass may throw some transient exception. For 
example, when a task is cancelled, if ExecutorClassLoader is running, you may 
see InterruptedException or IOException, even if this class can be loaded. Then 
the result of findClass will be cached by JVM, and later when the same class is 
being loaded (note: in this case, this class may be still loadable), it will 
just throw NoClassDefFoundError.

We should make ExecutorClassLoader retry on transient exceptions.

  was:
ExecutorClassLoader's findClass may throw some transient exception. For 
example, when a task is cancelled, if ExecutorClassLoader is running, you may 
see InterruptedException or IOException, even if this class can be loaded. Then 
the result of findClass will be cached by JVM, and later when the same class is 
being loaded (note: in this case, this class may be still loadable), it will 
just throw NoClassDefFoundError.

We should make ExecutorClassLoader's retry on transient exceptions.


> ExecutorClassLoader's findClass may not work correctly when a task is 
> cancelled.
> 
>
> Key: SPARK-20547
> URL: https://issues.apache.org/jira/browse/SPARK-20547
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>
> ExecutorClassLoader's findClass may throw some transient exception. For 
> example, when a task is cancelled, if ExecutorClassLoader is running, you may 
> see InterruptedException or IOException, even if this class can be loaded. 
> Then the result of findClass will be cached by JVM, and later when the same 
> class is being loaded (note: in this case, this class may be still loadable), 
> it will just throw NoClassDefFoundError.
> We should make ExecutorClassLoader retry on transient exceptions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20547) ExecutorClassLoader's findClass may not work correctly when a task is cancelled.

2017-05-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-20547:
-
Priority: Blocker  (was: Major)

> ExecutorClassLoader's findClass may not work correctly when a task is 
> cancelled.
> 
>
> Key: SPARK-20547
> URL: https://issues.apache.org/jira/browse/SPARK-20547
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>Priority: Blocker
>
> ExecutorClassLoader's findClass may throw some transient exception. For 
> example, when a task is cancelled, if ExecutorClassLoader is running, you may 
> see InterruptedException or IOException, even if this class can be loaded. 
> Then the result of findClass will be cached by JVM, and later when the same 
> class is being loaded (note: in this case, this class may be still loadable), 
> it will just throw NoClassDefFoundError.
> We should make ExecutorClassLoader's retry on transient exceptions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20547) ExecutorClassLoader's findClass may not work correctly when a task is cancelled.

2017-05-01 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-20547:


 Summary: ExecutorClassLoader's findClass may not work correctly 
when a task is cancelled.
 Key: SPARK-20547
 URL: https://issues.apache.org/jira/browse/SPARK-20547
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0
Reporter: Shixiong Zhu


ExecutorClassLoader's findClass may throw some transient exception. For 
example, when a task is cancelled, if ExecutorClassLoader is running, you may 
see InterruptedException or IOException, even if this class can be loaded. Then 
the result of findClass will be cached by JVM, and later when the same class is 
being loaded (note: in this case, this class may be still loadable), it will 
just throw NoClassDefFoundError.

We should make ExecutorClassLoader's retry on transient exceptions.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20546) spark-class gets syntax error in posix mode

2017-05-01 Thread Jessie Yu (JIRA)
Jessie Yu created SPARK-20546:
-

 Summary: spark-class gets syntax error in posix mode
 Key: SPARK-20546
 URL: https://issues.apache.org/jira/browse/SPARK-20546
 Project: Spark
  Issue Type: Bug
  Components: Deploy
Affects Versions: 2.0.2
Reporter: Jessie Yu
Priority: Minor


spark-class gets the following error when running in posix mode:
{code}
spark-class: line 78: syntax error near unexpected token `<'
spark-class: line 78: `done < <(build_command "$@")'
{code}
\\
It appears to be complaining about the process substitution: 
{code}
CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <(build_command "$@")
{code}
\\
This can be reproduced by first turning on allexport and then posix mode:
{code}set -a -o posix {code}
then running something like spark-shell, which calls spark-class.

\\
The simplest fix is probably to always turn off posix mode in spark-class 
before the while loop.
\\
This was previously reported in 
[SPARK-8417|https://issues.apache.org/jira/browse/SPARK-8417], which was closed 
as cannot reproduce. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20464) Add a job group and an informative description for streaming queries

2017-05-01 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-20464.
--
   Resolution: Fixed
 Assignee: Kunal Khamar
Fix Version/s: 2.2.0

> Add a job group and an informative description for streaming queries
> 
>
> Key: SPARK-20464
> URL: https://issues.apache.org/jira/browse/SPARK-20464
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Kunal Khamar
>Assignee: Kunal Khamar
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20529) Worker should not use the received Master address

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20529:


Assignee: (was: Apache Spark)

> Worker should not use the received Master address
> -
>
> Key: SPARK-20529
> URL: https://issues.apache.org/jira/browse/SPARK-20529
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>
> Right now, when a worker connects to the master, the master sends its address to 
> the worker. The worker then saves this address and uses it to reconnect in case 
> of failure.
> However, sometimes this address is not correct: if there is a proxy between the 
> master and the worker, the address the master sends is not the address of the proxy.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20529) Worker should not use the received Master address

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20529:


Assignee: Apache Spark

> Worker should not use the received Master address
> -
>
> Key: SPARK-20529
> URL: https://issues.apache.org/jira/browse/SPARK-20529
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Apache Spark
>
> Right now, when a worker connects to the master, the master sends its address to 
> the worker. The worker then saves this address and uses it to reconnect in case 
> of failure.
> However, sometimes this address is not correct: if there is a proxy between the 
> master and the worker, the address the master sends is not the address of the proxy.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20529) Worker should not use the received Master address

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991197#comment-15991197
 ] 

Apache Spark commented on SPARK-20529:
--

User 'zsxwing' has created a pull request for this issue:
https://github.com/apache/spark/pull/17821

> Worker should not use the received Master address
> -
>
> Key: SPARK-20529
> URL: https://issues.apache.org/jira/browse/SPARK-20529
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Shixiong Zhu
>
> Right now, when a worker connects to the master, the master sends its address to 
> the worker. The worker then saves this address and uses it to reconnect in case 
> of failure.
> However, sometimes this address is not correct: if there is a proxy between the 
> master and the worker, the address the master sends is not the address of the proxy.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20517) Download link in history server UI is not correct

2017-05-01 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-20517.

   Resolution: Fixed
 Assignee: Saisai Shao
Fix Version/s: 2.2.1
   2.1.2

> Download link in history server UI is not correct
> -
>
> Key: SPARK-20517
> URL: https://issues.apache.org/jira/browse/SPARK-20517
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 2.1.2, 2.2.1
>
>
> The download link in history server UI is concatenated with:
> {code}
>class="btn btn-info btn-mini">Download
> {code}
> Here the {{num}} field represents the number of attempts, which does not match 
> the REST API. In the REST API, if the attempt id does not exist, the {{num}} 
> field should be empty; otherwise it should actually be 
> {{attemptId}}.
> This leads to a "no such app" error, rather than correctly downloading 
> the event log.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20429) [GRAPHX] Strange results for personalized pagerank if node is involved in a cycle

2017-05-01 Thread Andrew Ray (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991099#comment-15991099
 ] 

Andrew Ray commented on SPARK-20429:


Can you retest your example with Spark 2.2/master? SPARK-18847 probably fixed 
your issue.

> [GRAPHX] Strange results for personalized pagerank if node is involved in a 
> cycle
> -
>
> Key: SPARK-20429
> URL: https://issues.apache.org/jira/browse/SPARK-20429
> Project: Spark
>  Issue Type: Bug
>  Components: GraphX
>Affects Versions: 2.1.0
>Reporter: Francesco Elia
>  Labels: graphx
>
> I'm trying to run the personalized PageRank implementation of GraphX on a 
> simple test graph, which is the following: 
> Image: https://i.stack.imgur.com/JDv1l.jpg
> I'm a bit confused on some results that I get when I try to compute the PPR 
> for a node that is involved in a cycle. For example, the final output for the 
> node 12 is as follows:
> (13, 0.0141)
> (7,  0.0141)
> (19, 0.0153)
> (17, 0.0153)
> (20, 0.0153)
> (11, 0.0391)
> (14, 0.0460)
> (15, 0.0541)
> (16, 0.0541)
> (12, 0.1832)
> I would clearly expect that the node 13 would have a much higher PPR value 
> (in fact, I would expect it to be the first one after the starting node 
> itself). The problem appears as well with other nodes involved in cycles, for 
> example, for starting node 13 the node 15 has a very low score. From all the 
> testing that I have done, it seems that for starting nodes that do not 
> participate in a cycle the result is exactly what I expect.
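For anyone retesting, a minimal sketch of invoking personalized PageRank in GraphX (toy edges and illustrative vertex IDs, not the reporter's exact graph):

{code}
// Personalized PageRank from a single source vertex; ranks are with respect
// to the source. Toy graph for illustration only.
import org.apache.spark.graphx.{Edge, Graph}

val edges = sc.parallelize(Seq(Edge(12L, 13L, 1.0), Edge(13L, 12L, 1.0), Edge(12L, 11L, 1.0)))
val graph = Graph.fromEdges(edges, 1.0)
val ranks = graph.personalizedPageRank(12L, 0.0001).vertices
ranks.collect().sortBy(-_._2).foreach(println)
{code}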



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20534) Outer generators skip missing records if used alone

2017-05-01 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20534.
-
   Resolution: Fixed
 Assignee: Maciej Szymkiewicz
Fix Version/s: 2.3.0
   2.2.1

> Outer generators skip missing records if used alone
> ---
>
> Key: SPARK-20534
> URL: https://issues.apache.org/jira/browse/SPARK-20534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
> Environment: master 814a61a867ded965433c944c90961df529ac83ab
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
> Fix For: 2.2.1, 2.3.0
>
>
> Example data:
> {code}
> val df = Seq(
>   (1, Some("a" :: "b" :: "c" :: Nil)), 
>   (2, None), 
>   (3, Some("a" :: Nil)
> )).toDF("k", "vs")
> {code}
> Correct behavior if there are other expressions:
> {code}
> df.select($"k", explode_outer($"vs")).show
> // +---+----+
> // |  k| col|
> // +---+----+
> // |  1|   a|
> // |  1|   b|
> // |  1|   c|
> // |  2|null|
> // |  3|   a|
> // +---+----+
> df.select($"k", posexplode_outer($"vs")).show
> // +---+----+----+
> // |  k| pos| col|
> // +---+----+----+
> // |  1|   0|   a|
> // |  1|   1|   b|
> // |  1|   2|   c|
> // |  2|null|null|
> // |  3|   0|   a|
> // +---+----+----+
> {code}
> Incorrect behavior if used alone:
> {code}
> df.select(explode_outer($"vs")).show
> // +---+
> // |col|
> // +---+
> // |  a|
> // |  b|
> // |  c|
> // |  a|
> // +---+
> df.select(posexplode_outer($"vs")).show
> // +---+---+
> // |pos|col|
> // +---+---+
> // |  0|  a|
> // |  1|  b|
> // |  2|  c|
> // |  0|  a|
> // +---+---+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20534) Outer generators skip missing records if used alone

2017-05-01 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-20534:

Labels:   (was: correctness)

> Outer generators skip missing records if used alone
> ---
>
> Key: SPARK-20534
> URL: https://issues.apache.org/jira/browse/SPARK-20534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
> Environment: master 814a61a867ded965433c944c90961df529ac83ab
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
> Fix For: 2.2.1, 2.3.0
>
>
> Example data:
> {code}
> val df = Seq(
>   (1, Some("a" :: "b" :: "c" :: Nil)), 
>   (2, None), 
>   (3, Some("a" :: Nil)
> )).toDF("k", "vs")
> {code}
> Correct behavior if there are other expressions:
> {code}
> df.select($"k", explode_outer($"vs")).show
> // +---+----+
> // |  k| col|
> // +---+----+
> // |  1|   a|
> // |  1|   b|
> // |  1|   c|
> // |  2|null|
> // |  3|   a|
> // +---+----+
> df.select($"k", posexplode_outer($"vs")).show
> // +---+----+----+
> // |  k| pos| col|
> // +---+----+----+
> // |  1|   0|   a|
> // |  1|   1|   b|
> // |  1|   2|   c|
> // |  2|null|null|
> // |  3|   0|   a|
> // +---+----+----+
> {code}
> Incorrect behavior if used alone:
> {code}
> df.select(explode_outer($"vs")).show
> // +---+
> // |col|
> // +---+
> // |  a|
> // |  b|
> // |  c|
> // |  a|
> // +---+
> df.select(posexplode_outer($"vs")).show
> // +---+---+
> // |pos|col|
> // +---+---+
> // |  0|  a|
> // |  1|  b|
> // |  2|  c|
> // |  0|  a|
> // +---+---+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20290) PySpark Column should provide eqNullSafe

2017-05-01 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-20290.
-
   Resolution: Fixed
 Assignee: Maciej Szymkiewicz
Fix Version/s: 2.3.0

> PySpark Column should provide eqNullSafe
> 
>
> Key: SPARK-20290
> URL: https://issues.apache.org/jira/browse/SPARK-20290
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Minor
> Fix For: 2.3.0
>
>
> NULL safe equality operators are missing from PySpark and {{(a == b) | 
> (a.isNull & b.isNull)}} creates a suboptimal execution plan.
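For reference, a short Scala sketch of the null-safe equality the PySpark {{Column}} would gain (illustrative data; the Scala API already exposes it as {{<=>}} / {{eqNullSafe}}):

{code}
// Null-safe equality treats two NULLs as equal and never returns NULL itself.
import spark.implicits._

val df = Seq((Some(1), Some(1)), (None, None), (Some(1), None)).toDF("a", "b")
df.select($"a" <=> $"b").show()           // true, true, false
df.select($"a".eqNullSafe($"b")).show()   // same operator, named form
{code}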



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19697) NoSuchMethodError: org.apache.avro.Schema.getLogicalType()

2017-05-01 Thread Michael Heuer (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15991022#comment-15991022
 ] 

Michael Heuer commented on SPARK-19697:
---

dongjoon: Wait a minute, if I understand correctly, the Spark unit tests won't 
pass against avro 1.7.7?  What if someone runs the code in the unit tests on 
Spark at runtime?

> NoSuchMethodError: org.apache.avro.Schema.getLogicalType()
> --
>
> Key: SPARK-19697
> URL: https://issues.apache.org/jira/browse/SPARK-19697
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Spark Core
>Affects Versions: 2.1.0
> Environment: Apache Spark 2.1.0, Scala version 2.11.8, Java 
> HotSpot(TM) 64-Bit Server VM, 1.8.0_60
>Reporter: Michael Heuer
>
> In a downstream project (https://github.com/bigdatagenomics/adam), adding a 
> dependency on parquet-avro version 1.8.2 results in NoSuchMethodExceptions at 
> runtime on various Spark versions, including 2.1.0.
> pom.xml:
> {code:xml}
>   
> 1.8
> 1.8.1
> 2.11.8
> 2.11
> 2.1.0
> 1.8.2
> 
>   
> 
>   
> org.apache.parquet
> parquet-avro
> ${parquet.version}
>   
> {code}
> Example using spark-submit (called via adam-submit below):
> {code}
> $ ./bin/adam-submit vcf2adam \
>   adam-core/src/test/resources/small.vcf \
>   small.adam
> ...
> java.lang.NoSuchMethodError: 
> org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:178)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:152)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:115)
>   at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:117)
>   at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:311)
>   at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:283)
>   at 
> org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:99)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> The issue can be reproduced from this pull request
> https://github.com/bigdatagenomics/adam/pull/1360
> and is reported as Jenkins CI test failures, e.g.
> https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1810
> d...@spark.apache.org mailing list archive thread
> http://apache-spark-developers-list.1001551.n3.nabble.com/Re-VOTE-Release-Apache-Parquet-1-8-2-RC1-tp20711p20720.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20501) ML, Graph 2.2 QA: API: New Scala APIs, docs

2017-05-01 Thread Yanbo Liang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990978#comment-15990978
 ] 

Yanbo Liang commented on SPARK-20501:
-

I'm starting this work. Thanks.

> ML, Graph 2.2 QA: API: New Scala APIs, docs
> ---
>
> Key: SPARK-20501
> URL: https://issues.apache.org/jira/browse/SPARK-20501
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Blocker
>
> Audit new public Scala APIs added to MLlib & GraphX.  Take note of:
> * Protected/public classes or methods.  If access can be more private, then 
> it should be.
> * Also look for non-sealed traits.
> * Documentation: Missing?  Bad links or formatting?
> *Make sure to check the object doc!*
> As you find issues, please create JIRAs and link them to this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20501) ML, Graph 2.2 QA: API: New Scala APIs, docs

2017-05-01 Thread Yanbo Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanbo Liang reassigned SPARK-20501:
---

Assignee: Yanbo Liang

> ML, Graph 2.2 QA: API: New Scala APIs, docs
> ---
>
> Key: SPARK-20501
> URL: https://issues.apache.org/jira/browse/SPARK-20501
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Blocker
>
> Audit new public Scala APIs added to MLlib & GraphX.  Take note of:
> * Protected/public classes or methods.  If access can be more private, then 
> it should be.
> * Also look for non-sealed traits.
> * Documentation: Missing?  Bad links or formatting?
> *Make sure to check the object doc!*
> As you find issues, please create JIRAs and link them to this issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19697) NoSuchMethodError: org.apache.avro.Schema.getLogicalType()

2017-05-01 Thread Michael Heuer (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990972#comment-15990972
 ] 

Michael Heuer commented on SPARK-19697:
---

We currently declare our compile-time dependency on Avro 1.8.1

https://github.com/bigdatagenomics/adam/blob/master/pom.xml#L24
https://github.com/bigdatagenomics/bdg-formats/blob/master/pom.xml#L56

and hadn't hit any runtime issues from the Avro 1.8.1 vs. 1.7.7 mismatch until 
bumping the Parquet dependency to 1.8.2.

> NoSuchMethodError: org.apache.avro.Schema.getLogicalType()
> --
>
> Key: SPARK-19697
> URL: https://issues.apache.org/jira/browse/SPARK-19697
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Spark Core
>Affects Versions: 2.1.0
> Environment: Apache Spark 2.1.0, Scala version 2.11.8, Java 
> HotSpot(TM) 64-Bit Server VM, 1.8.0_60
>Reporter: Michael Heuer
>
> In a downstream project (https://github.com/bigdatagenomics/adam), adding a 
> dependency on parquet-avro version 1.8.2 results in NoSuchMethodExceptions at 
> runtime on various Spark versions, including 2.1.0.
> pom.xml:
> {code:xml}
>   
> 1.8
> 1.8.1
> 2.11.8
> 2.11
> 2.1.0
> 1.8.2
> 
>   
> 
>   
> org.apache.parquet
> parquet-avro
> ${parquet.version}
>   
> {code}
> Example using spark-submit (called via adam-submit below):
> {code}
> $ ./bin/adam-submit vcf2adam \
>   adam-core/src/test/resources/small.vcf \
>   small.adam
> ...
> java.lang.NoSuchMethodError: 
> org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:178)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:152)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:115)
>   at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:117)
>   at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:311)
>   at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:283)
>   at 
> org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:99)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> The issue can be reproduced from this pull request
> https://github.com/bigdatagenomics/adam/pull/1360
> and is reported as Jenkins CI test failures, e.g.
> https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1810
> d...@spark.apache.org mailing list archive thread
> http://apache-spark-developers-list.1001551.n3.nabble.com/Re-VOTE-Release-Apache-Parquet-1-8-2-RC1-tp20711p20720.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19697) NoSuchMethodError: org.apache.avro.Schema.getLogicalType()

2017-05-01 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990957#comment-15990957
 ] 

Dongjoon Hyun commented on SPARK-19697:
---

[~heuermh], Could you try to specify your avro dependency in your project like 
the following?

https://github.com/apache/spark/blame/master/sql/core/pom.xml#L136-L148

> NoSuchMethodError: org.apache.avro.Schema.getLogicalType()
> --
>
> Key: SPARK-19697
> URL: https://issues.apache.org/jira/browse/SPARK-19697
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Spark Core
>Affects Versions: 2.1.0
> Environment: Apache Spark 2.1.0, Scala version 2.11.8, Java 
> HotSpot(TM) 64-Bit Server VM, 1.8.0_60
>Reporter: Michael Heuer
>
> In a downstream project (https://github.com/bigdatagenomics/adam), adding a 
> dependency on parquet-avro version 1.8.2 results in NoSuchMethodExceptions at 
> runtime on various Spark versions, including 2.1.0.
> pom.xml:
> {code:xml}
>   
> 1.8
> 1.8.1
> 2.11.8
> 2.11
> 2.1.0
> 1.8.2
> 
>   
> 
>   
> org.apache.parquet
> parquet-avro
> ${parquet.version}
>   
> {code}
> Example using spark-submit (called via adam-submit below):
> {code}
> $ ./bin/adam-submit vcf2adam \
>   adam-core/src/test/resources/small.vcf \
>   small.adam
> ...
> java.lang.NoSuchMethodError: 
> org.apache.avro.Schema.getLogicalType()Lorg/apache/avro/LogicalType;
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:178)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:152)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertUnion(AvroSchemaConverter.java:214)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:171)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:130)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:227)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:124)
>   at 
> org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:115)
>   at org.apache.parquet.avro.AvroWriteSupport.init(AvroWriteSupport.java:117)
>   at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:311)
>   at 
> org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:283)
>   at 
> org.apache.spark.rdd.InstrumentedOutputFormat.getRecordWriter(InstrumentedOutputFormat.scala:35)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1102)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:99)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> The issue can be reproduced from this pull request
> https://github.com/bigdatagenomics/adam/pull/1360
> and is reported as Jenkins CI test failures, e.g.
> https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1810
> d...@spark.apache.org mailing list archive thread
> http://apache-spark-developers-list.1001551.n3.nabble.com/Re-VOTE-Release-Apache-Parquet-1-8-2-RC1-tp20711p20720.html



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9686) Spark Thrift server doesn't return correct JDBC metadata

2017-05-01 Thread N Campbell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990849#comment-15990849
 ] 

N Campbell commented on SPARK-9686:
---

Is this likely to be fixed? 

Currently this forces companies to purchase commercial JDBC drivers as a workaround.


> Spark Thrift server doesn't return correct JDBC metadata 
> -
>
> Key: SPARK-9686
> URL: https://issues.apache.org/jira/browse/SPARK-9686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.0, 1.4.1, 1.5.0, 1.5.1, 1.5.2
>Reporter: pin_zhang
>Assignee: Cheng Lian
>Priority: Critical
> Attachments: SPARK-9686.1.patch.txt
>
>
> 1. Start start-thriftserver.sh
> 2. Connect with beeline
> 3. Create a table
> 4. Run "show tables"; the newly created table is returned
> 5. Run the following JDBC client code:
>   Class.forName("org.apache.hive.jdbc.HiveDriver");
>   String URL = "jdbc:hive2://localhost:1/default";
>   Properties info = new Properties();
>   Connection conn = DriverManager.getConnection(URL, info);
>   ResultSet tables = conn.getMetaData().getTables(conn.getCatalog(), null, null, null);
> Problem:
>   No tables are returned by this API, although this worked in Spark 1.3.
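
For reference, a minimal self-contained sketch of the same check in Scala; the host, 
the Thrift server's default port 10000, and the empty Properties are assumptions, not 
taken verbatim from the report:

{code}
import java.sql.DriverManager
import java.util.Properties

object ListThriftTables {
  def main(args: Array[String]): Unit = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    // Assumed connection details: local Thrift server on the default port 10000.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", new Properties())
    try {
      val tables = conn.getMetaData.getTables(conn.getCatalog, null, null, null)
      // DatabaseMetaData.getTables exposes a TABLE_NAME column in its result set.
      // Per the report this loop prints nothing, even though tables exist.
      while (tables.next()) {
        println(tables.getString("TABLE_NAME"))
      }
    } finally {
      conn.close()
    }
  }
}
{code}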



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20542) Add an API into Bucketizer that can bin a lot of columns all at once

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20542:


Assignee: (was: Apache Spark)

> Add an API into Bucketizer that can bin a lot of columns all at once
> 
>
> Key: SPARK-20542
> URL: https://issues.apache.org/jira/browse/SPARK-20542
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> Currently, ML's Bucketizer can only bin a single column of continuous 
> features. If a dataset has thousands of continuous columns that need binning, 
> we end up with thousands of ML stages, which is very inefficient for query 
> planning and execution.
> We should have a type of bucketizer that can bin many columns all at once. It 
> would need to accept a list of arrays of split points corresponding to the 
> columns to bin, but it could make things much more efficient by replacing 
> thousands of stages with just one.
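
To make the inefficiency concrete, here is a minimal sketch (not part of the original 
report) of how binning several columns has to be expressed today, one Bucketizer stage 
per column; the column names, splits, and the commented-out DataFrame {{df}} are 
illustrative assumptions:

{code}
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.feature.Bucketizer

// One Bucketizer per column: with thousands of columns this produces
// thousands of pipeline stages, which is what the proposal wants to avoid.
val cols = Seq("c1", "c2", "c3") // imagine thousands of these
val splits = Array(Double.NegativeInfinity, 0.0, 10.0, Double.PositiveInfinity)

val stages: Array[PipelineStage] = cols.map { c =>
  new Bucketizer()
    .setInputCol(c)
    .setOutputCol(s"${c}_binned")
    .setSplits(splits)
}.toArray

val pipeline = new Pipeline().setStages(stages)
// val binned = pipeline.fit(df).transform(df)   // df is an assumed input DataFrame
{code}

The proposed API would collapse this into a single stage that accepts multiple input 
columns together with one array of split points per column.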



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20542) Add an API into Bucketizer that can bin a lot of columns all at once

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20542:


Assignee: Apache Spark

> Add an API into Bucketizer that can bin a lot of columns all at once
> 
>
> Key: SPARK-20542
> URL: https://issues.apache.org/jira/browse/SPARK-20542
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> Currently, ML's Bucketizer can only bin a single column of continuous 
> features. If a dataset has thousands of continuous columns that need binning, 
> we end up with thousands of ML stages, which is very inefficient for query 
> planning and execution.
> We should have a type of bucketizer that can bin many columns all at once. It 
> would need to accept a list of arrays of split points corresponding to the 
> columns to bin, but it could make things much more efficient by replacing 
> thousands of stages with just one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20542) Add an API into Bucketizer that can bin a lot of columns all at once

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990840#comment-15990840
 ] 

Apache Spark commented on SPARK-20542:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/17819

> Add an API into Bucketizer that can bin a lot of columns all at once
> 
>
> Key: SPARK-20542
> URL: https://issues.apache.org/jira/browse/SPARK-20542
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.2.0
>Reporter: Liang-Chi Hsieh
>
> Currently, ML's Bucketizer can only bin a single column of continuous 
> features. If a dataset has thousands of continuous columns that need binning, 
> we end up with thousands of ML stages, which is very inefficient for query 
> planning and execution.
> We should have a type of bucketizer that can bin many columns all at once. It 
> would need to accept a list of arrays of split points corresponding to the 
> columns to bin, but it could make things much more efficient by replacing 
> thousands of stages with just one.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20545) union set operator should default to DISTINCT and not ALL semantics

2017-05-01 Thread N Campbell (JIRA)
N Campbell created SPARK-20545:
--

 Summary: union set operator should default to DISTINCT and not ALL 
semantics
 Key: SPARK-20545
 URL: https://issues.apache.org/jira/browse/SPARK-20545
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.1.0
Reporter: N Campbell


A set operation (i.e. UNION) over two queries that produce identical row values 
should return the distinct set of rows, not all rows.

ISO-SQL set operation semantics default to DISTINCT, while the Spark 
implementation defaults to ALL. Although Spark allows the DISTINCT keyword and 
some might assume ALL is faster, the semantically wrong result set is produced 
per the standard (and per commercial SQL systems including Oracle, DB2, 
Teradata, SQL Server, etc.).

select tsint.csint from cert.tsint 
union 
select tint.cint from cert.tint 

csint

-1
0
1
10

-1
0
1
10


vs

select tsint.csint from cert.tsint union distinct select tint.cint from 
cert.tint 

csint
-1

1
10
0
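
A hedged sketch of how the reported behavior can be checked from Scala (the 
{{cert.tsint}} / {{cert.tint}} tables are taken from the report; a predefined 
SparkSession named {{spark}}, as in spark-shell, is assumed):

{code}
// Per ISO SQL both queries should return the same deduplicated rows;
// the report claims the bare UNION behaves like UNION ALL instead.
val implicitUnion = spark.sql(
  "SELECT tsint.csint FROM cert.tsint UNION SELECT tint.cint FROM cert.tint")
val explicitDistinct = spark.sql(
  "SELECT tsint.csint FROM cert.tsint UNION DISTINCT SELECT tint.cint FROM cert.tint")

println(s"UNION rows:          ${implicitUnion.count()}")
println(s"UNION DISTINCT rows: ${explicitDistinct.count()}")
{code}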




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20544) R wrapper for input_file_name

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20544:


Assignee: (was: Apache Spark)

> R wrapper for input_file_name
> -
>
> Key: SPARK-20544
> URL: https://issues.apache.org/jira/browse/SPARK-20544
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> SparkR wrapper for {{o.a.s.sql.functions.input_file_name}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20544) R wrapper for input_file_name

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20544:


Assignee: Apache Spark

> R wrapper for input_file_name
> -
>
> Key: SPARK-20544
> URL: https://issues.apache.org/jira/browse/SPARK-20544
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Maciej Szymkiewicz
>Assignee: Apache Spark
>Priority: Minor
>
> SparkR wrapper for {{o.a.s.sql.functions.input_file_name}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20544) R wrapper for input_file_name

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990635#comment-15990635
 ] 

Apache Spark commented on SPARK-20544:
--

User 'zero323' has created a pull request for this issue:
https://github.com/apache/spark/pull/17818

> R wrapper for input_file_name
> -
>
> Key: SPARK-20544
> URL: https://issues.apache.org/jira/browse/SPARK-20544
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Maciej Szymkiewicz
>Priority: Minor
>
> SparkR wrapper for {{o.a.s.sql.functions.input_file_name}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20544) R wrapper for input_file_name

2017-05-01 Thread Maciej Szymkiewicz (JIRA)
Maciej Szymkiewicz created SPARK-20544:
--

 Summary: R wrapper for input_file_name
 Key: SPARK-20544
 URL: https://issues.apache.org/jira/browse/SPARK-20544
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Affects Versions: 2.2.0
Reporter: Maciej Szymkiewicz
Priority: Minor


SparkR wrapper for {{o.a.s.sql.functions.input_file_name}}.
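
For context, a minimal sketch of the existing Scala function that the proposed SparkR 
wrapper would expose; the local master, app name, and input path are illustrative 
assumptions:

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.input_file_name

val spark = SparkSession.builder()
  .appName("input-file-name-demo")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// input_file_name() returns the name of the file backing each row.
val df = spark.read.text("/tmp/some/dir")   // assumed input path
df.select(input_file_name().as("source_file"), $"value").show(5, truncate = false)
{code}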



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-20541) SparkR SS should support awaitTermination without timeout

2017-05-01 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-20541.
--
  Resolution: Fixed
Assignee: Felix Cheung
   Fix Version/s: 2.3.0
  2.2.0
Target Version/s: 2.2.0, 2.3.0

> SparkR SS should support awaitTermination without timeout
> -
>
> Key: SPARK-20541
> URL: https://issues.apache.org/jira/browse/SPARK-20541
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, Structured Streaming
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
> Fix For: 2.2.0, 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19256) Hive bucketing support

2017-05-01 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990618#comment-15990618
 ] 

Wenchen Fan commented on SPARK-19256:
-

Yes, I agree that we should use {{requiredDistribution}} and 
{{requiredOrdering}} to group and sort the input data instead of doing it 
manually in {{FileFormatWriter}}. But at the time we refactored 
{{InsertIntoHiveTable}}, it was still an improvement over the previous code, 
which grouped and sorted the input data in {{XXXWriteContainer}} and was very 
similar to {{FileFormatWriter}}.

> Hive bucketing support
> --
>
> Key: SPARK-19256
> URL: https://issues.apache.org/jira/browse/SPARK-19256
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Tejas Patil
>Priority: Minor
>
> JIRA to track design discussions and tasks related to Hive bucketing support 
> in Spark.
> Proposal : 
> https://docs.google.com/document/d/1a8IDh23RAkrkg9YYAeO51F4aGO8-xAlupKwdshve2fc/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20543) R should skip long running or non-essential tests when running on CRAN

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20543:


Assignee: Felix Cheung  (was: Apache Spark)

> R should skip long running or non-essential tests when running on CRAN
> --
>
> Key: SPARK-20543
> URL: https://issues.apache.org/jira/browse/SPARK-20543
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> This is actually recommended in the CRAN policies



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20543) R should skip long running or non-essential tests when running on CRAN

2017-05-01 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20543:


Assignee: Apache Spark  (was: Felix Cheung)

> R should skip long running or non-essential tests when running on CRAN
> --
>
> Key: SPARK-20543
> URL: https://issues.apache.org/jira/browse/SPARK-20543
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Apache Spark
>
> This is actually recommended in the CRAN policies



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20543) R should skip long running or non-essential tests when running on CRAN

2017-05-01 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15990598#comment-15990598
 ] 

Apache Spark commented on SPARK-20543:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/17817

> R should skip long running or non-essential tests when running on CRAN
> --
>
> Key: SPARK-20543
> URL: https://issues.apache.org/jira/browse/SPARK-20543
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> This is actually recommended in the CRAN policies



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20543) R should skip long running or non-essential tests when running on CRAN

2017-05-01 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-20543:
-
Description: This is actually recommended in the CRAN policies

> R should skip long running or non-essential tests when running on CRAN
> --
>
> Key: SPARK-20543
> URL: https://issues.apache.org/jira/browse/SPARK-20543
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> This is actually recommended in the CRAN policies



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20543) R should skip long running or non-essential tests when running on CRAN

2017-05-01 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-20543:


Assignee: Felix Cheung

> R should skip long running or non-essential tests when running on CRAN
> --
>
> Key: SPARK-20543
> URL: https://issues.apache.org/jira/browse/SPARK-20543
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20543) R should skip long running or non-essential tests when running on CRAN

2017-05-01 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-20543:


 Summary: R should skip long running or non-essential tests when 
running on CRAN
 Key: SPARK-20543
 URL: https://issues.apache.org/jira/browse/SPARK-20543
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 2.2.0
Reporter: Felix Cheung






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org