[jira] [Resolved] (SPARK-14595) Add inputMetrics to FileScanRDD
[ https://issues.apache.org/jira/browse/SPARK-14595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-14595. - Resolution: Fixed Fix Version/s: 2.0.0 > Add inputMetrics to FileScanRDD > --- > > Key: SPARK-14595 > URL: https://issues.apache.org/jira/browse/SPARK-14595 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12457) Add ExpressionDescription to collection functions
[ https://issues.apache.org/jira/browse/SPARK-12457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247221#comment-15247221 ] Apache Spark commented on SPARK-12457: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/12492 > Add ExpressionDescription to collection functions > - > > Key: SPARK-12457 > URL: https://issues.apache.org/jira/browse/SPARK-12457 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14126) [Table related commands] Truncate table
[ https://issues.apache.org/jira/browse/SPARK-14126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247161#comment-15247161 ] Adrian Wang commented on SPARK-14126: - Yes, still working. > [Table related commands] Truncate table > --- > > Key: SPARK-14126 > URL: https://issues.apache.org/jira/browse/SPARK-14126 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Yin Huai > > TOK_TRUNCATETABLE > We also need to check the behavior of Hive when we call truncate table on a > partitioned table. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
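For reference, the Hive behavior called out above can be checked by issuing both truncate forms through a Hive-enabled session. A minimal sketch, assuming a local Hive-enabled SparkSession; the table {{t}} and partition column {{ds}} are hypothetical names for illustration only:
{code}
import org.apache.spark.sql.SparkSession

object TruncateCheck extends App {
  // Assumes a Hive-enabled build; `t` and `ds` are placeholder names.
  val spark = SparkSession.builder()
    .master("local[*]")
    .enableHiveSupport()
    .getOrCreate()

  spark.sql("TRUNCATE TABLE t")                               // whole table
  spark.sql("TRUNCATE TABLE t PARTITION (ds = '2016-04-18')") // one partition (HiveQL form)
}
{code}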
[jira] [Assigned] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model
[ https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14712: Assignee: Apache Spark > spark.ml LogisticRegressionModel.toString should summarize model > > > Key: SPARK-14712 > URL: https://issues.apache.org/jira/browse/SPARK-14712 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Assignee: Apache Spark >Priority: Trivial > Labels: starter > > spark.mllib LogisticRegressionModel overrides toString to print a little > model info. We should do the same in spark.ml. I'd recommend: > * super.toString > * numClasses > * numFeatures > We should also override {{__repr__}} in pyspark to do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
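A minimal sketch of the summary the issue recommends — super.toString plus numClasses and numFeatures. LogisticRegressionModelSketch is a hypothetical stand-in, not the actual spark.ml class:
{code}
// Hypothetical stand-in for spark.ml's LogisticRegressionModel, showing only
// the proposed toString override.
class LogisticRegressionModelSketch(val numClasses: Int, val numFeatures: Int) {
  override def toString: String =
    s"${super.toString}: numClasses = $numClasses, numFeatures = $numFeatures"
}

object ToStringDemo extends App {
  println(new LogisticRegressionModelSketch(2, 10))
  // e.g. LogisticRegressionModelSketch@1b6d3586: numClasses = 2, numFeatures = 10
}
{code}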
[jira] [Commented] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model
[ https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247159#comment-15247159 ] Apache Spark commented on SPARK-14712: -- User 'hujy' has created a pull request for this issue: https://github.com/apache/spark/pull/12491 > spark.ml LogisticRegressionModel.toString should summarize model > > > Key: SPARK-14712 > URL: https://issues.apache.org/jira/browse/SPARK-14712 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > spark.mllib LogisticRegressionModel overrides toString to print a little > model info. We should do the same in spark.ml. I'd recommend: > * super.toString > * numClasses > * numFeatures > We should also override {{__repr__}} in pyspark to do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model
[ https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14712: Assignee: (was: Apache Spark) > spark.ml LogisticRegressionModel.toString should summarize model > > > Key: SPARK-14712 > URL: https://issues.apache.org/jira/browse/SPARK-14712 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > spark.mllib LogisticRegressionModel overrides toString to print a little > model info. We should do the same in spark.ml. I'd recommend: > * super.toString > * numClasses > * numFeatures > We should also override {{__repr__}} in pyspark to do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model
[ https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247141#comment-15247141 ] hujiayin edited comment on SPARK-14712 at 4/19/16 3:59 AM: --- Hi Gayathri, I think self already has numFeatures and numClasses defined, and I can submit code for this issue. was (Author: hujiayin): Hi Murali, I think self already has numFeatures and numClasses defined, and I can submit code for this issue. > spark.ml LogisticRegressionModel.toString should summarize model > > > Key: SPARK-14712 > URL: https://issues.apache.org/jira/browse/SPARK-14712 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > spark.mllib LogisticRegressionModel overrides toString to print a little > model info. We should do the same in spark.ml. I'd recommend: > * super.toString > * numClasses > * numFeatures > We should also override {{__repr__}} in pyspark to do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model
[ https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247141#comment-15247141 ] hujiayin commented on SPARK-14712: -- Hi Murali, I think self already has numFeatures and numClasses defined, and I can submit code for this issue. > spark.ml LogisticRegressionModel.toString should summarize model > > > Key: SPARK-14712 > URL: https://issues.apache.org/jira/browse/SPARK-14712 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > spark.mllib LogisticRegressionModel overrides toString to print a little > model info. We should do the same in spark.ml. I'd recommend: > * super.toString > * numClasses > * numFeatures > We should also override {{__repr__}} in pyspark to do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14687) Call path.getFileSystem(conf) instead of FileSystem.get(conf)
[ https://issues.apache.org/jira/browse/SPARK-14687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247128#comment-15247128 ] Liwei Lin commented on SPARK-14687: --- Updated with problem details. Thanks for the reminder! :-) > Call path.getFileSystem(conf) instead of FileSystem.get(conf) > -- > > Key: SPARK-14687 > URL: https://issues.apache.org/jira/browse/SPARK-14687 > Project: Spark > Issue Type: Improvement > Components: MLlib, Spark Core, SQL >Affects Versions: 2.0.0 >Reporter: Liwei Lin >Priority: Minor > > Generally we should call path.getFileSystem(conf) instead of > FileSystem.get(conf), because the latter actually resolves against the > DEFAULT_URI (fs.defaultFS), leading to problems in certain situations: > - if {{fs.defaultFS}} is {{hdfs://clusterA/...}}, but the path is > {{hdfs://clusterB/...}}: then we'll encounter > {{java.lang.IllegalArgumentException (Wrong FS: hdfs://clusterB/..., > expected: hdfs://clusterA/...)}} > - if {{fs.defaultFS}} is not specified, the scheme will default to > {{file:///}}: then we'll encounter {{java.lang.IllegalArgumentException > (Wrong FS: hdfs://..., expected: file:///)}} > - if {{fs.defaultFS}} is not {{hdfs://...}}, for example {{viewfs://}} (which > is used for federated HDFS): then we'll encounter > {{java.lang.IllegalArgumentException (Wrong FS: hdfs://..., expected: > viewfs:///)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14687) Call path.getFileSystem(conf) instead of FileSystem.get(conf)
[ https://issues.apache.org/jira/browse/SPARK-14687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liwei Lin updated SPARK-14687: -- Description: Generally we should call path.getFileSystem(conf) instead of FileSystem.get(conf), because the latter actually resolves against the DEFAULT_URI (fs.defaultFS), leading to problems in certain situations: - if {{fs.defaultFS}} is {{hdfs://clusterA/...}}, but the path is {{hdfs://clusterB/...}}: then we'll encounter {{java.lang.IllegalArgumentException (Wrong FS: hdfs://clusterB/..., expected: hdfs://clusterA/...)}} - if {{fs.defaultFS}} is not specified, the scheme will default to {{file:///}}: then we'll encounter {{java.lang.IllegalArgumentException (Wrong FS: hdfs://..., expected: file:///)}} - if {{fs.defaultFS}} is not {{hdfs://...}}, for example {{viewfs://}} (which is used for federated HDFS): then we'll encounter {{java.lang.IllegalArgumentException (Wrong FS: hdfs://..., expected: viewfs:///)}} was: Generally we should call path.getFileSystem(conf) instead of FileSystem.get(conf), because the latter actually resolves against the DEFAULT_URI (fs.defaultFS), leading to problems in certain situations: - if {{fs.defaultFS}} is not specified, the scheme will default to {{file:///}} - if {{fs.defaultFS}} is not {{hdfs://...}}, for example {{viewfs://}} (which is used for federated HDFS) - if {{fs.defaultFS}} is {{hdfs://A/...}}, but the path is {{hdfs://B/...}} > Call path.getFileSystem(conf) instead of FileSystem.get(conf) > -- > > Key: SPARK-14687 > URL: https://issues.apache.org/jira/browse/SPARK-14687 > Project: Spark > Issue Type: Improvement > Components: MLlib, Spark Core, SQL >Affects Versions: 2.0.0 >Reporter: Liwei Lin >Priority: Minor > > Generally we should call path.getFileSystem(conf) instead of > FileSystem.get(conf), because the latter actually resolves against the > DEFAULT_URI (fs.defaultFS), leading to problems in certain situations: > - if {{fs.defaultFS}} is {{hdfs://clusterA/...}}, but the path is > {{hdfs://clusterB/...}}: then we'll encounter > {{java.lang.IllegalArgumentException (Wrong FS: hdfs://clusterB/..., > expected: hdfs://clusterA/...)}} > - if {{fs.defaultFS}} is not specified, the scheme will default to > {{file:///}}: then we'll encounter {{java.lang.IllegalArgumentException > (Wrong FS: hdfs://..., expected: file:///)}} > - if {{fs.defaultFS}} is not {{hdfs://...}}, for example {{viewfs://}} (which > is used for federated HDFS): then we'll encounter > {{java.lang.IllegalArgumentException (Wrong FS: hdfs://..., expected: > viewfs:///)}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
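A minimal sketch of the recommended pattern, using the standard Hadoop Configuration/Path/FileSystem APIs; the {{hdfs://clusterB}} path is just an illustrative value:
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object FsResolution {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val path = new Path("hdfs://clusterB/data/input")

    // Risky: FileSystem.get(conf) resolves against fs.defaultFS, so if the
    // default FS is hdfs://clusterA or file:///, later operations on `path`
    // fail with IllegalArgumentException("Wrong FS: ...").
    val defaultFs: FileSystem = FileSystem.get(conf)

    // Recommended: resolve the filesystem from the path's own scheme/authority.
    val pathFs: FileSystem = path.getFileSystem(conf)

    println(s"default FS: ${defaultFs.getUri}, path FS: ${pathFs.getUri}")
  }
}
{code}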
[jira] [Assigned] (SPARK-14724) Improve performance of sorting by using radix sort when possible
[ https://issues.apache.org/jira/browse/SPARK-14724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14724: Assignee: (was: Apache Spark) > Improve performance of sorting by using radix sort when possible > > > Key: SPARK-14724 > URL: https://issues.apache.org/jira/browse/SPARK-14724 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Eric Liang > > Spark currently uses TimSort for all in-memory sorts, including sorts done > for shuffle. One low-hanging fruit is to use radix sort when possible (e.g. > sorting by integer keys). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
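For illustration, here is the general technique the issue refers to — an LSD radix sort over non-negative Int keys, 8 bits per pass. This is only a sketch of the idea, not Spark's shuffle sort implementation:
{code}
object RadixSortSketch extends App {
  // Sorts non-negative Ints with 4 stable counting passes, 8 bits each.
  def radixSort(a: Array[Int]): Array[Int] = {
    var src = a.clone()
    var dst = new Array[Int](a.length)
    for (shift <- 0 until 32 by 8) {
      val counts = new Array[Int](257)
      for (x <- src) counts(((x >>> shift) & 0xFF) + 1) += 1
      for (i <- 1 until 257) counts(i) += counts(i - 1) // prefix sums = bucket offsets
      for (x <- src) {
        val b = (x >>> shift) & 0xFF
        dst(counts(b)) = x
        counts(b) += 1
      }
      val tmp = src; src = dst; dst = tmp               // swap buffers between passes
    }
    src
  }

  println(radixSort(Array(170, 45, 75, 90, 2, 802, 24, 66)).mkString(", "))
  // 2, 24, 45, 66, 75, 90, 170, 802
}
{code}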
[jira] [Commented] (SPARK-14724) Improve performance of sorting by using radix sort when possible
[ https://issues.apache.org/jira/browse/SPARK-14724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247111#comment-15247111 ] Apache Spark commented on SPARK-14724: -- User 'ericl' has created a pull request for this issue: https://github.com/apache/spark/pull/12490 > Improve performance of sorting by using radix sort when possible > > > Key: SPARK-14724 > URL: https://issues.apache.org/jira/browse/SPARK-14724 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Eric Liang > > Spark currently uses TimSort for all in-memory sorts, including sorts done > for shuffle. One low-hanging fruit is to use radix sort when possible (e.g. > sorting by integer keys). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14724) Improve performance of sorting by using radix sort when possible
[ https://issues.apache.org/jira/browse/SPARK-14724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14724: Assignee: Apache Spark > Improve performance of sorting by using radix sort when possible > > > Key: SPARK-14724 > URL: https://issues.apache.org/jira/browse/SPARK-14724 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Eric Liang >Assignee: Apache Spark > > Spark currently uses TimSort for all in-memory sorts, including sorts done > for shuffle. One low-hanging fruit is to use radix sort when possible (e.g. > sorting by integer keys). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13904) Add support for pluggable cluster manager
[ https://issues.apache.org/jira/browse/SPARK-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13904. - Resolution: Fixed Assignee: Hemant Bhanawat Fix Version/s: 2.0.0 > Add support for pluggable cluster manager > - > > Key: SPARK-13904 > URL: https://issues.apache.org/jira/browse/SPARK-13904 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Reporter: Hemant Bhanawat >Assignee: Hemant Bhanawat > Fix For: 2.0.0 > > > Currently Spark supports only a few cluster managers, viz. YARN, Mesos and > Standalone. But as Spark is now being used in newer and different use cases, > there is a need to allow other cluster managers to manage Spark > components. One such use case is embedding Spark components like the executor > and driver inside another process, which may be a datastore. This allows > colocation of data and processing. Another requirement that stems from such a > use case is that the executors/driver should not take the parent process down > when they go down, and that the components can be relaunched inside the same > process. > So, this JIRA requests two functionalities: > 1. Support for external cluster managers > 2. Allow a cluster manager to clean up its tasks without taking the parent > process down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
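A hedged sketch of what plugging in an external cluster manager could look like. The trait name and method set follow the ExternalClusterManager interface this issue introduces, but treat the exact signatures as assumptions rather than a reference; MyClusterManager and the myCluster:// URL scheme are hypothetical:
{code}
import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend, TaskScheduler}

// Hypothetical plugin claiming master URLs of the form "myCluster://...".
// Bodies are left as stubs: a real implementation returns a TaskScheduler and
// a SchedulerBackend wired to the external manager.
class MyClusterManager extends ExternalClusterManager {
  override def canCreate(masterURL: String): Boolean =
    masterURL.startsWith("myCluster")

  override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler =
    ??? // e.g. a TaskSchedulerImpl-like scheduler for this cluster

  override def createSchedulerBackend(sc: SparkContext, masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend =
    ??? // a backend that launches executors inside the host process

  override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit = {
    // wire the scheduler to its backend before the SparkContext starts jobs
  }
}
{code}
The implementation is then registered for discovery (to my recollection the merged patch uses java.util.ServiceLoader, i.e. a META-INF/services entry), so SparkContext can select it from the master URL.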
[jira] [Commented] (SPARK-13904) Add support for pluggable cluster manager
[ https://issues.apache.org/jira/browse/SPARK-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247096#comment-15247096 ] Hemant Bhanawat commented on SPARK-13904: - [~kiszk] Since the builds are passing now, can I assume that it was some sporadic issue and close this JIRA? > Add support for pluggable cluster manager > - > > Key: SPARK-13904 > URL: https://issues.apache.org/jira/browse/SPARK-13904 > Project: Spark > Issue Type: Improvement > Components: Scheduler >Reporter: Hemant Bhanawat > > Currently Spark supports only a few cluster managers, viz. YARN, Mesos and > Standalone. But as Spark is now being used in newer and different use cases, > there is a need to allow other cluster managers to manage Spark > components. One such use case is embedding Spark components like the executor > and driver inside another process, which may be a datastore. This allows > colocation of data and processing. Another requirement that stems from such a > use case is that the executors/driver should not take the parent process down > when they go down, and that the components can be relaunched inside the same > process. > So, this JIRA requests two functionalities: > 1. Support for external cluster managers > 2. Allow a cluster manager to clean up its tasks without taking the parent > process down. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14724) Improve performance of sorting by using radix sort when possible
Eric Liang created SPARK-14724: -- Summary: Improve performance of sorting by using radix sort when possible Key: SPARK-14724 URL: https://issues.apache.org/jira/browse/SPARK-14724 Project: Spark Issue Type: Improvement Reporter: Eric Liang Spark currently uses TimSort for all in-memory sorts, including sorts done for shuffle. One low-hanging fruit is to use radix sort when possible (e.g. sorting by integer keys). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14724) Improve performance of sorting by using radix sort when possible
[ https://issues.apache.org/jira/browse/SPARK-14724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Liang updated SPARK-14724: --- Component/s: Spark Core > Improve performance of sorting by using radix sort when possible > > > Key: SPARK-14724 > URL: https://issues.apache.org/jira/browse/SPARK-14724 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Eric Liang > > Spark currently uses TimSort for all in-memory sorts, including sorts done > for shuffle. One low-hanging fruit is to use radix sort when possible (e.g. > sorting by integer keys). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14722) Rename upstreams() -> inputRDDs() in WholeStageCodegen
[ https://issues.apache.org/jira/browse/SPARK-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-14722. - Resolution: Fixed Assignee: Sameer Agarwal Fix Version/s: 2.0.0 > Rename upstreams() -> inputRDDs() in WholeStageCodegen > -- > > Key: SPARK-14722 > URL: https://issues.apache.org/jira/browse/SPARK-14722 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14718) Avoid mutating ExprCode in doGenCode
[ https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-14718. - Resolution: Fixed Assignee: Sameer Agarwal Fix Version/s: 2.0.0 > Avoid mutating ExprCode in doGenCode > > > Key: SPARK-14718 > URL: https://issues.apache.org/jira/browse/SPARK-14718 > Project: Spark > Issue Type: Improvement >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal > Fix For: 2.0.0 > > > The `doGenCode` method currently takes in an ExprCode, mutates it and returns > the java code to evaluate the given expression. It should instead just return > a new ExprCode to avoid passing around mutable objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
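To make the change concrete, here is a simplified sketch of the immutable-return style being adopted. This ExprCode is a cut-down stand-in for the real codegen class, whose fields and method signatures may differ:
{code}
// Cut-down stand-in for Spark's codegen ExprCode (illustration only).
case class ExprCode(code: String, isNull: String, value: String)

object DoGenCodeSketch extends App {
  // After the change: doGenCode returns a fresh ExprCode instead of mutating
  // the one it was handed, so callers never see their argument change.
  def doGenCode(input: ExprCode): ExprCode =
    input.copy(code = input.code + "\n/* generated evaluation code */")

  val in = ExprCode("", "false", "value1")
  val out = doGenCode(in)
  assert(in.code.isEmpty) // the input is untouched
  println(out.code.trim)
}
{code}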
[jira] [Comment Edited] (SPARK-14709) spark.ml API for linear SVM
[ https://issues.apache.org/jira/browse/SPARK-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246882#comment-15246882 ] yuhao yang edited comment on SPARK-14709 at 4/19/16 3:23 AM: - I'll start on this to give a quick prototype first. If time allows, I'm also thinking we should try with SMO. was (Author: yuhaoyan): I'll start on this to give a quick prototype first. > spark.ml API for linear SVM > --- > > Key: SPARK-14709 > URL: https://issues.apache.org/jira/browse/SPARK-14709 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Joseph K. Bradley > > Provide API for SVM algorithm for DataFrames. I would recommend using > OWL-QN, rather than wrapping spark.mllib's SGD-based implementation. > The API should mimic existing spark.ml.classification APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
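As a rough illustration of the recommended direction — OWL-QN rather than SGD — here is a tiny L1-regularized linear SVM fit with Breeze's OWLQN on a hinge-loss subgradient. The toy data, regularization value, and iteration counts are arbitrary; this is an assumption about the shape of such an implementation, not the eventual spark.ml API:
{code}
import breeze.linalg.DenseVector
import breeze.optimize.{DiffFunction, OWLQN}

object LinearSvmSketch extends App {
  // Two toy labeled points, labels in {-1, +1}.
  val data = Seq((1.0, DenseVector(1.0, 2.0)), (-1.0, DenseVector(-1.0, -1.5)))

  // Hinge loss with a subgradient (zero where the margin constraint is met).
  val hinge = new DiffFunction[DenseVector[Double]] {
    def calculate(w: DenseVector[Double]): (Double, DenseVector[Double]) = {
      var loss = 0.0
      val grad = DenseVector.zeros[Double](w.length)
      for ((y, x) <- data) {
        val margin = y * (w dot x)
        if (margin < 1) { loss += 1 - margin; grad -= x * y }
      }
      (loss, grad)
    }
  }

  // OWL-QN handles the L1 term (0.01 here) outside the differentiable loss.
  val owlqn = new OWLQN[Int, DenseVector[Double]](100, 10, 0.01)
  val w = owlqn.minimize(hinge, DenseVector.zeros[Double](2))
  println(s"weights: $w")
}
{code}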
[jira] [Assigned] (SPARK-14701) checkpointWriter is stopped before eventLoop, causing a RejectedExecutionException in StreamingContext.stop
[ https://issues.apache.org/jira/browse/SPARK-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14701: Assignee: (was: Apache Spark) > checkpointWriter is stopped before eventLoop, causing a RejectedExecutionException > in StreamingContext.stop > -- > > Key: SPARK-14701 > URL: https://issues.apache.org/jira/browse/SPARK-14701 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.1, 1.6.1 > Environment: Windows, local[*] mode as well as Redhat Linux, YARN > Cluster >Reporter: Sreelal S L >Priority: Minor > > In org.apache.spark.streaming.scheduler.JobGenerator.stop(), > checkpointWriter.stop is called before eventLoop.stop. > If I call streamingContext.stop when a batch is about to complete (I'm > invoking it from a StreamingListener.onBatchCompleted callback), a > RejectedExecutionException may be thrown from checkpointWriter.executor, since the > eventLoop will try to process DoCheckpoint events even after > checkpointWriter.executor was stopped. > 16/04/18 19:22:10 ERROR CheckpointWriter: Could not submit checkpoint task to > the thread pool executor > java.util.concurrent.RejectedExecutionException: Task > org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler@76e12f8 > rejected from java.util.concurrent.ThreadPoolExecutor@4b9f5b97[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 49] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) > at > org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:253) > at > org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:294) > at > org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:184) > at > org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87) > at > org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > I think the order of stopping should be changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
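A minimal sketch of the reordering the report's last line suggests, using simplified stand-ins (Stoppable and JobGeneratorSketch are hypothetical, not Spark's classes):
{code}
trait Stoppable { def stop(): Unit }

// Stop the event loop first, so no DoCheckpoint event can reach the
// checkpoint writer's executor after that executor has shut down.
class JobGeneratorSketch(eventLoop: Stoppable, checkpointWriter: Stoppable) {
  def stop(): Unit = {
    eventLoop.stop()        // 1. no further DoCheckpoint events get processed
    checkpointWriter.stop() // 2. now it is safe to terminate the executor
  }
}
{code}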
[jira] [Assigned] (SPARK-14701) checkpointWriter is stopped before eventLoop, causing a RejectedExecutionException in StreamingContext.stop
[ https://issues.apache.org/jira/browse/SPARK-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14701: Assignee: Apache Spark > checkpointWriter is stopped before eventLoop, causing a RejectedExecutionException > in StreamingContext.stop > -- > > Key: SPARK-14701 > URL: https://issues.apache.org/jira/browse/SPARK-14701 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.1, 1.6.1 > Environment: Windows, local[*] mode as well as Redhat Linux, YARN > Cluster >Reporter: Sreelal S L >Assignee: Apache Spark >Priority: Minor > > In org.apache.spark.streaming.scheduler.JobGenerator.stop(), > checkpointWriter.stop is called before eventLoop.stop. > If I call streamingContext.stop when a batch is about to complete (I'm > invoking it from a StreamingListener.onBatchCompleted callback), a > RejectedExecutionException may be thrown from checkpointWriter.executor, since the > eventLoop will try to process DoCheckpoint events even after > checkpointWriter.executor was stopped. > 16/04/18 19:22:10 ERROR CheckpointWriter: Could not submit checkpoint task to > the thread pool executor > java.util.concurrent.RejectedExecutionException: Task > org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler@76e12f8 > rejected from java.util.concurrent.ThreadPoolExecutor@4b9f5b97[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 49] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) > at > org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:253) > at > org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:294) > at > org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:184) > at > org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87) > at > org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > I think the order of stopping should be changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14701) checkpointWriter is stopped before eventLoop, causing a RejectedExecutionException in StreamingContext.stop
[ https://issues.apache.org/jira/browse/SPARK-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247080#comment-15247080 ] Apache Spark commented on SPARK-14701: -- User 'lw-lin' has created a pull request for this issue: https://github.com/apache/spark/pull/12489 > checkpointWriter is stopped before eventLoop, causing a RejectedExecutionException > in StreamingContext.stop > -- > > Key: SPARK-14701 > URL: https://issues.apache.org/jira/browse/SPARK-14701 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.5.1, 1.6.1 > Environment: Windows, local[*] mode as well as Redhat Linux, YARN > Cluster >Reporter: Sreelal S L >Priority: Minor > > In org.apache.spark.streaming.scheduler.JobGenerator.stop(), > checkpointWriter.stop is called before eventLoop.stop. > If I call streamingContext.stop when a batch is about to complete (I'm > invoking it from a StreamingListener.onBatchCompleted callback), a > RejectedExecutionException may be thrown from checkpointWriter.executor, since the > eventLoop will try to process DoCheckpoint events even after > checkpointWriter.executor was stopped. > 16/04/18 19:22:10 ERROR CheckpointWriter: Could not submit checkpoint task to > the thread pool executor > java.util.concurrent.RejectedExecutionException: Task > org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler@76e12f8 > rejected from java.util.concurrent.ThreadPoolExecutor@4b9f5b97[Terminated, > pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 49] > at > java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372) > at > org.apache.spark.streaming.CheckpointWriter.write(Checkpoint.scala:253) > at > org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:294) > at > org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:184) > at > org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:87) > at > org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:86) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > I think the order of stopping should be changed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14723) A new way to support dynamic allocation in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-14723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] WilliamZhu updated SPARK-14723: --- Attachment: spark-streaming-dynamic-allocation-desigh.pdf > A new way to support dynamic allocation in Spark Streaming > -- > > Key: SPARK-14723 > URL: https://issues.apache.org/jira/browse/SPARK-14723 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Streaming >Reporter: WilliamZhu > Labels: features > Fix For: 2.1.0 > > Attachments: spark-streaming-dynamic-allocation-desigh.pdf > > > Provide a more powerful algorithm to support dynamic allocation in Spark > Streaming. > More details: http://www.jianshu.com/p/ae7fdd4746f6 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14723) A new way to support dynamic allocation in Spark Streaming
WilliamZhu created SPARK-14723: -- Summary: A new way to support dynamic allocation in Spark Streaming Key: SPARK-14723 URL: https://issues.apache.org/jira/browse/SPARK-14723 Project: Spark Issue Type: Improvement Components: Spark Core, Streaming Reporter: WilliamZhu Fix For: 2.1.0 Provide a more powerful algorithm to support dynamic allocation in Spark Streaming. More details: http://www.jianshu.com/p/ae7fdd4746f6 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12922) Implement gapply() on DataFrame in SparkR
[ https://issues.apache.org/jira/browse/SPARK-12922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247047#comment-15247047 ] Sun Rui commented on SPARK-12922: - [~Narine], 1. Typically users don't care about the number of partitions in SparkSQL. If they do, they can tune it by setting “spark.sql.shuffle.partitions”. It seems unrelated to the implementation of gapply? 2. I think we need to support groupBy instead of groupByKey for DataFrame. For groupBy, users can specify multiple key columns at once, so a list should be used to hold the key columns. FYI, I have basically implemented dapply() and am debugging it > Implement gapply() on DataFrame in SparkR > - > > Key: SPARK-12922 > URL: https://issues.apache.org/jira/browse/SPARK-12922 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Affects Versions: 1.6.0 >Reporter: Sun Rui > > gapply() applies an R function to groups formed by one or more columns of a > DataFrame, and returns a DataFrame. It is like GroupedDataSet.flatMapGroups() > in the Dataset API. > Two API styles are supported: > 1. > {code} > gd <- groupBy(df, col1, ...) > gapply(gd, function(grouping_key, group) {}, schema) > {code} > 2. > {code} > gapply(df, grouping_columns, function(grouping_key, group) {}, schema) > {code} > R function input: the grouping key values and a local data.frame of the grouped > data > R function output: local data.frame > Schema specifies the Row format of the output of the R function. It must > match the R function's output. > Note that map-side combination (partial aggregation) is not supported; users > could do map-side combination via dapply(). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
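For comparison, the Dataset-side analogue the description itself references (flatMapGroups) looks like this in Scala; the toy data and column choices are arbitrary:
{code}
import org.apache.spark.sql.SparkSession

object GapplyAnalogue extends App {
  val spark = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._

  val ds = Seq(("a", 1), ("a", 2), ("b", 5)).toDS() // Dataset[(String, Int)]

  // Group by the first field, then apply a function per group -- the same
  // shape as the proposed gapply(gd, function(grouping_key, group) {}, schema).
  val sums = ds
    .groupByKey(_._1)
    .flatMapGroups { (key, rows) => Iterator(key -> rows.map(_._2).sum) }

  sums.show()
}
{code}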
[jira] [Resolved] (SPARK-14719) WriteAheadLogBasedBlockHandler should ignore BlockManager put errors
[ https://issues.apache.org/jira/browse/SPARK-14719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-14719. --- Resolution: Fixed Fix Version/s: 2.0.0 > WriteAheadLogBasedBlockHandler should ignore BlockManager put errors > > > Key: SPARK-14719 > URL: https://issues.apache.org/jira/browse/SPARK-14719 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Josh Rosen >Assignee: Josh Rosen > Fix For: 2.0.0 > > > {{WriteAheadLogBasedBlockHandler}} will currently throw exceptions if > BlockManager puts fail, even though those puts are only performed as a > performance optimization. Instead, it should log and ignore exceptions > originating from the block manager put. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
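A minimal sketch of the log-and-ignore behavior described above; storeBlock and putToBlockManager are hypothetical stand-ins for the handler's internals, not the actual method names:
{code}
import org.slf4j.LoggerFactory
import scala.util.control.NonFatal

object BestEffortPutSketch extends App {
  private val log = LoggerFactory.getLogger(getClass)

  // Hypothetical stand-in for the BlockManager put; here it always fails.
  def putToBlockManager(): Unit = throw new RuntimeException("simulated put failure")

  def storeBlock(): Unit = {
    // The put is only a performance optimization: the write-ahead log is the
    // source of truth, so a failed put is logged and swallowed.
    try putToBlockManager()
    catch {
      case NonFatal(e) => log.warn("BlockManager put failed; ignoring", e)
    }
  }

  storeBlock()
}
{code}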
[jira] [Resolved] (SPARK-14667) Remove HashShuffleManager
[ https://issues.apache.org/jira/browse/SPARK-14667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-14667. - Resolution: Fixed Fix Version/s: 2.0.0 > Remove HashShuffleManager > - > > Key: SPARK-14667 > URL: https://issues.apache.org/jira/browse/SPARK-14667 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > > The sort shuffle manager has been the default since Spark 1.2. It is time to > remove the old hash shuffle manager. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12072) python dataframe ._jdf.schema().json() breaks on large metadata dataframes
[ https://issues.apache.org/jira/browse/SPARK-12072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15247017#comment-15247017 ] holdenk commented on SPARK-12072: - Any results yet? > python dataframe ._jdf.schema().json() breaks on large metadata dataframes > -- > > Key: SPARK-12072 > URL: https://issues.apache.org/jira/browse/SPARK-12072 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.2 >Reporter: Rares Mirica > > When a dataframe contains a column with a large number of values in ml_attr, > schema evaluation will routinely fail when getting the schema as JSON; this > will, in turn, cause a bunch of problems with, e.g., calling udfs on the schema, > because calling columns relies on > _parse_datatype_json_string(self._jdf.schema().json()) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13227) Risky apply() in OpenHashMap
[ https://issues.apache.org/jira/browse/SPARK-13227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13227. - Resolution: Fixed Assignee: Nan Zhu Fix Version/s: 2.0.0 > Risky apply() in OpenHashMap > > > Key: SPARK-13227 > URL: https://issues.apache.org/jira/browse/SPARK-13227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Nan Zhu >Assignee: Nan Zhu >Priority: Minor > Fix For: 2.0.0 > > > It might confuse future developers when they use OpenHashMap.apply() with > a numeric value type: > null.asInstanceOf[Int], null.asInstanceOf[Long], null.asInstanceOf[Float] and > null.asInstanceOf[Double] will return 0/0L/0.0, which might confuse the > developer if the value set contains 0/0L/0.0 for an existing key -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
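The pitfall is easy to reproduce with an analogous default-value map from the Scala standard library (OpenHashMap itself is a Spark-internal class): a lookup returning 0 cannot distinguish a missing key from a key whose value really is 0:
{code}
import scala.collection.mutable

object ZeroAmbiguity extends App {
  val m = mutable.HashMap[String, Int]().withDefaultValue(0)
  m("present") = 0

  println(m("present")) // 0: the key exists and its value really is 0
  println(m("absent"))  // 0: the key is missing; the default looks identical
}
{code}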
[jira] [Updated] (SPARK-13227) Risky apply() in OpenHashMap
[ https://issues.apache.org/jira/browse/SPARK-13227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13227: Fix Version/s: 1.6.2 > Risky apply() in OpenHashMap > > > Key: SPARK-13227 > URL: https://issues.apache.org/jira/browse/SPARK-13227 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Nan Zhu >Assignee: Nan Zhu >Priority: Minor > Fix For: 1.6.2, 2.0.0 > > > It might confuse future developers when they use OpenHashMap.apply() with > a numeric value type: > null.asInstanceOf[Int], null.asInstanceOf[Long], null.asInstanceOf[Float] and > null.asInstanceOf[Double] will return 0/0L/0.0, which might confuse the > developer if the value set contains 0/0L/0.0 for an existing key -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14706) Python ML persistence integration test
[ https://issues.apache.org/jira/browse/SPARK-14706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246950#comment-15246950 ] Xusen Yin commented on SPARK-14706: --- I am starting to write it. > Python ML persistence integration test > -- > > Key: SPARK-14706 > URL: https://issues.apache.org/jira/browse/SPARK-14706 > Project: Spark > Issue Type: Test > Components: ML, PySpark >Reporter: Joseph K. Bradley > > Goal: extend integration test in {{ml/tests.py}}. > In the {{PersistenceTest}} suite, there is a method {{_compare_pipelines}}. > This issue includes: > * Extending {{_compare_pipelines}} to handle CrossValidator, > TrainValidationSplit, and OneVsRest > * Adding an integration test in PersistenceTest which includes nested > meta-algorithms. E.g.: {{Pipeline[ CrossValidator[ TrainValidationSplit[ > OneVsRest[ LogisticRegression ] ] ] ]}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14602) [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length limit on Windows
[ https://issues.apache.org/jira/browse/SPARK-14602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14602: Assignee: (was: Apache Spark) > [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length > limit on Windows > -- > > Key: SPARK-14602 > URL: https://issues.apache.org/jira/browse/SPARK-14602 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Submit, Windows, YARN >Affects Versions: 2.0.0 > Environment: YARN on Windows >Reporter: Sebastian Kochman > > After change https://issues.apache.org/jira/browse/SPARK-11157, which removed > a single large Spark assembly in favor of multiple small jars, when you try > to submit a Spark app to YARN on Windows (using spark-submit.cmd), the app > fails with the following error: > Diagnostics: The command line has a length of 12046 exceeds maximum allowed > length of 8191. Command starts with: @set > SPARK_YARN_CACHE_FILES=[...]/.sparkStaging/application_[...] Failing this > attempt. Failing the application. > So basically, the large number of jars needed for staging in YARN causes the > Windows command-line length limit to be exceeded. > Please see more details in the discussion here: > https://issues.apache.org/jira/browse/SPARK-11157?focusedCommentId=15238151&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15238151 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14602) [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length limit on Windows
[ https://issues.apache.org/jira/browse/SPARK-14602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14602: Assignee: Apache Spark > [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length > limit on Windows > -- > > Key: SPARK-14602 > URL: https://issues.apache.org/jira/browse/SPARK-14602 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Submit, Windows, YARN >Affects Versions: 2.0.0 > Environment: YARN on Windows >Reporter: Sebastian Kochman >Assignee: Apache Spark > > After change https://issues.apache.org/jira/browse/SPARK-11157, which removed > a single large Spark assembly in favor of multiple small jars, when you try > to submit a Spark app to YARN on Windows (using spark-submit.cmd), the app > fails with the following error: > Diagnostics: The command line has a length of 12046 exceeds maximum allowed > length of 8191. Command starts with: @set > SPARK_YARN_CACHE_FILES=[...]/.sparkStaging/application_[...] Failing this > attempt. Failing the application. > So basically, the large number of jars needed for staging in YARN causes the > Windows command-line length limit to be exceeded. > Please see more details in the discussion here: > https://issues.apache.org/jira/browse/SPARK-11157?focusedCommentId=15238151&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15238151 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14602) [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length limit on Windows
[ https://issues.apache.org/jira/browse/SPARK-14602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246942#comment-15246942 ] Apache Spark commented on SPARK-14602: -- User 'vanzin' has created a pull request for this issue: https://github.com/apache/spark/pull/12487 > [YARN+Windows] Setting SPARK_YARN_CACHE_FILES exceeds command line length > limit on Windows > -- > > Key: SPARK-14602 > URL: https://issues.apache.org/jira/browse/SPARK-14602 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Submit, Windows, YARN >Affects Versions: 2.0.0 > Environment: YARN on Windows >Reporter: Sebastian Kochman > > After change https://issues.apache.org/jira/browse/SPARK-11157, which removed > a single large Spark assembly in favor of multiple small jars, when you try > to submit a Spark app to YARN on Windows (using spark-submit.cmd), the app > fails with the following error: > Diagnostics: The command line has a length of 12046 exceeds maximum allowed > length of 8191. Command starts with: @set > SPARK_YARN_CACHE_FILES=[...]/.sparkStaging/application_[...] Failing this > attempt. Failing the application. > So basically, the large number of jars needed for staging in YARN causes the > Windows command-line length limit to be exceeded. > Please see more details in the discussion here: > https://issues.apache.org/jira/browse/SPARK-11157?focusedCommentId=15238151&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15238151 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13643) Create SparkSession interface
[ https://issues.apache.org/jira/browse/SPARK-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13643: Assignee: (was: Apache Spark) > Create SparkSession interface > - > > Key: SPARK-13643 > URL: https://issues.apache.org/jira/browse/SPARK-13643 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13643) Create SparkSession interface
[ https://issues.apache.org/jira/browse/SPARK-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246935#comment-15246935 ] Apache Spark commented on SPARK-13643: -- User 'andrewor14' has created a pull request for this issue: https://github.com/apache/spark/pull/12485 > Create SparkSession interface > - > > Key: SPARK-13643 > URL: https://issues.apache.org/jira/browse/SPARK-13643 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13643) Create SparkSession interface
[ https://issues.apache.org/jira/browse/SPARK-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13643: Assignee: Apache Spark > Create SparkSession interface > - > > Key: SPARK-13643 > URL: https://issues.apache.org/jira/browse/SPARK-13643 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14722) Rename upstreams() -> inputRDDs() in WholeStageCodegen
[ https://issues.apache.org/jira/browse/SPARK-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14722: Assignee: (was: Apache Spark) > Rename upstreams() -> inputRDDs() in WholeStageCodegen > -- > > Key: SPARK-14722 > URL: https://issues.apache.org/jira/browse/SPARK-14722 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14722) Rename upstreams() -> inputRDDs() in WholeStageCodegen
[ https://issues.apache.org/jira/browse/SPARK-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14722: Assignee: Apache Spark > Rename upstreams() -> inputRDDs() in WholeStageCodegen > -- > > Key: SPARK-14722 > URL: https://issues.apache.org/jira/browse/SPARK-14722 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14722) Rename upstreams() -> inputRDDs() in WholeStageCodegen
[ https://issues.apache.org/jira/browse/SPARK-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246903#comment-15246903 ] Apache Spark commented on SPARK-14722: -- User 'sameeragarwal' has created a pull request for this issue: https://github.com/apache/spark/pull/12486 > Rename upstreams() -> inputRDDs() in WholeStageCodegen > -- > > Key: SPARK-14722 > URL: https://issues.apache.org/jira/browse/SPARK-14722 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14709) spark.ml API for linear SVM
[ https://issues.apache.org/jira/browse/SPARK-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246882#comment-15246882 ] yuhao yang edited comment on SPARK-14709 at 4/19/16 1:14 AM: - I'll start on this to give a quick prototype first. was (Author: yuhaoyan): I'll start on this. > spark.ml API for linear SVM > --- > > Key: SPARK-14709 > URL: https://issues.apache.org/jira/browse/SPARK-14709 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Joseph K. Bradley > > Provide API for SVM algorithm for DataFrames. I would recommend using > OWL-QN, rather than wrapping spark.mllib's SGD-based implementation. > The API should mimic existing spark.ml.classification APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14722) Rename upstreams() -> inputRDDs() in WholeStageCodegen
Sameer Agarwal created SPARK-14722: -- Summary: Rename upstreams() -> inputRDDs() in WholeStageCodegen Key: SPARK-14722 URL: https://issues.apache.org/jira/browse/SPARK-14722 Project: Spark Issue Type: Improvement Components: SQL Reporter: Sameer Agarwal -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14720) Move the rest of HiveContext to HiveSessionState
[ https://issues.apache.org/jira/browse/SPARK-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14720: Assignee: Apache Spark (was: Andrew Or) > Move the rest of HiveContext to HiveSessionState > > > Key: SPARK-14720 > URL: https://issues.apache.org/jira/browse/SPARK-14720 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Apache Spark > > This will be a major cleanup task. Unfortunately part of the state will leak > to SessionState, which shouldn't know anything about Hive. Part of the effort > here is to create a new SparkSession interface (SPARK-13643) and do > reflection there to decide which SessionState to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14720) Move the rest of HiveContext to HiveSessionState
[ https://issues.apache.org/jira/browse/SPARK-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14720: Assignee: Andrew Or (was: Apache Spark) > Move the rest of HiveContext to HiveSessionState > > > Key: SPARK-14720 > URL: https://issues.apache.org/jira/browse/SPARK-14720 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > This will be a major cleanup task. Unfortunately part of the state will leak > to SessionState, which shouldn't know anything about Hive. Part of the effort > here is to create a new SparkSession interface (SPARK-13643) and do > reflection there to decide which SessionState to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14720) Move the rest of HiveContext to HiveSessionState
[ https://issues.apache.org/jira/browse/SPARK-14720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246897#comment-15246897 ] Apache Spark commented on SPARK-14720: -- User 'andrewor14' has created a pull request for this issue: https://github.com/apache/spark/pull/12485 > Move the rest of HiveContext to HiveSessionState > > > Key: SPARK-14720 > URL: https://issues.apache.org/jira/browse/SPARK-14720 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > > This will be a major cleanup task. Unfortunately part of the state will leak > to SessionState, which shouldn't know anything about Hive. Part of the effort > here is to create a new SparkSession interface (SPARK-13643) and do > reflection there to decide which SessionState to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14721) Actually remove the HiveContext file itself
Andrew Or created SPARK-14721: - Summary: Actually remove the HiveContext file itself Key: SPARK-14721 URL: https://issues.apache.org/jira/browse/SPARK-14721 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14720) Move the rest of HiveContext to HiveSessionState
Andrew Or created SPARK-14720: - Summary: Move the rest of HiveContext to HiveSessionState Key: SPARK-14720 URL: https://issues.apache.org/jira/browse/SPARK-14720 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.0.0 Reporter: Andrew Or Assignee: Andrew Or This will be a major cleanup task. Unfortunately part of the state will leak to SessionState, which shouldn't know anything about Hive. Part of the effort here is to create a new SparkSession interface (SPARK-13643) and do reflection there to decide which SessionState to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14709) spark.ml API for linear SVM
[ https://issues.apache.org/jira/browse/SPARK-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246882#comment-15246882 ] yuhao yang commented on SPARK-14709: I'll start on this. > spark.ml API for linear SVM > --- > > Key: SPARK-14709 > URL: https://issues.apache.org/jira/browse/SPARK-14709 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Joseph K. Bradley > > Provide API for SVM algorithm for DataFrames. I would recommend using > OWL-QN, rather than wrapping spark.mllib's SGD-based implementation. > The API should mimic existing spark.ml.classification APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14711) Examples jar not a part of distribution
[ https://issues.apache.org/jira/browse/SPARK-14711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-14711. Resolution: Fixed Assignee: Mark Grover Fix Version/s: 2.0.0 > Examples jar not a part of distribution > --- > > Key: SPARK-14711 > URL: https://issues.apache.org/jira/browse/SPARK-14711 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.0.0 >Reporter: Mark Grover >Assignee: Mark Grover > Fix For: 2.0.0 > > > While mucking around with some examples, I noticed that the spark-examples jar is > not included in the distribution tarball. Also, it's not on the > spark-submit classpath, which means commands like > {{run-example}} fail to work, whether a "distribution" tarball is used or a > regular {{mvn package}} build. > The root cause may be that the spark-examples jar is > under {{$SPARK_HOME/examples/target}} while all its dependencies are at > {{$SPARK_HOME/examples/target/scala-2.11/jars}}, and we only seem to be > including the jars directory in the classpath. See > [here|https://github.com/apache/spark/blob/master/launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java#L354] > for details. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14714) PySpark Param TypeConverter arg is not passed by name in some cases
[ https://issues.apache.org/jira/browse/SPARK-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-14714. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12480 [https://github.com/apache/spark/pull/12480] > PySpark Param TypeConverter arg is not passed by name in some cases > --- > > Key: SPARK-14714 > URL: https://issues.apache.org/jira/browse/SPARK-14714 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley >Priority: Minor > Fix For: 2.0.0 > > > PySpark Param constructors need to pass the TypeConverter argument by name, > partly to make sure it is not mistaken for the expectedType arg and partly > because we will remove the expectedType arg in 2.1. In several places, this > is not being done correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14515) Add python example for ChiSqSelector
[ https://issues.apache.org/jira/browse/SPARK-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-14515. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12283 [https://github.com/apache/spark/pull/12283] > Add python example for ChiSqSelector > > > Key: SPARK-14515 > URL: https://issues.apache.org/jira/browse/SPARK-14515 > Project: Spark > Issue Type: Documentation > Components: Documentation, ML, PySpark >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor > Fix For: 2.0.0 > > > Add the missing python example for ChiSqSelector -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14719) WriteAheadLogBasedBlockHandler should ignore BlockManager put errors
[ https://issues.apache.org/jira/browse/SPARK-14719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14719: Assignee: Apache Spark (was: Josh Rosen) > WriteAheadLogBasedBlockHandler should ignore BlockManager put errors > > > Key: SPARK-14719 > URL: https://issues.apache.org/jira/browse/SPARK-14719 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Josh Rosen >Assignee: Apache Spark > > {{WriteAheadLogBasedBlockHandler}} will currently throw exceptions if > BlockManager puts fail, even though those puts are only performed as a > performance optimization. Instead, it should log and ignore exceptions > originating from the block manager put. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14719) WriteAheadLogBasedBlockHandler should ignore BlockManager put errors
[ https://issues.apache.org/jira/browse/SPARK-14719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246811#comment-15246811 ] Apache Spark commented on SPARK-14719: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/12484 > WriteAheadLogBasedBlockHandler should ignore BlockManager put errors > > > Key: SPARK-14719 > URL: https://issues.apache.org/jira/browse/SPARK-14719 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Josh Rosen >Assignee: Josh Rosen > > {{WriteAheadLogBasedBlockHandler}} will currently throw exceptions if > BlockManager puts fail, even though those puts are only performed as a > performance optimization. Instead, it should log and ignore exceptions > originating from the block manager put. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14719) WriteAheadLogBasedBlockHandler should ignore BlockManager put errors
[ https://issues.apache.org/jira/browse/SPARK-14719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14719: Assignee: Josh Rosen (was: Apache Spark) > WriteAheadLogBasedBlockHandler should ignore BlockManager put errors > > > Key: SPARK-14719 > URL: https://issues.apache.org/jira/browse/SPARK-14719 > Project: Spark > Issue Type: Bug > Components: Streaming >Reporter: Josh Rosen >Assignee: Josh Rosen > > {{WriteAheadLogBasedBlockHandler}} will currently throw exceptions if > BlockManager puts fail, even though those puts are only performed as a > performance optimization. Instead, it should log and ignore exceptions > originating from the block manager put. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14719) WriteAheadLogBasedBlockHandler should ignore BlockManager put errors
Josh Rosen created SPARK-14719: -- Summary: WriteAheadLogBasedBlockHandler should ignore BlockManager put errors Key: SPARK-14719 URL: https://issues.apache.org/jira/browse/SPARK-14719 Project: Spark Issue Type: Bug Components: Streaming Reporter: Josh Rosen Assignee: Josh Rosen {{WriteAheadLogBasedBlockHandler}} will currently throw exceptions if BlockManager puts fail, even though those puts are only performed as a performance optimization. Instead, it should log and ignore exceptions originating from the block manager put. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
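A minimal sketch of the proposed behavior, with simplified names rather than the actual handler code: the BlockManager put is treated as best-effort, so failures are logged and swallowed while the write-ahead log remains the source of truth.
{code}
// Simplified stand-in for the handler's store path (not Spark source).
import org.slf4j.LoggerFactory
import scala.util.control.NonFatal

object BestEffortBlockPut {
  private val log = LoggerFactory.getLogger(getClass)

  def store(blockId: String, writeAheadLog: () => Unit, putToBlockManager: () => Unit): Unit = {
    writeAheadLog()       // must succeed: this is the durable copy
    try {
      putToBlockManager() // optimization only, so reads can be served from memory
    } catch {
      case NonFatal(e) =>
        log.warn(s"Ignoring BlockManager put failure for block $blockId", e)
    }
  }
}
{code}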
[jira] [Resolved] (SPARK-13698) Fix Analysis Exceptions when Using Backticks in Generate
[ https://issues.apache.org/jira/browse/SPARK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-13698. - Resolution: Fixed Assignee: Dilip Biswal Fix Version/s: 2.0.0 > Fix Analysis Exceptions when Using Backticks in Generate > > > Key: SPARK-13698 > URL: https://issues.apache.org/jira/browse/SPARK-13698 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Dilip Biswal >Assignee: Dilip Biswal > Fix For: 2.0.0 > > > Analysis exception occurs while running the following query. > {code} > SELECT ints FROM nestedArray LATERAL VIEW explode(a.b) `a` AS `ints` > {code} > {code} > Failed to analyze query: org.apache.spark.sql.AnalysisException: cannot > resolve '`ints`' given input columns: [a, `ints`]; line 1 pos 7 > 'Project ['ints] > +- Generate explode(a#0.b), true, false, Some(a), [`ints`#8] >+- SubqueryAlias nestedarray > +- LocalRelation [a#0], 1,2,3 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13698) Fix Analysis Exceptions when Using Backticks in Generate
[ https://issues.apache.org/jira/browse/SPARK-13698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246796#comment-15246796 ] Dilip Biswal commented on SPARK-13698: -- [~cloud_fan] Hi Wenchen, can you please help fix the assignee field for this JIRA? Thanks in advance! > Fix Analysis Exceptions when Using Backticks in Generate > > > Key: SPARK-13698 > URL: https://issues.apache.org/jira/browse/SPARK-13698 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Dilip Biswal > > Analysis exception occurs while running the following query. > {code} > SELECT ints FROM nestedArray LATERAL VIEW explode(a.b) `a` AS `ints` > {code} > {code} > Failed to analyze query: org.apache.spark.sql.AnalysisException: cannot > resolve '`ints`' given input columns: [a, `ints`]; line 1 pos 7 > 'Project ['ints] > +- Generate explode(a#0.b), true, false, Some(a), [`ints`#8] >+- SubqueryAlias nestedarray > +- LocalRelation [a#0], 1,2,3 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14718) Avoid mutating ExprCode in doGenCode
[ https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14718: Assignee: Apache Spark > Avoid mutating ExprCode in doGenCode > > > Key: SPARK-14718 > URL: https://issues.apache.org/jira/browse/SPARK-14718 > Project: Spark > Issue Type: Improvement >Reporter: Sameer Agarwal >Assignee: Apache Spark > > The `doGenCode` method currently takes in an ExprCode, mutates it and returns > the java code to evaluate the given expression. It should instead just return > a new ExprCode to avoid passing around mutable objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14718) Avoid mutating ExprCode in doGenCode
[ https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246792#comment-15246792 ] Apache Spark commented on SPARK-14718: -- User 'sameeragarwal' has created a pull request for this issue: https://github.com/apache/spark/pull/12483 > Avoid mutating ExprCode in doGenCode > > > Key: SPARK-14718 > URL: https://issues.apache.org/jira/browse/SPARK-14718 > Project: Spark > Issue Type: Improvement >Reporter: Sameer Agarwal > > The `doGenCode` method currently takes in an ExprCode, mutates it and returns > the java code to evaluate the given expression. It should instead just return > a new ExprCode to avoid passing around mutable objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14718) Avoid mutating ExprCode in doGenCode
[ https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14718: Assignee: (was: Apache Spark) > Avoid mutating ExprCode in doGenCode > > > Key: SPARK-14718 > URL: https://issues.apache.org/jira/browse/SPARK-14718 > Project: Spark > Issue Type: Improvement >Reporter: Sameer Agarwal > > The `doGenCode` method currently takes in an ExprCode, mutates it and returns > the java code to evaluate the given expression. It should instead just return > a new ExprCode to avoid passing around mutable objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-1239) Improve fetching of map output statuses
[ https://issues.apache.org/jira/browse/SPARK-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-1239: --- Target Version/s: 2.0.0 > Improve fetching of map output statuses > --- > > Key: SPARK-1239 > URL: https://issues.apache.org/jira/browse/SPARK-1239 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 1.0.2, 1.1.0 >Reporter: Patrick Wendell >Assignee: Thomas Graves > > Instead we should modify the way we fetch map output statuses to take both a > mapper and a reducer - or we should just piggyback the statuses on each task. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14718) Avoid mutating ExprCode in doGenCode
[ https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sameer Agarwal updated SPARK-14718: --- Summary: Avoid mutating ExprCode in doGenCode (was: doGenCode should return a new ExprCode and not mutate existing one) > Avoid mutating ExprCode in doGenCode > > > Key: SPARK-14718 > URL: https://issues.apache.org/jira/browse/SPARK-14718 > Project: Spark > Issue Type: Improvement >Reporter: Sameer Agarwal > > The `doGenCode` method currently takes in an ExprCode, mutates it and returns > the java code to evaluate the given expression. It should instead just return > a new ExprCode to avoid passing around mutable objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14718) doGenCode should return a new ExprCode and not mutate existing one
[ https://issues.apache.org/jira/browse/SPARK-14718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sameer Agarwal updated SPARK-14718: --- Summary: doGenCode should return a new ExprCode and not mutate existing one (was: doGenCode should return a new ExprCode and note mutate existing one) > doGenCode should return a new ExprCode and not mutate existing one > -- > > Key: SPARK-14718 > URL: https://issues.apache.org/jira/browse/SPARK-14718 > Project: Spark > Issue Type: Improvement >Reporter: Sameer Agarwal > > The `doGenCode` method currently takes in an ExprCode, mutates it and returns > the java code to evaluate the given expression. It should instead just return > a new ExprCode to avoid passing around mutable objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14718) doGenCode should return a new ExprCode and note mutate existing one
Sameer Agarwal created SPARK-14718: -- Summary: doGenCode should return a new ExprCode and note mutate existing one Key: SPARK-14718 URL: https://issues.apache.org/jira/browse/SPARK-14718 Project: Spark Issue Type: Improvement Reporter: Sameer Agarwal The `doGenCode` method currently takes in an ExprCode, mutates it and returns the java code to evaluate the given expression. It should instead just return a new ExprCode to avoid passing around mutable objects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
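The SPARK-14718 change can be summarized with a simplified sketch. The types below are toy stand-ins for the real CodegenContext/ExprCode: instead of mutating the ExprCode argument and returning a raw code string, doGenCode returns a fresh ExprCode.
{code}
// Toy model of the refactoring, not Spark's actual codegen classes.
case class ExprCode(code: String, isNull: String, value: String)

trait OldStyleExpr {
  // Before: the caller's ExprCode was mutated; raw Java code was returned.
  def doGenCode(ev: ExprCode): String
}

trait NewStyleExpr {
  // After: the input is left untouched and a new ExprCode is returned,
  // so no mutable state is threaded through code generation.
  def doGenCode(ev: ExprCode): ExprCode
}

class AddOne extends NewStyleExpr {
  def doGenCode(ev: ExprCode): ExprCode =
    ev.copy(code = ev.code + s"\nint ${ev.value}_plus1 = ${ev.value} + 1;")
}
{code}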
[jira] [Comment Edited] (SPARK-14717) Scala, Python APIs for Dataset.unpersist differ in default blocking value
[ https://issues.apache.org/jira/browse/SPARK-14717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246758#comment-15246758 ] Felix Cheung edited comment on SPARK-14717 at 4/18/16 11:19 PM: I can take this [~davies] was (Author: felixcheung): I can take this @davies > Scala, Python APIs for Dataset.unpersist differ in default blocking value > - > > Key: SPARK-14717 > URL: https://issues.apache.org/jira/browse/SPARK-14717 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Priority: Minor > > In Scala/Java {{Dataset.unpersist()}} sets blocking = false by default, but > in Python, it is set to True by default. We should presumably make them > consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14717) Scala, Python APIs for Dataset.unpersist differ in default blocking value
[ https://issues.apache.org/jira/browse/SPARK-14717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246758#comment-15246758 ] Felix Cheung edited comment on SPARK-14717 at 4/18/16 11:18 PM: I can take this @davies was (Author: felixcheung): I can take this @davies > Scala, Python APIs for Dataset.unpersist differ in default blocking value > - > > Key: SPARK-14717 > URL: https://issues.apache.org/jira/browse/SPARK-14717 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Priority: Minor > > In Scala/Java {{Dataset.unpersist()}} sets blocking = false by default, but > in Python, it is set to True by default. We should presumably make them > consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14515) Add python example for ChiSqSelector
[ https://issues.apache.org/jira/browse/SPARK-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-14515: -- Shepherd: Joseph K. Bradley Assignee: zhengruifeng Target Version/s: 2.0.0 Component/s: PySpark ML > Add python example for ChiSqSelector > > > Key: SPARK-14515 > URL: https://issues.apache.org/jira/browse/SPARK-14515 > Project: Spark > Issue Type: Documentation > Components: Documentation, ML, PySpark >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor > > Add the missing python example for ChiSqSelector -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-14717) Scala, Python APIs for Dataset.unpersist differ in default blocking value
[ https://issues.apache.org/jira/browse/SPARK-14717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246758#comment-15246758 ] Felix Cheung edited comment on SPARK-14717 at 4/18/16 11:17 PM: I can take this @davies was (Author: felixcheung): I can take this [~davis] > Scala, Python APIs for Dataset.unpersist differ in default blocking value > - > > Key: SPARK-14717 > URL: https://issues.apache.org/jira/browse/SPARK-14717 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Priority: Minor > > In Scala/Java {{Dataset.unpersist()}} sets blocking = false by default, but > in Python, it is set to True by default. We should presumably make them > consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14717) Scala, Python APIs for Dataset.unpersist differ in default blocking value
[ https://issues.apache.org/jira/browse/SPARK-14717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246758#comment-15246758 ] Felix Cheung commented on SPARK-14717: -- I can take this [~davis] > Scala, Python APIs for Dataset.unpersist differ in default blocking value > - > > Key: SPARK-14717 > URL: https://issues.apache.org/jira/browse/SPARK-14717 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Priority: Minor > > In Scala/Java {{Dataset.unpersist()}} sets blocking = false by default, but > in Python, it is set to True by default. We should presumably make them > consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14515) Add python example for ChiSqSelector
[ https://issues.apache.org/jira/browse/SPARK-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-14515: -- Priority: Minor (was: Major) > Add python example for ChiSqSelector > > > Key: SPARK-14515 > URL: https://issues.apache.org/jira/browse/SPARK-14515 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: zhengruifeng >Priority: Minor > > Add the missing python example for ChiSqSelector -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14515) Add python example for ChiSqSelector
[ https://issues.apache.org/jira/browse/SPARK-14515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-14515: -- Issue Type: Documentation (was: Improvement) > Add python example for ChiSqSelector > > > Key: SPARK-14515 > URL: https://issues.apache.org/jira/browse/SPARK-14515 > Project: Spark > Issue Type: Documentation > Components: Documentation >Reporter: zhengruifeng > > Add the missing python example for ChiSqSelector -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14717) Scala, Python APIs for Dataset.unpersist differ in default blocking value
Joseph K. Bradley created SPARK-14717: - Summary: Scala, Python APIs for Dataset.unpersist differ in default blocking value Key: SPARK-14717 URL: https://issues.apache.org/jira/browse/SPARK-14717 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 2.0.0 Reporter: Joseph K. Bradley Priority: Minor In Scala/Java {{Dataset.unpersist()}} sets blocking = false by default, but in Python, it is set to True by default. We should presumably make them consistent. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
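The inconsistency is just a default argument, illustrated below with a self-contained stand-in (not Spark source); PySpark's wrapper currently passes the equivalent of blocking = true unless told otherwise.
{code}
// Stand-in class showing the API surface under discussion.
class CachedThing {
  def unpersist(blocking: Boolean = false): this.type = {
    if (blocking) { /* wait until all blocks are actually removed */ }
    this
  }
}

val t = new CachedThing
t.unpersist()                // Scala/Java default: returns immediately
t.unpersist(blocking = true) // PySpark's current default, expressed in Scala
{code}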
[jira] [Commented] (SPARK-14716) Add partitioned parquet support file stream sink
[ https://issues.apache.org/jira/browse/SPARK-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246658#comment-15246658 ] Apache Spark commented on SPARK-14716: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/12409 > Add partitioned parquet support file stream sink > > > Key: SPARK-14716 > URL: https://issues.apache.org/jira/browse/SPARK-14716 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14716) Add partitioned parquet support file stream sink
[ https://issues.apache.org/jira/browse/SPARK-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14716: Assignee: (was: Apache Spark) > Add partitioned parquet support file stream sink > > > Key: SPARK-14716 > URL: https://issues.apache.org/jira/browse/SPARK-14716 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14716) Add partitioned parquet support file stream sink
[ https://issues.apache.org/jira/browse/SPARK-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14716: Assignee: Apache Spark > Add partitioned parquet support file stream sink > > > Key: SPARK-14716 > URL: https://issues.apache.org/jira/browse/SPARK-14716 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14716) Add partitioned parquet support file stream sink
[ https://issues.apache.org/jira/browse/SPARK-14716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-14716: -- Summary: Add partitioned parquet support file stream sink (was: Added partitioned parquet support file stream sink) > Add partitioned parquet support file stream sink > > > Key: SPARK-14716 > URL: https://issues.apache.org/jira/browse/SPARK-14716 > Project: Spark > Issue Type: Sub-task > Components: SQL, Streaming >Reporter: Tathagata Das > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-14716) Added partitioned parquet support file stream sink
Tathagata Das created SPARK-14716: - Summary: Added partitioned parquet support file stream sink Key: SPARK-14716 URL: https://issues.apache.org/jira/browse/SPARK-14716 Project: Spark Issue Type: Sub-task Reporter: Tathagata Das -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
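In user-facing terms, the SPARK-14716 feature amounts to something like the following sketch, written against the structured streaming writer API as it later stabilized; method and option names at the time of this ticket may have differed.
{code}
import org.apache.spark.sql.DataFrame

val events: DataFrame = ??? // placeholder: a streaming DataFrame

// Write the stream as parquet, laying files out by partition column.
val query = events.writeStream
  .format("parquet")
  .option("checkpointLocation", "/tmp/stream-checkpoint")
  .partitionBy("date") // the partitioned support this ticket adds
  .start("/data/events")
{code}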
[jira] [Resolved] (SPARK-14713) Fix flaky test: o.a.s.network.netty.NettyBlockTransferServiceSuite.can bind to a specific port twice and the second increments
[ https://issues.apache.org/jira/browse/SPARK-14713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-14713. -- Resolution: Fixed Fix Version/s: 2.0.0 > Fix flaky test: o.a.s.network.netty.NettyBlockTransferServiceSuite.can bind > to a specific port twice and the second increments > -- > > Key: SPARK-14713 > URL: https://issues.apache.org/jira/browse/SPARK-14713 > Project: Spark > Issue Type: Test > Components: Tests >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 2.0.0 > > > When there are multiple tests running, "NettyBlockTransferServiceSuite.can > bind to a specific port twice and the second increments" may be flaky. See: > https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.2/786/testReport/junit/org.apache.spark.network.netty/NettyBlockTransferServiceSuite/can_bind_to_a_specific_port_twice_and_the_second_increments/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
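The behavior the flaky test exercises, sketched in isolation (simplified; the real logic lives in Spark's port-binding utilities): when a requested port is taken, the service retries on successively higher ports, so a second bind to the same port lands on port + 1.
{code}
import java.net.{BindException, ServerSocket}

def bindWithIncrement(startPort: Int, maxRetries: Int = 16): ServerSocket = {
  var attempt = 0
  while (true) {
    try {
      return new ServerSocket(startPort + attempt)
    } catch {
      case _: BindException if attempt < maxRetries => attempt += 1
    }
  }
  throw new IllegalStateException("unreachable")
}

val first  = bindWithIncrement(10027) // binds 10027
val second = bindWithIncrement(10027) // 10027 busy, so binds 10028
{code}
The flakiness arises because other concurrently running tests may grab the "specific port" first, changing which port the second bind increments to.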
[jira] [Commented] (SPARK-14699) Driver is marked as failed even it runs successfully
[ https://issues.apache.org/jira/browse/SPARK-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246609#comment-15246609 ] Apache Spark commented on SPARK-14699: -- User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/12481 > Driver is marked as failed even it runs successfully > > > Key: SPARK-14699 > URL: https://issues.apache.org/jira/browse/SPARK-14699 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0, 1.6.1 > Environment: Standalone deployment >Reporter: Huiqiang Liu > > We recently upgraded Spark from 1.5.2 to 1.6.0 and found that all batch jobs > are marked as failed. > To address this issue, we wrote a simple test application which just sum up > from 1 to 1 and it is marked as failed even though its result was correct. > Here is the typical stderr message and there is "ERROR worker.WorkerWatcher: > Lost connection to worker rpc" when driver exits. > 16/04/14 06:20:41 INFO scheduler.DAGScheduler: ResultStage 1 (sum at > SparkBatchTest.scala:19) finished in 0.052 s > 16/04/14 06:20:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, > whose tasks have all completed, from pool > 16/04/14 06:20:41 INFO scheduler.DAGScheduler: Job 1 finished: sum at > SparkBatchTest.scala:19, took 0.061177 s > 16/04/14 06:20:41 ERROR worker.WorkerWatcher: Lost connection to worker rpc > endpoint spark://wor...@spark-worker-ltv-prod-006.prod.vungle.com:7078. > Exiting. > 16/04/14 06:20:41 INFO spark.SparkContext: Invoking stop() from shutdown hook > 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 > on 172.16.33.187:36442 in memory (size: 1452.0 B, free: 511.1 MB) > 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 > on ip-172-16-31-86.ec2.internal:29708 in memory (size: 1452.0 B, free: 511.1 > MB) > 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 > on ip-172-16-32-207.ec2.internal:21259 in memory (size: 1452.0 B, free: 511.1 > MB) > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/metrics/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage/kill,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/api,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/static,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > 
o.s.j.s.ServletContextHandler{/storage/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/pool/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/pool,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs/job/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs/job,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs,null} > 1
[jira] [Assigned] (SPARK-14699) Driver is marked as failed even it runs successfully
[ https://issues.apache.org/jira/browse/SPARK-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14699: Assignee: (was: Apache Spark) > Driver is marked as failed even it runs successfully > > > Key: SPARK-14699 > URL: https://issues.apache.org/jira/browse/SPARK-14699 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0, 1.6.1 > Environment: Standalone deployment >Reporter: Huiqiang Liu > > We recently upgraded Spark from 1.5.2 to 1.6.0 and found that all batch jobs > are marked as failed. > To address this issue, we wrote a simple test application which just sum up > from 1 to 1 and it is marked as failed even though its result was correct. > Here is the typical stderr message and there is "ERROR worker.WorkerWatcher: > Lost connection to worker rpc" when driver exits. > 16/04/14 06:20:41 INFO scheduler.DAGScheduler: ResultStage 1 (sum at > SparkBatchTest.scala:19) finished in 0.052 s > 16/04/14 06:20:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, > whose tasks have all completed, from pool > 16/04/14 06:20:41 INFO scheduler.DAGScheduler: Job 1 finished: sum at > SparkBatchTest.scala:19, took 0.061177 s > 16/04/14 06:20:41 ERROR worker.WorkerWatcher: Lost connection to worker rpc > endpoint spark://wor...@spark-worker-ltv-prod-006.prod.vungle.com:7078. > Exiting. > 16/04/14 06:20:41 INFO spark.SparkContext: Invoking stop() from shutdown hook > 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 > on 172.16.33.187:36442 in memory (size: 1452.0 B, free: 511.1 MB) > 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 > on ip-172-16-31-86.ec2.internal:29708 in memory (size: 1452.0 B, free: 511.1 > MB) > 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 > on ip-172-16-32-207.ec2.internal:21259 in memory (size: 1452.0 B, free: 511.1 > MB) > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/metrics/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage/kill,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/api,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/static,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > 
o.s.j.s.ServletContextHandler{/storage,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/pool/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/pool,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs/job/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs/job,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs,null} > 16/04/14 06:20:41 INFO spark.ContextCleaner: Cleaned accumulator 2 > 16/04/14 06:20:41 INFO storage.BlockManagerInf
[jira] [Assigned] (SPARK-14699) Driver is marked as failed even it runs successfully
[ https://issues.apache.org/jira/browse/SPARK-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14699: Assignee: Apache Spark > Driver is marked as failed even it runs successfully > > > Key: SPARK-14699 > URL: https://issues.apache.org/jira/browse/SPARK-14699 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0, 1.6.1 > Environment: Standalone deployment >Reporter: Huiqiang Liu >Assignee: Apache Spark > > We recently upgraded Spark from 1.5.2 to 1.6.0 and found that all batch jobs > are marked as failed. > To address this issue, we wrote a simple test application which just sum up > from 1 to 1 and it is marked as failed even though its result was correct. > Here is the typical stderr message and there is "ERROR worker.WorkerWatcher: > Lost connection to worker rpc" when driver exits. > 16/04/14 06:20:41 INFO scheduler.DAGScheduler: ResultStage 1 (sum at > SparkBatchTest.scala:19) finished in 0.052 s > 16/04/14 06:20:41 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, > whose tasks have all completed, from pool > 16/04/14 06:20:41 INFO scheduler.DAGScheduler: Job 1 finished: sum at > SparkBatchTest.scala:19, took 0.061177 s > 16/04/14 06:20:41 ERROR worker.WorkerWatcher: Lost connection to worker rpc > endpoint spark://wor...@spark-worker-ltv-prod-006.prod.vungle.com:7078. > Exiting. > 16/04/14 06:20:41 INFO spark.SparkContext: Invoking stop() from shutdown hook > 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 > on 172.16.33.187:36442 in memory (size: 1452.0 B, free: 511.1 MB) > 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 > on ip-172-16-31-86.ec2.internal:29708 in memory (size: 1452.0 B, free: 511.1 > MB) > 16/04/14 06:20:41 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 > on ip-172-16-32-207.ec2.internal:21259 in memory (size: 1452.0 B, free: 511.1 > MB) > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/metrics/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage/kill,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/api,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/static,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > 
o.s.j.s.ServletContextHandler{/storage,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/pool/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/pool,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs/job/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs/job,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs/json,null} > 16/04/14 06:20:41 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs,null} > 16/04/14 06:20:41 INFO spark.ContextCleaner: Cleaned accumulator 2 > 16/04/14 06:20:41 INF
[jira] [Commented] (SPARK-14712) spark.ml LogisticRegressionModel.toString should summarize model
[ https://issues.apache.org/jira/browse/SPARK-14712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246600#comment-15246600 ] Gayathri Murali commented on SPARK-14712: - {{__repr__}} is defined for LabeledPoint and LinearModel in mllib.regression, but not for LogisticRegressionModel. Would you like to add it for LogisticRegressionModel in both ml and mllib? > spark.ml LogisticRegressionModel.toString should summarize model > > > Key: SPARK-14712 > URL: https://issues.apache.org/jira/browse/SPARK-14712 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Trivial > Labels: starter > > spark.mllib LogisticRegressionModel overrides toString to print a little > model info. We should do the same in spark.ml. I'd recommend: > * super.toString > * numClasses > * numFeatures > We should also override {{__repr__}} in pyspark to do the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
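A minimal sketch of what the recommended override could look like, using a toy class rather than the actual spark.ml model:
{code}
// Toy model class illustrating the recommended toString contents.
class ToyLogisticRegressionModel(val uid: String,
                                 val numClasses: Int,
                                 val numFeatures: Int) {
  override def toString: String =
    s"ToyLogisticRegressionModel: uid=$uid, numClasses=$numClasses, numFeatures=$numFeatures"
}

// prints "ToyLogisticRegressionModel: uid=logreg_1, numClasses=2, numFeatures=10"
println(new ToyLogisticRegressionModel("logreg_1", 2, 10))
{code}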
[jira] [Resolved] (SPARK-14504) Enable Oracle docker integration tests
[ https://issues.apache.org/jira/browse/SPARK-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-14504. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12270 [https://github.com/apache/spark/pull/12270] > Enable Oracle docker integration tests > -- > > Key: SPARK-14504 > URL: https://issues.apache.org/jira/browse/SPARK-14504 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Luciano Resende >Priority: Minor > Fix For: 2.0.0 > > > Enable Oracle docker integration tests -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14504) Enable Oracle docker integration tests
[ https://issues.apache.org/jira/browse/SPARK-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-14504: --- Assignee: Luciano Resende > Enable Oracle docker integration tests > -- > > Key: SPARK-14504 > URL: https://issues.apache.org/jira/browse/SPARK-14504 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Luciano Resende >Assignee: Luciano Resende >Priority: Minor > Fix For: 2.0.0 > > > Enable Oracle docker integration tests -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14674) Move HiveContext.hiveconf to HiveSessionState
[ https://issues.apache.org/jira/browse/SPARK-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-14674. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12449 [https://github.com/apache/spark/pull/12449] > Move HiveContext.hiveconf to HiveSessionState > - > > Key: SPARK-14674 > URL: https://issues.apache.org/jira/browse/SPARK-14674 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 2.0.0 > > > Just a minor cleanup. This allows us to remove HiveContext later without > inflating the diff too much. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14710) Rename gen/genCode to genCode/doGenCode to better reflect the semantics
[ https://issues.apache.org/jira/browse/SPARK-14710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-14710. - Resolution: Fixed Assignee: Sameer Agarwal Fix Version/s: 2.0.0 > Rename gen/genCode to genCode/doGenCode to better reflect the semantics > --- > > Key: SPARK-14710 > URL: https://issues.apache.org/jira/browse/SPARK-14710 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Sameer Agarwal >Assignee: Sameer Agarwal > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14489) RegressionEvaluator returns NaN for ALS in Spark ml
[ https://issues.apache.org/jira/browse/SPARK-14489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246538#comment-15246538 ] Joseph K. Bradley commented on SPARK-14489: --- I agree that it's unclear what to do with a new item. I don't think there are any good options and would support either not tolerating or ignoring new items. > RegressionEvaluator returns NaN for ALS in Spark ml > --- > > Key: SPARK-14489 > URL: https://issues.apache.org/jira/browse/SPARK-14489 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.6.0 > Environment: AWS EMR >Reporter: Boris Clémençon > Labels: patch > Original Estimate: 4h > Remaining Estimate: 4h > > When building a Spark ML pipeline containing an ALS estimator, the metrics > "rmse", "mse", "r2" and "mae" all return NaN. > The reason is in CrossValidator.scala line 109. The K-folds are randomly > generated. For large and sparse datasets, there is a significant probability > that at least one user of the validation set is missing in the training set, > hence generating a few NaN estimations with the transform method, and NaN > RegressionEvaluator metrics too. > Suggestion to fix the bug: remove the NaN values while computing the rmse or > other metrics (i.e., removing users or items in the validation set that are > missing from the training set). Log it when this happens. > Issue SPARK-14153 seems to be the same problem. > {code:title=Bar.scala|borderStyle=solid} > val splits = MLUtils.kFold(dataset.rdd, $(numFolds), 0) > splits.zipWithIndex.foreach { case ((training, validation), splitIndex) => > val trainingDataset = sqlCtx.createDataFrame(training, schema).cache() > val validationDataset = sqlCtx.createDataFrame(validation, > schema).cache() > // multi-model training > logDebug(s"Train split $splitIndex with multiple sets of parameters.") > val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]] > trainingDataset.unpersist() > var i = 0 > while (i < numModels) { > // TODO: duplicate evaluator to take extra params from input > val metric = eval.evaluate(models(i).transform(validationDataset, > epm(i))) > logDebug(s"Got metric $metric for model trained with ${epm(i)}.") > metrics(i) += metric > i += 1 > } > validationDataset.unpersist() > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
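One way to implement the suggested fix, sketched against the DataFrame API: drop NaN predictions (users or items unseen during training) before handing the result to the evaluator, and log how many rows were discarded.
{code}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, isnan}

// Sketch: filter NaN predictions out before computing rmse/mse/r2/mae.
def dropNaNPredictions(predictions: DataFrame): DataFrame = {
  val cleaned = predictions.filter(!isnan(col("prediction")))
  val dropped = predictions.count() - cleaned.count()
  if (dropped > 0) {
    println(s"Dropped $dropped rows with NaN predictions before evaluation")
  }
  cleaned
}
{code}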
[jira] [Updated] (SPARK-7264) SparkR API for parallel functions
[ https://issues.apache.org/jira/browse/SPARK-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7264: - Target Version/s: 2.0.0 > SparkR API for parallel functions > - > > Key: SPARK-7264 > URL: https://issues.apache.org/jira/browse/SPARK-7264 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Shivaram Venkataraman >Assignee: Timothy Hunter > > This is a JIRA to discuss design proposals for enabling parallel R > computation in SparkR without exposing the entire RDD API. > The rationale for this is that the RDD API has a number of low level > functions and we would like to expose a more light-weight API that is both > friendly to R users and easy to maintain. > http://goo.gl/GLHKZI has a first cut design doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7264) SparkR API for parallel functions
[ https://issues.apache.org/jira/browse/SPARK-7264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-7264: - Assignee: Timothy Hunter > SparkR API for parallel functions > - > > Key: SPARK-7264 > URL: https://issues.apache.org/jira/browse/SPARK-7264 > Project: Spark > Issue Type: New Feature > Components: SparkR >Reporter: Shivaram Venkataraman >Assignee: Timothy Hunter > > This is a JIRA to discuss design proposals for enabling parallel R > computation in SparkR without exposing the entire RDD API. > The rationale for this is that the RDD API has a number of low level > functions and we would like to expose a more light-weight API that is both > friendly to R users and easy to maintain. > http://goo.gl/GLHKZI has a first cut design doc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14604) Modify design of ML model summaries
[ https://issues.apache.org/jira/browse/SPARK-14604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246489#comment-15246489 ] Gayathri Murali commented on SPARK-14604: - [~josephkb] I see that LogisticRegression has an evaluate method. Would you like to add a similar one to LinearRegressionModel and GLM? Also, the LogisticRegression summary does not store the model, while the Linear and GLM summaries do. > Modify design of ML model summaries > --- > > Key: SPARK-14604 > URL: https://issues.apache.org/jira/browse/SPARK-14604 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley > > Several spark.ml models now have summaries containing evaluation metrics and > training info: > * LinearRegressionModel > * LogisticRegressionModel > * GeneralizedLinearRegressionModel > These summaries have unfortunately been added in an inconsistent way. I > propose to reorganize them to have: > * For each model, 1 summary (without training info) and 1 training summary > (with info from training). The non-training summary can be produced for a > new dataset via {{evaluate}}. > * A summary should not store the model itself. > * A summary should provide a transient reference to the dataset used to > produce the summary. > This task will involve reorganizing the GLM summary (which lacks a > training/non-training distinction) and deprecating the model method in the > LinearRegressionSummary. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
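Expressed as toy types rather than spark.ml classes, the proposed split might look like the sketch below; the trait and method names are illustrative only.
{code}
// Hypothetical shape of the proposed summary design.
trait Summary { def rootMeanSquaredError: Double }
trait TrainingSummary extends Summary { def totalIterations: Int }

trait RegressionModelLike {
  def summary: TrainingSummary                // info captured during training
  def evaluate(dataset: Seq[Double]): Summary // metrics only, on a new dataset
  // Per the proposal, neither summary would store the model itself.
}
{code}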
[jira] [Updated] (SPARK-14299) Scala ML examples code merge and clean up
[ https://issues.apache.org/jira/browse/SPARK-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-14299: -- Assignee: Xusen Yin > Scala ML examples code merge and clean up > - > > Key: SPARK-14299 > URL: https://issues.apache.org/jira/browse/SPARK-14299 > Project: Spark > Issue Type: Sub-task > Components: Examples >Reporter: Xusen Yin >Assignee: Xusen Yin >Priority: Minor > Labels: starter > Fix For: 2.0.0 > > > Duplicated code that I found in scala/examples/ml: > * scala/ml > ** CrossValidatorExample.scala --> ModelSelectionViaCrossValidationExample > ** TrainValidationSplitExample.scala --> > ModelSelectionViaTrainValidationSplitExample > ** DeveloperApiExample.scala --> I'm deleting it for now because it's only about > how to create your own classifier, etc., which can be learned easily from > other examples and the ml code. > ** SimpleParamsExample.scala --> merge with > LogisticRegressionSummaryExample.scala > ** SimpleTextClassificationPipeline.scala --> > ModelSelectionViaCrossValidationExample > ** DataFrameExample.scala --> merge with > LogisticRegressionSummaryExample.scala > * Intended to be kept, with command-line support: > ** DecisionTreeExample.scala --> DecisionTreeRegressionExample, > DecisionTreeClassificationExample > ** GBTExample.scala --> GradientBoostedTreeClassifierExample, > GradientBoostedTreeRegressorExample > ** LinearRegressionExample.scala --> LinearRegressionWithElasticNetExample > ** LogisticRegressionExample.scala --> > LogisticRegressionWithElasticNetExample, LogisticRegressionSummaryExample > ** RandomForestExample.scala --> RandomForestRegressorExample, > RandomForestClassifierExample > When merging and cleaning that code, be sure not to disturb the previous > example on/off blocks. > I'll take this one as an example. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14299) Scala ML examples code merge and clean up
[ https://issues.apache.org/jira/browse/SPARK-14299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-14299. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12366 [https://github.com/apache/spark/pull/12366] > Scala ML examples code merge and clean up > - > > Key: SPARK-14299 > URL: https://issues.apache.org/jira/browse/SPARK-14299 > Project: Spark > Issue Type: Sub-task > Components: Examples >Reporter: Xusen Yin >Priority: Minor > Labels: starter > Fix For: 2.0.0 > > > Duplicated code that I found in scala/examples/ml: > * scala/ml > ** CrossValidatorExample.scala --> ModelSelectionViaCrossValidationExample > ** TrainValidationSplitExample.scala --> > ModelSelectionViaTrainValidationSplitExample > ** DeveloperApiExample.scala --> I'm deleting it for now because it's only about > how to create your own classifier, etc., which can be learned easily from > other examples and the ml code. > ** SimpleParamsExample.scala --> merge with > LogisticRegressionSummaryExample.scala > ** SimpleTextClassificationPipeline.scala --> > ModelSelectionViaCrossValidationExample > ** DataFrameExample.scala --> merge with > LogisticRegressionSummaryExample.scala > * Intended to be kept, with command-line support: > ** DecisionTreeExample.scala --> DecisionTreeRegressionExample, > DecisionTreeClassificationExample > ** GBTExample.scala --> GradientBoostedTreeClassifierExample, > GradientBoostedTreeRegressorExample > ** LinearRegressionExample.scala --> LinearRegressionWithElasticNetExample > ** LogisticRegressionExample.scala --> > LogisticRegressionWithElasticNetExample, LogisticRegressionSummaryExample > ** RandomForestExample.scala --> RandomForestRegressorExample, > RandomForestClassifierExample > When merging and cleaning that code, be sure not to disturb the previous > example on/off blocks. > I'll take this one as an example. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14440) Remove PySpark ml.pipeline's specific Reader and Writer
[ https://issues.apache.org/jira/browse/SPARK-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-14440: -- Assignee: Xusen Yin > Remove PySpark ml.pipeline's specific Reader and Writer > --- > > Key: SPARK-14440 > URL: https://issues.apache.org/jira/browse/SPARK-14440 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: Xusen Yin >Assignee: Xusen Yin >Priority: Trivial > Fix For: 2.0.0 > > > Since PipelineMLWriter/PipelineMLReader/PipelineModelMLWriter/PipelineModelMLReader merely extend JavaMLWriter and JavaMLReader without modifying any attributes or methods, there is no need to keep them, just as was done for save/load in ml/tuning.py. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-14440) Remove PySpark ml.pipeline's specific Reader and Writer
[ https://issues.apache.org/jira/browse/SPARK-14440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-14440. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 12216 [https://github.com/apache/spark/pull/12216] > Remove PySpark ml.pipeline's specific Reader and Writer > --- > > Key: SPARK-14440 > URL: https://issues.apache.org/jira/browse/SPARK-14440 > Project: Spark > Issue Type: Improvement > Components: ML, PySpark >Reporter: Xusen Yin >Priority: Trivial > Fix For: 2.0.0 > > > Since PipelineMLWriter/PipelineMLReader/PipelineModelMLWriter/PipelineModelMLReader merely extend JavaMLWriter and JavaMLReader without modifying any attributes or methods, there is no need to keep them, just as was done for save/load in ml/tuning.py. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
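To see why the classes in SPARK-14440 are removable, here is a rough sketch of the pattern being deleted, assuming the `JavaMLWriter` wrapper from `pyspark.ml.util`; the subclass body below mirrors the description, not the exact patch:

```python
from pyspark.ml.util import JavaMLWriter

# The kind of wrapper being removed: a subclass that adds no attributes
# or methods over JavaMLWriter, so it carries no information of its own.
class PipelineMLWriter(JavaMLWriter):
    pass

# After the change, Pipeline.write() can simply return JavaMLWriter(self),
# the same pattern ml/tuning.py already uses for save/load.
```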
[jira] [Created] (SPARK-14715) Provide a way to mask partitions of a Dataset/Dataframe
Anderson de Andrade created SPARK-14715: --- Summary: Provide a way to mask partitions of a Dataset/Dataframe Key: SPARK-14715 URL: https://issues.apache.org/jira/browse/SPARK-14715 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.1.0 Reporter: Anderson de Andrade If a Dataset/Dataframe were to have a custom partitioning by key(s), it would be very efficient to simply mask partitions when filtering by the same key(s). This capability is already provided by PartitionPruningRDD for RDDs. We need something similar in the Dataset/Dataframe space. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
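PartitionPruningRDD itself is JVM-side only, but the behavior SPARK-14715 asks for can be approximated today at the RDD level with `mapPartitionsWithIndex`, which drops a partition's data without the scheduler-level pruning the issue wants. A sketch, with a made-up key-to-partition mapping:

```python
from pyspark import SparkContext

sc = SparkContext(appName="MaskPartitionsSketch")

# 100 records keyed by x % 10, hash-partitioned into 10 partitions.
rdd = sc.parallelize(range(100)).map(lambda x: (x % 10, x)).partitionBy(10)

# Suppose a filter on key == 3 could be answered by partition 3 alone.
wanted = {3}

# Emit nothing for masked partitions. Unlike PartitionPruningRDD, this
# still launches a (cheap) task for every partition rather than pruning
# them from the DAG, which is the gap the issue asks to close.
pruned = rdd.mapPartitionsWithIndex(
    lambda idx, it: it if idx in wanted else iter([]))

print(pruned.collect())
sc.stop()
```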
[jira] [Resolved] (SPARK-14647) Group SQLContext/HiveContext state into PersistentState
[ https://issues.apache.org/jira/browse/SPARK-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-14647. -- Resolution: Fixed Issue resolved by pull request 12463 [https://github.com/apache/spark/pull/12463] > Group SQLContext/HiveContext state into PersistentState > --- > > Key: SPARK-14647 > URL: https://issues.apache.org/jira/browse/SPARK-14647 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Andrew Or >Assignee: Andrew Or > Fix For: 2.0.0 > > > This is analogous to SPARK-13526, which moved per-session state into `SessionState`. After this issue we'll have a corresponding `PersistentState` that groups the state shared across sessions. This will significantly simplify the context constructors by reducing how much has to be passed into them. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
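Generically, the SPARK-14647 refactoring looks like the following sketch. This is an illustration of the shared-state pattern only; the real `SessionState`/`PersistentState` are Spark-internal Scala classes with far more members, and every name here is hypothetical:

```python
class PersistentState:
    """State shared by every session, e.g. a catalog or external connections."""
    def __init__(self):
        self.shared_cache = {}

class SessionState:
    """Per-session state, e.g. conf settings and temporary views."""
    def __init__(self):
        self.conf = {}

class Session:
    # One constructor argument instead of passing each shared component
    # separately, which is the simplification the issue describes.
    def __init__(self, persistent_state):
        self.persistent = persistent_state
        self.state = SessionState()

shared = PersistentState()
s1, s2 = Session(shared), Session(shared)
assert s1.persistent is s2.persistent  # shared across sessions
assert s1.state is not s2.state        # isolated per session
```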
[jira] [Assigned] (SPARK-14714) PySpark Param TypeConverter arg is not passed by name in some cases
[ https://issues.apache.org/jira/browse/SPARK-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14714: Assignee: Apache Spark (was: Joseph K. Bradley) > PySpark Param TypeConverter arg is not passed by name in some cases > --- > > Key: SPARK-14714 > URL: https://issues.apache.org/jira/browse/SPARK-14714 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Assignee: Apache Spark >Priority: Minor > > PySpark Param constructors need to pass the TypeConverter argument by name, > partly to make sure it is not mistaken for the expectedType arg and partly > because we will remove the expectedType arg in 2.1. In several places, this > is not being done correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-14714) PySpark Param TypeConverter arg is not passed by name in some cases
[ https://issues.apache.org/jira/browse/SPARK-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14714: Assignee: Joseph K. Bradley (was: Apache Spark) > PySpark Param TypeConverter arg is not passed by name in some cases > --- > > Key: SPARK-14714 > URL: https://issues.apache.org/jira/browse/SPARK-14714 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley >Priority: Minor > > PySpark Param constructors need to pass the TypeConverter argument by name, > partly to make sure it is not mistaken for the expectedType arg and partly > because we will remove the expectedType arg in 2.1. In several places, this > is not being done correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-14714) PySpark Param TypeConverter arg is not passed by name in some cases
[ https://issues.apache.org/jira/browse/SPARK-14714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15246443#comment-15246443 ] Apache Spark commented on SPARK-14714: -- User 'jkbradley' has created a pull request for this issue: https://github.com/apache/spark/pull/12480 > PySpark Param TypeConverter arg is not passed by name in some cases > --- > > Key: SPARK-14714 > URL: https://issues.apache.org/jira/browse/SPARK-14714 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 2.0.0 >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley >Priority: Minor > > PySpark Param constructors need to pass the TypeConverter argument by name, > partly to make sure it is not mistaken for the expectedType arg and partly > because we will remove the expectedType arg in 2.1. In several places, this > is not being done correctly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
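A sketch of the SPARK-14714 pitfall, assuming the `Param` constructor signature described above, `Param(parent, name, doc, expectedType=None, typeConverter=None)`; the `maxIter` param here is just an illustration:

```python
from pyspark.ml.param import Param, Params, TypeConverters

# Wrong: passed positionally, the converter binds to the expectedType slot
# and is silently ignored as a type converter.
bad = Param(Params._dummy(), "maxIter", "max number of iterations (>= 0).",
            TypeConverters.toInt)

# Right: passing by name keeps the code correct now and after expectedType
# is removed in 2.1, which is what this issue fixes across pyspark.ml.
good = Param(Params._dummy(), "maxIter", "max number of iterations (>= 0).",
             typeConverter=TypeConverters.toInt)
```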