[jira] [Commented] (SPARK-20133) User guide for spark.ml.stat.ChiSquareTest
[ https://issues.apache.org/jira/browse/SPARK-20133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15949418#comment-15949418 ] Benjamin Fradet commented on SPARK-20133: - Can I take this one? > User guide for spark.ml.stat.ChiSquareTest > -- > > Key: SPARK-20133 > URL: https://issues.apache.org/jira/browse/SPARK-20133 > Project: Spark > Issue Type: Documentation > Components: Documentation, ML >Affects Versions: 2.2.0 >Reporter: Joseph K. Bradley >Priority: Minor > > Add new user guide section for spark.ml.stat, and document ChiSquareTest. > This may involve adding new example scripts. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
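For context on what the requested user guide section would document: `spark.ml.stat.ChiSquareTest` runs Pearson's chi-squared independence test of each feature against the label. As a hedged, Spark-free illustration (plain Python, not the spark.ml API), the test statistic for a contingency table of observed counts is the sum of `(observed - expected)^2 / expected` over all cells:

```python
# Illustrative sketch only (not the spark.ml.stat.ChiSquareTest API):
# Pearson's chi-squared statistic for a contingency table of observed counts.

def chi_square_statistic(table):
    """table: list of rows of observed counts, e.g. [[10, 20], [30, 40]]."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # expected count under the independence hypothesis
            expected = row_sums[i] * col_sums[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat
```

A perfectly independent table yields a statistic of 0; the larger the statistic, the stronger the evidence against independence.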
[jira] [Created] (SPARK-20097) Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR
Benjamin Fradet created SPARK-20097: --- Summary: Fix visibility discrepancy with numInstances and degreesOfFreedom in LR and GLR Key: SPARK-20097 URL: https://issues.apache.org/jira/browse/SPARK-20097 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 2.1.0 Reporter: Benjamin Fradet Priority: Trivial - numInstances is public in LR and private in GLR - degreesOfFreedom is private in LR and public in GLR
[jira] [Commented] (SPARK-16857) CrossValidator and KMeans throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/SPARK-16857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611926#comment-15611926 ] Benjamin Fradet commented on SPARK-16857: - I was wondering why a KMeansEvaluator computing the WSSE hasn't been implemented yet. Any ideas why not? > CrossValidator and KMeans throws IllegalArgumentException > - > > Key: SPARK-16857 > URL: https://issues.apache.org/jira/browse/SPARK-16857 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 1.6.1 > Environment: spark-jobserver docker image. Spark 1.6.1 on Ubuntu, > Hadoop 2.4 >Reporter: Ryan Claussen > > I am attempting to use CrossValidation to train a KMeans model. When I attempt > to fit the data, Spark throws an IllegalArgumentException as below, since the > KMeans algorithm outputs an Integer into the prediction column instead of a > Double. Before I go too far: is using CrossValidation with KMeans > supported? > Here's the exception: > {quote} > java.lang.IllegalArgumentException: requirement failed: Column prediction > must be of type DoubleType but was actually IntegerType. 
> at scala.Predef$.require(Predef.scala:233) > at > org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42) > at > org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator.evaluate(MulticlassClassificationEvaluator.scala:74) > at > org.apache.spark.ml.tuning.CrossValidator$$anonfun$fit$1.apply(CrossValidator.scala:109) > at > org.apache.spark.ml.tuning.CrossValidator$$anonfun$fit$1.apply(CrossValidator.scala:99) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) > at org.apache.spark.ml.tuning.CrossValidator.fit(CrossValidator.scala:99) > at > com.ibm.bpm.cloud.ci.cto.prediction.SparkModelJob$.generateKMeans(SparkModelJob.scala:202) > at > com.ibm.bpm.cloud.ci.cto.prediction.SparkModelJob$.runJob(SparkModelJob.scala:62) > at > com.ibm.bpm.cloud.ci.cto.prediction.SparkModelJob$.runJob(SparkModelJob.scala:39) > at > spark.jobserver.JobManagerActor$$anonfun$spark$jobserver$JobManagerActor$$getJobFuture$4.apply(JobManagerActor.scala:301) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {quote} > Here is the code I'm using to set up my cross validator. As the stack trace > above indicates it is failing at the fit step when > {quote} > ... 
> val mpc = new KMeans().setK(2).setFeaturesCol("indexedFeatures") > val labelConverter = new > IndexToString().setInputCol("prediction").setOutputCol("predictedLabel").setLabels(labelIndexer.labels) > val pipeline = new Pipeline().setStages(Array(labelIndexer, > featureIndexer, mpc, labelConverter)) > val evaluator = new > MulticlassClassificationEvaluator().setLabelCol("approvedIndex").setPredictionCol("prediction") > val paramGrid = new ParamGridBuilder().addGrid(mpc.maxIter, Array(100, > 200, 500)).build() > val cv = new > CrossValidator().setEstimator(pipeline).setEvaluator(evaluator).setEstimatorParamMaps(paramGrid).setNumFolds(3) > val cvModel = cv.fit(trainingData) > {quote}
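The evaluator the first comment asks about would score clusterings by the within-set sum of squared errors (WSSE), roughly the quantity `KMeansModel.computeCost` reports. A minimal Spark-free sketch of that metric (plain Python; the point/centroid tuple representation is an assumption for illustration, not the Spark API):

```python
# Hypothetical sketch of the WSSE metric a KMeans evaluator would compute:
# the sum, over all points, of the squared Euclidean distance to the
# nearest centroid.

def wsse(points, centroids):
    total = 0.0
    for p in points:
        # squared distance from p to its closest centroid
        total += min(
            sum((pi - ci) ** 2 for pi, ci in zip(p, c))
            for c in centroids
        )
    return total
```

Lower WSSE means tighter clusters, which is why it is a natural (if imperfect) model-selection criterion for k-means.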
[jira] [Commented] (SPARK-15581) MLlib 2.1 Roadmap
[ https://issues.apache.org/jira/browse/SPARK-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305275#comment-15305275 ] Benjamin Fradet commented on SPARK-15581: - Thanks, we should maybe add it to the roadmap, don't you think? > MLlib 2.1 Roadmap > - > > Key: SPARK-15581 > URL: https://issues.apache.org/jira/browse/SPARK-15581 > Project: Spark > Issue Type: Umbrella > Components: ML, MLlib >Reporter: Joseph K. Bradley >Priority: Blocker > Labels: roadmap > > This is a master list for MLlib improvements we are working on for the next > release. Please view this as a wish list rather than a definite plan, for we > don't have an accurate estimate of available resources. Due to limited review > bandwidth, features appearing on this list will get higher priority during > code review. But feel free to suggest new items to the list in comments. We > are experimenting with this process. Your feedback would be greatly > appreciated. > h1. Instructions > h2. For contributors: > * Please read > https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark > carefully. Code style, documentation, and unit tests are important. > * If you are a first-time Spark contributor, please always start with a > [starter task|https://issues.apache.org/jira/issues/?filter=12333209] rather > than a medium/big feature. Based on our experience, mixing the development > process with a big feature usually causes long delays in code review. > * Never work silently. Let everyone know on the corresponding JIRA page when > you start working on some features. This is to avoid duplicate work. For > small features, you don't need to wait to get the JIRA assigned. > * For medium/big features or features with dependencies, please get assigned > first before coding and keep the ETA updated on the JIRA. If there is no > activity on the JIRA page for a certain amount of time, the JIRA should be > released for other contributors. 
> * Do not claim multiple (>3) JIRAs at the same time. Try to finish them one > after another. > * Remember to add the `@Since("VERSION")` annotation to new public APIs. > * Please review others' PRs (https://spark-prs.appspot.com/#mllib). Code > review greatly helps to improve others' code as well as yours. > h2. For committers: > * Try to break down big features into small and specific JIRA tasks and link > them properly. > * Add a "starter" label to starter tasks. > * Put a rough estimate for medium/big features and track the progress. > * If you start reviewing a PR, please add yourself to the Shepherd field on > JIRA. > * If the code looks good to you, please comment "LGTM". For non-trivial PRs, > please ping a maintainer to make a final pass. > * After merging a PR, create and link JIRAs for Python, example code, and > documentation if applicable. > h1. Roadmap (*WIP*) > This is NOT [a complete list of MLlib JIRAs for 2.1| > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20component%20in%20(ML%2C%20MLlib%2C%20SparkR%2C%20GraphX)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.1.0%20AND%20(fixVersion%20is%20EMPTY%20OR%20fixVersion%20!%3D%202.1.0)%20AND%20(Resolution%20is%20EMPTY%20OR%20Resolution%20in%20(Done%2C%20Fixed%2C%20Implemented))%20ORDER%20BY%20priority]. > We only include umbrella JIRAs and high-level tasks. > Major efforts in this release: > * Feature parity for the DataFrames-based API (`spark.ml`), relative to the > RDD-based API > * ML persistence > * Python API feature parity and test coverage > * R API expansion and improvements > * Note about new features: As usual, we expect to expand the feature set of > MLlib. However, we will prioritize API parity, bug fixes, and improvements > over new features. > Note `spark.mllib` is in maintenance mode now. We will accept bug fixes for > it, but new features, APIs, and improvements will only be added to `spark.ml`. > h2. 
Critical feature parity in DataFrame-based API > * Umbrella JIRA: [SPARK-4591] > h2. Persistence > * Complete persistence within MLlib > ** Python tuning (SPARK-13786) > * MLlib in R format: compatibility with other languages (SPARK-15572) > * Impose backwards compatibility for persistence (SPARK-15573) > h2. Python API > * Standardize unit tests for Scala and Python to improve and consolidate test > coverage for Params, persistence, and other common functionality (SPARK-15571) > * Improve Python API handling of Params, persistence (SPARK-14771) > (SPARK-14706) > ** Note: The linked JIRAs for this are incomplete. More to be created... > ** Related: Implement Python meta-algorithms in Scala (to simplify > persistence) (SPARK-15574) > * Feature parity: The main goal of the Python API is to have feature parity > with the Scala/Java API. You can find a [complete list here| >
[jira] [Commented] (SPARK-15581) MLlib 2.1 Roadmap
[ https://issues.apache.org/jira/browse/SPARK-15581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15304869#comment-15304869 ] Benjamin Fradet commented on SPARK-15581: - [~josephkb] Just out of curiosity: I don't see any mention of supporting multiclass classification for gbt or logistic regression, is this something that is still planned? > MLlib 2.1 Roadmap > - > > Key: SPARK-15581 > URL: https://issues.apache.org/jira/browse/SPARK-15581 > Project: Spark > Issue Type: Umbrella > Components: ML, MLlib >Reporter: Joseph K. Bradley >Priority: Blocker > Labels: roadmap
[jira] [Commented] (SPARK-15200) Add documentation and examples for GaussianMixture
[ https://issues.apache.org/jira/browse/SPARK-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275524#comment-15275524 ] Benjamin Fradet commented on SPARK-15200: - Whoops, didn't see it linked to 15101. > Add documentation and examples for GaussianMixture > - > > Key: SPARK-15200 > URL: https://issues.apache.org/jira/browse/SPARK-15200 > Project: Spark > Issue Type: Documentation > Components: Documentation, ML >Reporter: Benjamin Fradet >Priority: Minor >
[jira] [Commented] (SPARK-15200) Add documentation and examples for GaussianMixture
[ https://issues.apache.org/jira/browse/SPARK-15200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275227#comment-15275227 ] Benjamin Fradet commented on SPARK-15200: - I've started working on this. > Add documentation and examples for GaussianMixture > - > > Key: SPARK-15200 > URL: https://issues.apache.org/jira/browse/SPARK-15200 > Project: Spark > Issue Type: Documentation > Components: Documentation, ML >Reporter: Benjamin Fradet >Priority: Minor >
[jira] [Created] (SPARK-15200) Add documentation and examples for GaussianMixture
Benjamin Fradet created SPARK-15200: --- Summary: Add documentation and examples for GaussianMixture Key: SPARK-15200 URL: https://issues.apache.org/jira/browse/SPARK-15200 Project: Spark Issue Type: Documentation Components: Documentation, ML Reporter: Benjamin Fradet Priority: Minor
[jira] [Commented] (SPARK-14985) Update LinearRegression, LogisticRegression summary internals to handle model copy
[ https://issues.apache.org/jira/browse/SPARK-14985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15265261#comment-15265261 ] Benjamin Fradet commented on SPARK-14985: - I'll take this one if you guys don't mind. > Update LinearRegression, LogisticRegression summary internals to handle model > copy > -- > > Key: SPARK-14985 > URL: https://issues.apache.org/jira/browse/SPARK-14985 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > > See parent JIRA + the PR for [SPARK-14852] for details. The summaries should > handle creating an internal copy of the model.
[jira] [Commented] (SPARK-14817) ML 2.0 QA: Programming guide update and migration guide
[ https://issues.apache.org/jira/browse/SPARK-14817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254196#comment-15254196 ] Benjamin Fradet commented on SPARK-14817: - Count me in! > ML 2.0 QA: Programming guide update and migration guide > --- > > Key: SPARK-14817 > URL: https://issues.apache.org/jira/browse/SPARK-14817 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, MLlib >Reporter: Joseph K. Bradley > > Before the release, we need to update the MLlib Programming Guide. Updates > will include: > * Make the DataFrame-based API (spark.ml) front-and-center, to make it clear > the RDD-based API is the older, maintenance-mode one. > ** No docs for spark.mllib will be deleted; they will just be reorganized and > put in a subsection. > ** If spark.ml docs are less complete, or if spark.ml docs say "refer to the > spark.mllib docs for details," then we should copy those details to the > spark.ml docs. > * Add migration guide subsection. > ** Use the results of the QA audit JIRAs. > * Check phrasing, especially in main sections (for outdated items such as "In > this release, ...") > If you would like to work on this task, please comment, and we can create & > link JIRAs for parts of this work (which should be broken into pieces for > this larger 2.0 release).
[jira] [Commented] (SPARK-14570) Log instrumentation in Random forests
[ https://issues.apache.org/jira/browse/SPARK-14570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250067#comment-15250067 ] Benjamin Fradet commented on SPARK-14570: - I'll take this one if you guys don't mind. > Log instrumentation in Random forests > - > > Key: SPARK-14570 > URL: https://issues.apache.org/jira/browse/SPARK-14570 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Timothy Hunter >
[jira] [Commented] (SPARK-14730) Expose ColumnPruner as feature transformer
[ https://issues.apache.org/jira/browse/SPARK-14730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15250050#comment-15250050 ] Benjamin Fradet commented on SPARK-14730: - [~jlaskowski], [~yanboliang] is one of you working on this? > Expose ColumnPruner as feature transformer > -- > > Key: SPARK-14730 > URL: https://issues.apache.org/jira/browse/SPARK-14730 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Jacek Laskowski >Priority: Minor > > From d...@spark.apache.org: > {quote} > Jacek: > Came across `private class ColumnPruner` with "TODO(ekl) make this a > public transformer" in scaladoc, cf. > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/RFormula.scala#L317. > Why is this private and is there a JIRA for the TODO(ekl)? > {quote} > {quote} > Yanbo Liang: > This is due to ColumnPruner is only used for RFormula currently, we did not > expose it as a feature transformer. > Please feel free to create JIRA and work on it. > {quote}
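Conceptually, the transformer being requested just drops a fixed set of columns from its input. A schematic, Spark-free sketch of that behavior (plain Python over lists of dicts, not the spark.ml Transformer API; the function name is hypothetical):

```python
# Schematic illustration of a ColumnPruner-style transform: given rows
# represented as dicts, drop every column in columns_to_drop.

def prune_columns(rows, columns_to_drop):
    return [
        {name: value for name, value in row.items() if name not in columns_to_drop}
        for row in rows
    ]
```

Exposing this publicly would mainly mean wrapping the same logic in the standard Transformer interface (params for the input columns, schema transformation, persistence).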
[jira] [Created] (SPARK-12983) Correct metrics.properties.template
Benjamin Fradet created SPARK-12983: --- Summary: Correct metrics.properties.template Key: SPARK-12983 URL: https://issues.apache.org/jira/browse/SPARK-12983 Project: Spark Issue Type: Documentation Components: Documentation, Spark Core Reporter: Benjamin Fradet Priority: Minor There are some typos or plain unintelligible sentences in the metrics template.
[jira] [Closed] (SPARK-12858) Remove duplicated code in metrics
[ https://issues.apache.org/jira/browse/SPARK-12858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Fradet closed SPARK-12858. --- Resolution: Not A Problem > Remove duplicated code in metrics > - > > Key: SPARK-12858 > URL: https://issues.apache.org/jira/browse/SPARK-12858 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Reporter: Benjamin Fradet >Priority: Minor > > I noticed there is some duplicated code in the sinks regarding the poll > period. > Also, parts of the metrics.properties template are unclear.
[jira] [Created] (SPARK-12858) Remove duplicated code in metrics
Benjamin Fradet created SPARK-12858: --- Summary: Remove duplicated code in metrics Key: SPARK-12858 URL: https://issues.apache.org/jira/browse/SPARK-12858 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: Benjamin Fradet Priority: Minor I noticed there is some duplicated code in the sinks regarding the poll period. Also, parts of the metrics.properties template are unclear.
[jira] [Commented] (SPARK-9716) BinaryClassificationEvaluator should accept Double prediction column
[ https://issues.apache.org/jira/browse/SPARK-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070992#comment-15070992 ] Benjamin Fradet commented on SPARK-9716: Somewhat related, I think `RegressionEvaluator` should accept all numeric types as the prediction column because, for example, ALS produces float predictions, which we need to convert to double before we're able to use the `RegressionEvaluator`. > BinaryClassificationEvaluator should accept Double prediction column > > > Key: SPARK-9716 > URL: https://issues.apache.org/jira/browse/SPARK-9716 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > > BinaryClassificationEvaluator currently expects the rawPrediction column, of > type Vector. It should also accept a Double prediction column, with a > different set of supported metrics.
[jira] [Comment Edited] (SPARK-9716) BinaryClassificationEvaluator should accept Double prediction column
[ https://issues.apache.org/jira/browse/SPARK-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070992#comment-15070992 ] Benjamin Fradet edited comment on SPARK-9716 at 12/24/15 1:19 PM: -- Somewhat related, I think `RegressionEvaluator` should accept all numeric types as the prediction column because, for example, ALS produces float predictions, which we need to convert to double before we're able to use the `RegressionEvaluator`. Conversely, we could have ALS produce double predictions in order to keep things consistent across Estimators. was (Author: benfradet): Somewhat related, I think `RegressionEvaluator` should accept all numeric types as the prediction column because, for example, ALS produces float predictions, which we need to convert to double before we're able to use the `RegressionEvaluator`. > BinaryClassificationEvaluator should accept Double prediction column > > > Key: SPARK-9716 > URL: https://issues.apache.org/jira/browse/SPARK-9716
[jira] [Commented] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071062#comment-15071062 ] Benjamin Fradet commented on SPARK-12247: - The [PR|https://github.com/apache/spark/pull/10411] has been out for a few days. > Documentation for spark.ml's ALS and collaborative filtering in general > --- > > Key: SPARK-12247 > URL: https://issues.apache.org/jira/browse/SPARK-12247 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib >Affects Versions: 1.5.2 >Reporter: Timothy Hunter > > We need to add a section in the documentation about collaborative filtering > in the dataframe API: > - copy explanations about collaborative filtering and ALS from spark.mllib > - provide an example with spark.ml's ALS
[jira] [Commented] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069366#comment-15069366 ] Benjamin Fradet commented on SPARK-12247: - Yup, I was thinking of keeping only the RMSE calculation too. We could also just compute the RMSE using the `RegressionEvaluator` instead of doing it "manually", what do you think? > Documentation for spark.ml's ALS and collaborative filtering in general > --- > > Key: SPARK-12247 > URL: https://issues.apache.org/jira/browse/SPARK-12247
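Either way the documented example ends up computing the same quantity: root mean squared error over (prediction, rating) pairs, which is what `RegressionEvaluator` reports when its metric is set to RMSE. A hedged, Spark-free sketch of the "manual" version being discussed (plain Python, not the Spark API):

```python
import math

# The "manual" RMSE computation the comment refers to, outside Spark:
# sqrt of the mean squared difference between predictions and labels.

def rmse(predictions, labels):
    n = len(labels)
    return math.sqrt(
        sum((p - l) ** 2 for p, l in zip(predictions, labels)) / n
    )
```

Using the evaluator instead of this hand-rolled loop keeps the documentation example shorter and consistent with the rest of the spark.ml guide.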
[jira] [Commented] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067726#comment-15067726 ] Benjamin Fradet commented on SPARK-12247: - [~thunterdb] Do you think I should also include the calculation of false positives as in [the movie lens example|https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/MovieLensALS.scala#L167]? > Documentation for spark.ml's ALS and collaborative filtering in general > --- > > Key: SPARK-12247 > URL: https://issues.apache.org/jira/browse/SPARK-12247
[jira] [Commented] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067021#comment-15067021 ] Benjamin Fradet commented on SPARK-12247: - Ok thanks, I'll rework the examples accordingly. > Documentation for spark.ml's ALS and collaborative filtering in general > --- > > Key: SPARK-12247 > URL: https://issues.apache.org/jira/browse/SPARK-12247
[jira] [Commented] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065417#comment-15065417 ] Benjamin Fradet commented on SPARK-12247: - By the way, should we repurpose [MovieLensALS|https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/ml/MovieLensALS.scala] or keep it alongside the documentation example? > Documentation for spark.ml's ALS and collaborative filtering in general > --- > > Key: SPARK-12247 > URL: https://issues.apache.org/jira/browse/SPARK-12247
[jira] [Commented] (SPARK-12247) Documentation for spark.ml's ALS and collaborative filtering in general
[ https://issues.apache.org/jira/browse/SPARK-12247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065344#comment-15065344 ] Benjamin Fradet commented on SPARK-12247: - I've started working on this. > Documentation for spark.ml's ALS and collaborative filtering in general > --- > > Key: SPARK-12247 > URL: https://issues.apache.org/jira/browse/SPARK-12247 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib >Affects Versions: 1.5.2 >Reporter: Timothy Hunter > > We need to add a section in the documentation about collaborative filtering > in the dataframe API: > - copy explanations about collaborative filtering and ALS from spark.mllib > - provide an example with spark.ml's ALS -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
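For the documentation example discussed above, a minimal spark.ml ALS sketch might look like the following. This is only an illustration, not the final guide example: the column names, rating values, and hyperparameters are assumptions.

```scala
import org.apache.spark.ml.recommendation.ALS

// Hypothetical ratings DataFrame; column names are illustrative.
val ratings = sqlContext.createDataFrame(Seq(
  (0, 10, 4.0f), (0, 20, 1.0f), (1, 10, 5.0f)
)).toDF("userId", "movieId", "rating")

val als = new ALS()
  .setMaxIter(10)
  .setRegParam(0.1)
  .setUserCol("userId")
  .setItemCol("movieId")
  .setRatingCol("rating")

val model = als.fit(ratings)
// transform appends a "prediction" column with the estimated ratings.
val predictions = model.transform(ratings)
```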
[jira] [Commented] (SPARK-9716) BinaryClassificationEvaluator should accept Double prediction column
[ https://issues.apache.org/jira/browse/SPARK-9716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15065342#comment-15065342 ] Benjamin Fradet commented on SPARK-9716: [~lkhamsurenl] Are you working on it or can I take over? > BinaryClassificationEvaluator should accept Double prediction column > > > Key: SPARK-9716 > URL: https://issues.apache.org/jira/browse/SPARK-9716 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > > BinaryClassificationEvaluator currently expects the rawPrediction column, of > type Vector. It should also accept a Double prediction column, with a > different set of supported metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
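For context, a sketch of today's usage, where the evaluator reads the Vector-typed rawPrediction column; the proposal above would additionally allow a Double-typed prediction column. The `predictions` DataFrame is assumed to come from a fitted binary classifier.

```scala
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

// Current API: the evaluator expects a Vector-typed rawPrediction column.
val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("rawPrediction")

val auc = evaluator.evaluate(predictions)
```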
[jira] [Created] (SPARK-12368) Better doc for the binary classification evaluator setMetricName method
Benjamin Fradet created SPARK-12368: --- Summary: Better doc for the binary classification evaluator setMetricName method Key: SPARK-12368 URL: https://issues.apache.org/jira/browse/SPARK-12368 Project: Spark Issue Type: Improvement Components: Documentation, ML Reporter: Benjamin Fradet Priority: Minor For the BinaryClassificationEvaluator, the scaladoc doesn't mention that "areaUnderPR" is supported, only that the default is "areadUnderROC". Also, in the documentation, it is said that: "The default metric used to choose the best ParamMap can be overriden by the setMetric method in each of these evaluators." However, the method is called setMetricName. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
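As a reference point for the doc fix, both supported metrics are selected through setMetricName (there is no setMetric method); a minimal sketch:

```scala
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

// Both metrics go through setMetricName; "areaUnderROC" is the default.
val prEvaluator = new BinaryClassificationEvaluator().setMetricName("areaUnderPR")
val rocEvaluator = new BinaryClassificationEvaluator().setMetricName("areaUnderROC")
```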
[jira] [Commented] (SPARK-12368) Better doc for the binary classification evaluator setMetricName method
[ https://issues.apache.org/jira/browse/SPARK-12368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060167#comment-15060167 ] Benjamin Fradet commented on SPARK-12368: - I've started working on this. > Better doc for the binary classification evaluator setMetricName method > --- > > Key: SPARK-12368 > URL: https://issues.apache.org/jira/browse/SPARK-12368 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML >Reporter: Benjamin Fradet >Priority: Minor > > For the BinaryClassificationEvaluator, the scaladoc doesn't mention that > "areaUnderPR" is supported, only that the default is "areadUnderROC". > Also, in the documentation, it is said that: > "The default metric used to choose the best ParamMap can be overriden by the > setMetric method in each of these evaluators." > However, the method is called setMetricName. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12368) Better doc for the binary classification evaluator' metricName
[ https://issues.apache.org/jira/browse/SPARK-12368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Fradet updated SPARK-12368: Summary: Better doc for the binary classification evaluator' metricName (was: Better doc for the binary classification evaluator setMetricName method) > Better doc for the binary classification evaluator' metricName > -- > > Key: SPARK-12368 > URL: https://issues.apache.org/jira/browse/SPARK-12368 > Project: Spark > Issue Type: Improvement > Components: Documentation, ML >Reporter: Benjamin Fradet >Priority: Minor > > For the BinaryClassificationEvaluator, the scaladoc doesn't mention that > "areaUnderPR" is supported, only that the default is "areadUnderROC". > Also, in the documentation, it is said that: > "The default metric used to choose the best ParamMap can be overriden by the > setMetric method in each of these evaluators." > However, the method is called setMetricName. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7425) spark.ml Predictor should support other numeric types for label
[ https://issues.apache.org/jira/browse/SPARK-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15054664#comment-15054664 ] Benjamin Fradet commented on SPARK-7425: Is there anyone working on this? I'm considering taking over this JIRA. I've started writing unit tests for a few predictors and I'm wondering whether I should write unit tests for all of them. Input welcome. > spark.ml Predictor should support other numeric types for label > --- > > Key: SPARK-7425 > URL: https://issues.apache.org/jira/browse/SPARK-7425 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > Labels: starter > > Currently, the Predictor abstraction expects the input labelCol type to be > DoubleType, but we should support other numeric types. This will involve > updating the PredictorParams.validateAndTransformSchema method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
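Until validateAndTransformSchema accepts other numeric types, callers can work around the restriction by casting the label column up front; a hypothetical workaround (the `df` DataFrame and the "label" column name are assumptions for illustration):

```scala
import org.apache.spark.sql.types.DoubleType

// Workaround while labelCol must be DoubleType:
// cast any numeric label column to double before fitting.
val prepared = df.withColumn("label", df("label").cast(DoubleType))
```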
[jira] [Commented] (SPARK-12217) Document invalid handling for StringIndexer
[ https://issues.apache.org/jira/browse/SPARK-12217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051698#comment-15051698 ] Benjamin Fradet commented on SPARK-12217: - Sorry [~srowen], my bad, I wanted to duplicate the values on a previous JIRA but didn't know the implications. > Document invalid handling for StringIndexer > --- > > Key: SPARK-12217 > URL: https://issues.apache.org/jira/browse/SPARK-12217 > Project: Spark > Issue Type: Documentation > Components: ML >Reporter: Benjamin Fradet >Priority: Minor > > Documentation is needed regarding the handling of invalid labels in > StringIndexer -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9059) Update Python Direct Kafka Word count examples to show the use of HasOffsetRanges
[ https://issues.apache.org/jira/browse/SPARK-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049116#comment-15049116 ] Benjamin Fradet commented on SPARK-9059: There is a Python code snippet like the Java and Scala ones in the docs on master [here|https://github.com/apache/spark/blob/master/docs/streaming-kafka-integration.md#approach-2-direct-approach-no-receivers]. However, my understanding was that this wasn't the point of this JIRA. As I understood it, it was originally to incorporate in the code examples, or duplicate into a new example, the use of `HasOffsetRanges` like the [Scala one|https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala]. > Update Python Direct Kafka Word count examples to show the use of > HasOffsetRanges > - > > Key: SPARK-9059 > URL: https://issues.apache.org/jira/browse/SPARK-9059 > Project: Spark > Issue Type: Sub-task > Components: Streaming >Reporter: Tathagata Das >Priority: Minor > Labels: starter > > Update Python examples of Direct Kafka word count to access the offset ranges > using HasOffsetRanges and print it. For example in Scala, > > {code} > var offsetRanges: Array[OffsetRange] = _ > ... > directKafkaDStream.foreachRDD { rdd => > offsetRanges = rdd.asInstanceOf[HasOffsetRanges] > } > ... > transformedDStream.foreachRDD { rdd => > // some operation > println("Processed ranges: " + offsetRanges) > } > {code} > See https://spark.apache.org/docs/latest/streaming-kafka-integration.html for > more info, and the master source code for more updated information on python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-9059) Update Python Direct Kafka Word count examples to show the use of HasOffsetRanges
[ https://issues.apache.org/jira/browse/SPARK-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049116#comment-15049116 ] Benjamin Fradet edited comment on SPARK-9059 at 12/10/15 6:49 AM: -- There is a Python code snippet like the Java and Scala ones in the docs on master [here|https://github.com/apache/spark/blob/master/docs/streaming-kafka-integration.md#approach-2-direct-approach-no-receivers]. However, my understanding was that this wasn't the point of this JIRA. As I understood it, it was originally to incorporate in the code examples, or duplicate into a new example, the use of `HasOffsetRanges`. was (Author: benfradet): There is a Python code snippet like the Java and Scala ones in the docs on master [here|https://github.com/apache/spark/blob/master/docs/streaming-kafka-integration.md#approach-2-direct-approach-no-receivers]. However, my understanding was that this wasn't the point of this JIRA. As I understood it, it was originally to incorporate in the code examples, or duplicate into a new example, the use of `HasOffsetRanges` like the [Scala one|https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala]. > Update Python Direct Kafka Word count examples to show the use of > HasOffsetRanges > - > > Key: SPARK-9059 > URL: https://issues.apache.org/jira/browse/SPARK-9059 > Project: Spark > Issue Type: Sub-task > Components: Streaming >Reporter: Tathagata Das >Priority: Minor > Labels: starter > > Update Python examples of Direct Kafka word count to access the offset ranges > using HasOffsetRanges and print it. For example in Scala, > > {code} > var offsetRanges: Array[OffsetRange] = _ > ... > directKafkaDStream.foreachRDD { rdd => > offsetRanges = rdd.asInstanceOf[HasOffsetRanges] > } > ...
> transformedDStream.foreachRDD { rdd => > // some operation > println("Processed ranges: " + offsetRanges) > } > {code} > See https://spark.apache.org/docs/latest/streaming-kafka-integration.html for > more info, and the master source code for more updated information on python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9059) Update Python Direct Kafka Word count examples to show the use of HasOffsetRanges
[ https://issues.apache.org/jira/browse/SPARK-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048177#comment-15048177 ] Benjamin Fradet commented on SPARK-9059: Hi [~neelesh77], I know the documentation has been updated and I don't see any use of `HasOffsetRanges` in the Scala or Java examples. Pinging [~tdas], to get more information. > Update Python Direct Kafka Word count examples to show the use of > HasOffsetRanges > - > > Key: SPARK-9059 > URL: https://issues.apache.org/jira/browse/SPARK-9059 > Project: Spark > Issue Type: Sub-task > Components: Streaming >Reporter: Tathagata Das >Priority: Minor > Labels: starter > > Update Python examples of Direct Kafka word count to access the offset ranges > using HasOffsetRanges and print it. For example in Scala, > > {code} > var offsetRanges: Array[OffsetRange] = _ > ... > directKafkaDStream.foreachRDD { rdd => > offsetRanges = rdd.asInstanceOf[HasOffsetRanges] > } > ... > transformedDStream.foreachRDD { rdd => > // some operation > println("Processed ranges: " + offsetRanges) > } > {code} > See https://spark.apache.org/docs/latest/streaming-kafka-integration.html for > more info, and the master source code for more updated information on python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-12217) Document invalid handling for StringIndexer
Benjamin Fradet created SPARK-12217: --- Summary: Document invalid handling for StringIndexer Key: SPARK-12217 URL: https://issues.apache.org/jira/browse/SPARK-12217 Project: Spark Issue Type: Documentation Components: ML Reporter: Benjamin Fradet Priority: Minor Fix For: 1.6.1, 2.0.0 Documentation is needed regarding the handling of invalid labels in StringIndexer -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
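For reference, the behavior to be documented is controlled by the StringIndexer's handleInvalid param; a minimal sketch (column names are assumptions):

```scala
import org.apache.spark.ml.feature.StringIndexer

// "error" (the default) throws on labels unseen during fit;
// "skip" silently drops the rows carrying such labels.
val indexer = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .setHandleInvalid("skip")
```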
[jira] [Commented] (SPARK-12217) Document invalid handling for StringIndexer
[ https://issues.apache.org/jira/browse/SPARK-12217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047545#comment-15047545 ] Benjamin Fradet commented on SPARK-12217: - I've started working on this. > Document invalid handling for StringIndexer > --- > > Key: SPARK-12217 > URL: https://issues.apache.org/jira/browse/SPARK-12217 > Project: Spark > Issue Type: Documentation > Components: ML >Reporter: Benjamin Fradet >Priority: Minor > Fix For: 1.6.1, 2.0.0 > > > Documentation is needed regarding the handling of invalid labels in > StringIndexer -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12159) Add user guide section for IndexToString transformer
[ https://issues.apache.org/jira/browse/SPARK-12159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15043605#comment-15043605 ] Benjamin Fradet commented on SPARK-12159: - I've started working on this. > Add user guide section for IndexToString transformer > > > Key: SPARK-12159 > URL: https://issues.apache.org/jira/browse/SPARK-12159 > Project: Spark > Issue Type: Documentation > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > > Add a user guide section for the IndexToString transformer as reported on the > mailing list ( > https://www.mail-archive.com/dev@spark.apache.org/msg12263.html ) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
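A minimal sketch of the round trip the new guide section could illustrate, indexing string labels and then mapping the indices back (the `df` DataFrame and column names are assumptions):

```scala
import org.apache.spark.ml.feature.{IndexToString, StringIndexer}

// Index the string labels...
val indexerModel = new StringIndexer()
  .setInputCol("category")
  .setOutputCol("categoryIndex")
  .fit(df)
val indexed = indexerModel.transform(df)

// ...then convert the indices back to the original strings.
val converter = new IndexToString()
  .setInputCol("categoryIndex")
  .setOutputCol("originalCategory")
  .setLabels(indexerModel.labels)
val converted = converter.transform(indexed)
```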
[jira] [Created] (SPARK-11902) Unhandled case in VectorAssembler#transform
Benjamin Fradet created SPARK-11902: --- Summary: Unhandled case in VectorAssembler#transform Key: SPARK-11902 URL: https://issues.apache.org/jira/browse/SPARK-11902 Project: Spark Issue Type: Bug Components: ML Affects Versions: 1.5.2 Reporter: Benjamin Fradet Priority: Minor I noticed that there is an unhandled case in the transform method of VectorAssembler if one of the input columns doesn't have one of the supported type DoubleType, NumericType, BooleanType or VectorUDT. So, if you try to transform a column of StringType you get a cryptic "scala.MatchError: StringType". Will submit a PR shortly -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
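A sketch of a reproduction of the unhandled case (the DataFrame and column names are hypothetical):

```scala
import org.apache.spark.ml.feature.VectorAssembler

// "text" is a StringType column, which the type match in transform
// does not handle, unlike numeric, boolean, and vector columns.
val assembler = new VectorAssembler()
  .setInputCols(Array("text", "amount"))
  .setOutputCol("features")

// assembler.transform(df) // throws scala.MatchError: StringType
```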
[jira] [Commented] (SPARK-9002) KryoSerializer initialization does not include 'Array[Int]'
[ https://issues.apache.org/jira/browse/SPARK-9002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636832#comment-14636832 ] Benjamin Fradet commented on SPARK-9002: [~rake] are you planning on opening a PR? KryoSerializer initialization does not include 'Array[Int]' --- Key: SPARK-9002 URL: https://issues.apache.org/jira/browse/SPARK-9002 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0 Environment: MacBook Pro, OS X 10.10.4, Spark 1.4.0, master=local[*], IntelliJ IDEA. Reporter: Randy Kerber Priority: Minor Labels: easyfix, newbie Original Estimate: 1h Remaining Estimate: 1h The object KryoSerializer (inside KryoRegistrator.scala) contains a list of classes that are automatically registered with Kryo. That list includes: Array\[Byte], Array\[Long], and Array\[Short]. Array\[Int] is missing from that list. Can't think of any good reason it shouldn't also be included. Note: This is my first time creating an issue or contributing code to an Apache project. Apologies if I'm not following the process correctly. Appreciate any guidance or assistance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
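Until Array[Int] is added to the default registration list, an application can register it explicitly on its own SparkConf; a minimal sketch:

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// Work around the missing default registration by registering Array[Int] manually.
conf.registerKryoClasses(Array(classOf[Array[Int]]))
```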
[jira] [Commented] (SPARK-9057) Add Scala, Java and Python example to show DStream.transform
[ https://issues.apache.org/jira/browse/SPARK-9057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635945#comment-14635945 ] Benjamin Fradet commented on SPARK-9057: It would also be interesting to demonstrate a use of an accumulator or a broadcast variable that is resilient to restarts from checkpoints, as detailed in [SPARK-5206|https://issues.apache.org/jira/browse/SPARK-5206] Add Scala, Java and Python example to show DStream.transform Key: SPARK-9057 URL: https://issues.apache.org/jira/browse/SPARK-9057 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Tathagata Das Labels: starter Currently there is no example to show the use of transform. Would be good to add an example that uses transform to join a static RDD with the RDDs of a DStream. Need to be done for all 3 supported languages. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
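A sketch of what the Scala variant of such a transform example might look like; the streaming context `ssc`, the keyed `requestsByHost` DStream, and the blacklist contents are assumptions for illustration:

```scala
// Join each batch RDD of a keyed DStream against a static reference RDD.
val blacklistRDD = ssc.sparkContext.parallelize(Seq(("spam.example.com", true)))

val cleaned = requestsByHost.transform { rdd =>
  rdd.leftOuterJoin(blacklistRDD)
     .filter { case (_, (_, flagged)) => flagged.isEmpty } // keep hosts not on the blacklist
     .map { case (host, (request, _)) => (host, request) }
}
```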
[jira] [Commented] (SPARK-9059) Update Python Direct Kafka Word count examples to show the use of HasOffsetRanges
[ https://issues.apache.org/jira/browse/SPARK-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635941#comment-14635941 ] Benjamin Fradet commented on SPARK-9059: I have a version with the updated doc regarding Python; I don't know whether I should wait for the PR to be closed before opening mine. Update Python Direct Kafka Word count examples to show the use of HasOffsetRanges - Key: SPARK-9059 URL: https://issues.apache.org/jira/browse/SPARK-9059 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Tathagata Das Priority: Minor Labels: starter Update Python examples of Direct Kafka word count to access the offset ranges using HasOffsetRanges and print it. For example in Scala, {code} var offsetRanges: Array[OffsetRange] = _ ... directKafkaDStream.foreachRDD { rdd => offsetRanges = rdd.asInstanceOf[HasOffsetRanges] } ... transformedDStream.foreachRDD { rdd => // some operation println("Processed ranges: " + offsetRanges) } {code} See https://spark.apache.org/docs/latest/streaming-kafka-integration.html for more info, and the master source code for more updated information on python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9059) Update Direct Kafka Word count examples to show the use of HasOffsetRanges
[ https://issues.apache.org/jira/browse/SPARK-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630984#comment-14630984 ] Benjamin Fradet commented on SPARK-9059: Agreed. Update Direct Kafka Word count examples to show the use of HasOffsetRanges -- Key: SPARK-9059 URL: https://issues.apache.org/jira/browse/SPARK-9059 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Tathagata Das Labels: starter Update Scala, Java and Python examples of Direct Kafka word count to access the offset ranges using HasOffsetRanges and print it. For example in Scala, {code} var offsetRanges: Array[OffsetRange] = _ ... directKafkaDStream.foreachRDD { rdd => offsetRanges = rdd.asInstanceOf[HasOffsetRanges] } ... transformedDStream.foreachRDD { rdd => // some operation println("Processed ranges: " + offsetRanges) } {code} See https://spark.apache.org/docs/latest/streaming-kafka-integration.html for more info, and the master source code for more updated information on python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9059) Update Direct Kafka Word count examples to show the use of HasOffsetRanges
[ https://issues.apache.org/jira/browse/SPARK-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631142#comment-14631142 ] Benjamin Fradet commented on SPARK-9059: We could also demonstrate restarting from a specific set of offsets. Update Direct Kafka Word count examples to show the use of HasOffsetRanges -- Key: SPARK-9059 URL: https://issues.apache.org/jira/browse/SPARK-9059 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Tathagata Das Labels: starter Update Scala, Java and Python examples of Direct Kafka word count to access the offset ranges using HasOffsetRanges and print it. For example in Scala, {code} var offsetRanges: Array[OffsetRange] = _ ... directKafkaDStream.foreachRDD { rdd => offsetRanges = rdd.asInstanceOf[HasOffsetRanges] } ... transformedDStream.foreachRDD { rdd => // some operation println("Processed ranges: " + offsetRanges) } {code} See https://spark.apache.org/docs/latest/streaming-kafka-integration.html for more info, and the master source code for more updated information on python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
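A sketch of restarting from a specific set of offsets, against the Kafka 0.8 direct API of the time; the streaming context `ssc`, the `kafkaParams` map, the topic name, and the offset value are all hypothetical:

```scala
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.KafkaUtils

// Resume partition 0 of "wordcount" from a previously saved offset.
val fromOffsets = Map(TopicAndPartition("wordcount", 0) -> 1234L)
val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder, (String, String)](
  ssc, kafkaParams, fromOffsets, messageHandler)
```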
[jira] [Commented] (SPARK-9059) Update Direct Kafka Word count examples to show the use of HasOffsetRanges
[ https://issues.apache.org/jira/browse/SPARK-9059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629770#comment-14629770 ] Benjamin Fradet commented on SPARK-9059: I've started working on this. Update Direct Kafka Word count examples to show the use of HasOffsetRanges -- Key: SPARK-9059 URL: https://issues.apache.org/jira/browse/SPARK-9059 Project: Spark Issue Type: Sub-task Components: Streaming Reporter: Tathagata Das Labels: starter Update Scala, Java and Python examples of Direct Kafka word count to access the offset ranges using HasOffsetRanges and print it. For example in Scala, {code} var offsetRanges: Array[OffsetRange] = _ ... directKafkaDStream.foreachRDD { rdd => offsetRanges = rdd.asInstanceOf[HasOffsetRanges] } ... transformedDStream.foreachRDD { rdd => // some operation println("Processed ranges: " + offsetRanges) } {code} See https://spark.apache.org/docs/latest/streaming-kafka-integration.html for more info, and the master source code for more updated information on python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8575) Deprecate callUDF in favor of udf
[ https://issues.apache.org/jira/browse/SPARK-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14598353#comment-14598353 ] Benjamin Fradet commented on SPARK-8575: I've started working on this issue. Deprecate callUDF in favor of udf - Key: SPARK-8575 URL: https://issues.apache.org/jira/browse/SPARK-8575 Project: Spark Issue Type: Improvement Components: SQL Reporter: Benjamin Fradet Priority: Minor Fix For: 1.5.0 Follow-up of [SPARK-8356|https://issues.apache.org/jira/browse/SPARK-8356] to use {{callUDF}} in favor of {{udf}} wherever possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8575) Deprecate callUDF in favor of udf
[ https://issues.apache.org/jira/browse/SPARK-8575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benjamin Fradet updated SPARK-8575: --- Description: Follow-up of [SPARK-8356|https://issues.apache.org/jira/browse/SPARK-8356] to use {{callUDF}} in favor of {{udf}} wherever possible. (was: Follow-up of [SPARK-8356|https://issues.apache.org/jira/browse/SPARK-8356] to deprecate callUDF in favor of udf wherever possible.) Deprecate callUDF in favor of udf - Key: SPARK-8575 URL: https://issues.apache.org/jira/browse/SPARK-8575 Project: Spark Issue Type: Improvement Components: SQL Reporter: Benjamin Fradet Priority: Minor Fix For: 1.5.0 Follow-up of [SPARK-8356|https://issues.apache.org/jira/browse/SPARK-8356] to use {{callUDF}} in favor of {{udf}} wherever possible. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
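For reference, the {{udf}} helper mentioned above works roughly like this; the `df` DataFrame and column names are assumptions for illustration:

```scala
import org.apache.spark.sql.functions.udf

// Define the function once with udf, then apply it as a column expression.
val squared = udf((x: Int) => x * x)
val withSquares = df.withColumn("valueSquared", squared(df("value")))
```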
[jira] [Commented] (SPARK-8115) Remove TestData
[ https://issues.apache.org/jira/browse/SPARK-8115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14595052#comment-14595052 ] Benjamin Fradet commented on SPARK-8115: I've started working on this. Remove TestData --- Key: SPARK-8115 URL: https://issues.apache.org/jira/browse/SPARK-8115 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Priority: Minor TestData was from the era when we didn't have easy ways to generate test datasets. Now we have implicits on Seq + toDF, it'd make more sense to put the test datasets closer to the test suites. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
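The pattern the description refers to, a test dataset built inline next to its suite via the implicits, is roughly:

```scala
// With the SQLContext implicits in scope, a Seq of tuples
// becomes a DataFrame directly, no shared TestData object needed.
import sqlContext.implicits._

val testData = Seq((1, "a"), (2, "b"), (3, "c")).toDF("key", "value")
```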
[jira] [Commented] (SPARK-8478) Harmonize UDF-related code to use uniformly UDF instead of Udf
[ https://issues.apache.org/jira/browse/SPARK-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593368#comment-14593368 ] Benjamin Fradet commented on SPARK-8478: As discussed on [SPARK-8356|https://issues.apache.org/jira/browse/SPARK-8356], it'd be cool to harmonize code regarding UDFs, I've started working on this. Harmonize UDF-related code to use uniformly UDF instead of Udf -- Key: SPARK-8478 URL: https://issues.apache.org/jira/browse/SPARK-8478 Project: Spark Issue Type: Improvement Components: SQL Reporter: Benjamin Fradet Priority: Minor Some UDF-related code uses Udf naming instead of UDF. This JIRA uniformizes the naming in favor of UDF. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8356) Reconcile callUDF and callUdf
[ https://issues.apache.org/jira/browse/SPARK-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593362#comment-14593362 ] Benjamin Fradet commented on SPARK-8356: I'll create a separate JIRA for harmonizing the naming in UDF-related code: [SPARK-8478|https://issues.apache.org/jira/browse/SPARK-8478]. Reconcile callUDF and callUdf - Key: SPARK-8356 URL: https://issues.apache.org/jira/browse/SPARK-8356 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Priority: Critical Labels: starter Right now we have two functions {{callUDF}} and {{callUdf}}. I think the former is used for calling Java functions (and the documentation is wrong) and the latter is for calling functions by name. Either way this is confusing and we should unify or pick different names. Also, lets make sure the docs are right. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8478) Harmonize UDF-related code to use uniformly UDF instead of Udf
Benjamin Fradet created SPARK-8478: -- Summary: Harmonize UDF-related code to use uniformly UDF instead of Udf Key: SPARK-8478 URL: https://issues.apache.org/jira/browse/SPARK-8478 Project: Spark Issue Type: Improvement Components: SQL Reporter: Benjamin Fradet Priority: Minor Some UDF-related code uses Udf naming instead of UDF. This JIRA uniformizes the naming in favor of UDF. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-8356) Reconcile callUDF and callUdf
[ https://issues.apache.org/jira/browse/SPARK-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593362#comment-14593362 ] Benjamin Fradet edited comment on SPARK-8356 at 6/19/15 12:02 PM: -- I've created a separate JIRA for harmonizing the naming in UDF-related code: [SPARK-8478|https://issues.apache.org/jira/browse/SPARK-8478]. was (Author: benfradet): I'll create a separate JIRA for harmonizing the naming in UDF-related code: [SPARK-8478|https://issues.apache.org/jira/browse/SPARK-8478]. Reconcile callUDF and callUdf - Key: SPARK-8356 URL: https://issues.apache.org/jira/browse/SPARK-8356 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Priority: Critical Labels: starter Right now we have two functions {{callUDF}} and {{callUdf}}. I think the former is used for calling Java functions (and the documentation is wrong) and the latter is for calling functions by name. Either way this is confusing and we should unify or pick different names. Also, lets make sure the docs are right. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8356) Reconcile callUDF and callUdf
[ https://issues.apache.org/jira/browse/SPARK-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14590491#comment-14590491 ] Benjamin Fradet commented on SPARK-8356: Somewhat related, regarding coherence, there are {{PythonUDF}} and {{ScalaUdf}}. Maybe we should straighten this out as well. Reconcile callUDF and callUdf - Key: SPARK-8356 URL: https://issues.apache.org/jira/browse/SPARK-8356 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Priority: Critical Labels: starter Right now we have two functions {{callUDF}} and {{callUdf}}. I think the former is used for calling Java functions (and the documentation is wrong) and the latter is for calling functions by name. Either way this is confusing and we should unify or pick different names. Also, lets make sure the docs are right. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8356) Reconcile callUDF and callUdf
[ https://issues.apache.org/jira/browse/SPARK-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590513#comment-14590513 ] Benjamin Fradet commented on SPARK-8356: OK, I'll make sure {{Udf}} disappears. Should I open another JIRA, or can I add it to the PR for this one?
[jira] [Commented] (SPARK-8356) Reconcile callUDF and callUdf
[ https://issues.apache.org/jira/browse/SPARK-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590478#comment-14590478 ] Benjamin Fradet commented on SPARK-8356: [~marmbrus] Are we sure {{callUDF}} is used for calling Java functions?
[jira] [Commented] (SPARK-8356) Reconcile callUDF and callUdf
[ https://issues.apache.org/jira/browse/SPARK-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590521#comment-14590521 ] Benjamin Fradet commented on SPARK-8356: OK, thanks a lot for the pointers.
[jira] [Created] (SPARK-8399) Overlap between histograms and axis' name in Spark Streaming UI
Benjamin Fradet created SPARK-8399: -- Summary: Overlap between histograms and axis' name in Spark Streaming UI Key: SPARK-8399 URL: https://issues.apache.org/jira/browse/SPARK-8399 Project: Spark Issue Type: Bug Components: Streaming, Web UI Affects Versions: 1.4.0 Reporter: Benjamin Fradet Priority: Minor If a histogram is skewed towards the maximum of the displayed values, as is the case, for example, with the number of messages processed per batch interval with the Kafka direct API (since it's a constant), the histogram will overlap with the name of the X axis (#batches). Unfortunately, I don't have any screenshots available.
[jira] [Commented] (SPARK-8399) Overlap between histograms and axis' name in Spark Streaming UI
[ https://issues.apache.org/jira/browse/SPARK-8399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588358#comment-14588358 ] Benjamin Fradet commented on SPARK-8399: I'll submit a patch shortly.
[jira] [Commented] (SPARK-8356) Reconcile callUDF and callUdf
[ https://issues.apache.org/jira/browse/SPARK-8356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588579#comment-14588579 ] Benjamin Fradet commented on SPARK-8356: I've started working on this issue.
[jira] [Created] (SPARK-7255) spark.streaming.kafka.maxRetries not documented
Benjamin Fradet created SPARK-7255: -- Summary: spark.streaming.kafka.maxRetries not documented Key: SPARK-7255 URL: https://issues.apache.org/jira/browse/SPARK-7255 Project: Spark Issue Type: Documentation Components: Documentation, Streaming Affects Versions: 1.3.1 Reporter: Benjamin Fradet Priority: Minor Fix For: 1.4.0 I noticed that [spark.streaming.kafka.maxRetries|https://github.com/apache/spark/blob/master/external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala#L66] was not documented in the [configuration page|http://spark.apache.org/docs/latest/configuration.html#spark-streaming]. Is this on purpose?
[jira] [Commented] (SPARK-7255) spark.streaming.kafka.maxRetries not documented
[ https://issues.apache.org/jira/browse/SPARK-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520266#comment-14520266 ] Benjamin Fradet commented on SPARK-7255: Otherwise, I'd be glad to add it to the docs.
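Pending documentation, the property can be supplied like any other Spark configuration setting. A minimal sketch, assuming (as I read the linked {{DirectKafkaInputDStream}} source) that it bounds how many times the direct Kafka stream retries fetching the latest leader offsets, with a default of 1; the value 3 below is just an example:

```
# spark-defaults.conf
# (equivalently: spark-submit --conf spark.streaming.kafka.maxRetries=3)
# Assumed semantics, per the linked source: number of retries when the
# direct Kafka stream fetches the latest leader offsets (default: 1).
spark.streaming.kafka.maxRetries   3
```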