Re: Is storage resources counted during the scheduling
Thanks, Ted, but that page seems to cover scheduling policy; it doesn't say which resources are considered in the scheduling. And regarding scheduling, I'm wondering: in the case of just one application, is there still a scheduling process? Otherwise, why do I see some launch delay in the tasks? (Well, this might be another question.) Thanks.

Best,
Jialin

> On Apr 11, 2016, at 3:18 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> See
> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
>
> On Mon, Apr 11, 2016 at 3:15 PM, Jialin Liu <jaln...@lbl.gov> wrote:
> Hi Spark users/experts,
>
> I'm wondering how the Spark scheduler works.
> What kinds of resources are considered during scheduling? Does it
> include disk resources or I/O resources, e.g., the number of I/O ports?
> Are network resources considered?
>
> My understanding is that only CPU is considered, right?
>
> Best,
> Jialin
> -----
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
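To follow up on the pointer: even with a single application there is still scheduling, since each job decomposes into stages and tasks that are assigned to executor task slots (essentially CPU cores, with memory fixed per executor), and things like locality wait can delay task launch. The FAIR mode described on that page is configured through an XML allocation file; a sketch of the documented format follows (the pool name is illustrative; enable it with spark.scheduler.mode=FAIR and point spark.scheduler.allocation.file at this file):

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Illustrative pool: FAIR scheduling among jobs in this pool,
       with a relative weight and a minimum share of cores. -->
  <pool name="production">
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>
```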
Re: Where to set properties for the retainedJobs/Stages?
Yes, but the docs don't say which process each config applies to. Do I have to set them for the history server? The daemon? The workers? And what if I use the Java API instead of spark-submit for the jobs? I guess spark-defaults.conf is ignored when using the Java API?

On 2016-04-01 18:58, Ted Yu wrote:
You can set them in spark-defaults.conf. See also https://spark.apache.org/docs/latest/configuration.html#spark-ui [1]

On Fri, Apr 1, 2016 at 8:26 AM, Max Schmidt <m...@datapath.io> wrote:
Can somebody tell me the interaction between these properties:
spark.ui.retainedJobs
spark.ui.retainedStages
spark.history.retainedApplications
I know from the bug tracker that the last one describes the number of applications the history server holds in memory. Can I set the properties in spark-env.sh? And where?

Links:
[1] https://spark.apache.org/docs/latest/configuration.html#spark-ui

-----
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
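For reference: the spark.ui.* retention limits apply to the driver (they bound how many jobs/stages the live UI keeps), so they can go in spark-defaults.conf or be set programmatically on the SparkConf you build in Java code, while spark.history.retainedApplications applies only to the history server process. A sketch with illustrative values:

```properties
# spark-defaults.conf, read by the driver at submit time
# (or: new SparkConf().set("spark.ui.retainedJobs", "1000") from the Java API)
spark.ui.retainedJobs      1000
spark.ui.retainedStages    1000

# Only meaningful to the history server process
spark.history.retainedApplications  50
```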
Re: SPARK-13843 and future of streaming backends
Are you talking about the group/identifier name, or the contained classes? Because there are plenty of org.apache.* classes distributed via Maven with non-Apache groups/identifiers.

On Fri, Mar 25, 2016 at 6:54 PM, David Nalley <ke4...@apache.org> wrote:
>
>> As far as group / artifact name compatibility, at least in the case of
>> Kafka we need different artifact names anyway, and people are going to
>> have to make changes to their build files for spark 2.0 anyway. As
>> far as keeping the actual classes in org.apache.spark to not break
>> code despite the group name being different, I don't know whether that
>> would be enforced by maven central, just looked at as poor taste, or
>> ASF suing for trademark violation :)
>
> Sonatype has strict instructions to only permit org.apache.* to originate
> from repository.apache.org. Exceptions to that must be approved by VP,
> Infrastructure.
> --
> Sent via Pony Mail for dev@spark.apache.org.
> View this email online at:
> https://pony-poc.apache.org/list.html?dev@spark.apache.org

-----
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
[jira] [Commented] (SPARK-17560) SQLContext tables returns table names in lower case only
[ https://issues.apache.org/jira/browse/SPARK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495906#comment-15495906 ] Aseem Bansal commented on SPARK-17560: -- Looked through https://spark.apache.org/docs/2.0.0/sql-programming-guide.html https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/Dataset.html https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/SparkSession.html https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/SparkConf.html and none of them say anything about this parameter > SQLContext tables returns table names in lower case only > > > Key: SPARK-17560 > URL: https://issues.apache.org/jira/browse/SPARK-17560 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Aseem Bansal > > I registered a table using > dataSet.createOrReplaceTempView("TestTable"); > Then I tried to get the list of tables using > sparkSession.sqlContext().tableNames() > but the name that I got was testtable. It used to give table names in proper > case in Spark 1.4 -- This message was sent by Atlassian JIRA (v6.3.4#6332) ----- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Spark streaming completed batches statistics
OK, it looks like I could reconstruct the logic in the Spark UI from the /jobs resource. Thanks.

https://richardstartin.com/

From: map reduced <k3t.gi...@gmail.com>
Sent: 07 December 2016 19:49
To: Richard Startin
Cc: user@spark.apache.org
Subject: Re: Spark streaming completed batches statistics

Have you checked http://spark.apache.org/docs/latest/monitoring.html#rest-api ?

KP

On Wed, Dec 7, 2016 at 11:43 AM, Richard Startin <richardstar...@outlook.com> wrote:
Is there any way to get this information as CSV/JSON?

https://docs.databricks.com/_images/CompletedBatches.png

From: Richard Startin <richardstar...@outlook.com>
Sent: 05 December 2016 15:55
To: user@spark.apache.org
Subject: Spark streaming completed batches statistics

Is there any way to get a more computer-friendly version of the completed batches section of the streaming page of the application master? I am very interested in the statistics and am currently screen-scraping...

https://richardstartin.com

-----
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
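A minimal sketch of reconstructing batch statistics from the documented REST layout (/api/v1/applications/[app-id]/jobs). The sample payload at the bottom is invented so the shaping logic can be seen without a live driver; against a running application you would fetch the URL instead:

```python
import json
from urllib.request import urlopen  # used only against a live driver UI


def jobs_url(base, app_id):
    """Build the documented REST endpoint for an application's jobs."""
    return f"{base}/api/v1/applications/{app_id}/jobs"


def summarize(jobs):
    """Reduce the /jobs JSON array to (jobId, status, numCompletedTasks) rows."""
    return [(j["jobId"], j["status"], j.get("numCompletedTasks", 0)) for j in jobs]


if __name__ == "__main__":
    # Live usage (hypothetical host/app id):
    #   jobs = json.load(urlopen(jobs_url("http://localhost:4040", "app-123")))
    sample = '[{"jobId": 0, "status": "SUCCEEDED", "numCompletedTasks": 8}]'
    print(summarize(json.loads(sample)))  # → [(0, 'SUCCEEDED', 8)]
```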
Re: EXT: Multiple cores/executors in Pyspark standalone mode
In local mode, all processes are executed inside a single JVM. An application is started in local mode by setting the master to local, local[*], or local[n]. spark.executor.cores is not applicable in local mode because there is only one embedded executor.

In standalone mode, you need a standalone Spark cluster <https://spark.apache.org/docs/latest/spark-standalone.html>. It requires a master node (which can be started using the SPARK_HOME/sbin/start-master.sh script) and at least one worker node (which can be started using the SPARK_HOME/sbin/start-slave.sh script). SparkConf should then be created with the master node's address (spark://host:port).

Thanks!
Gangadhar

From: Li Jin <ice.xell...@gmail.com>
Date: Friday, March 24, 2017 at 3:43 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: EXT: Multiple cores/executors in Pyspark standalone mode

Hi, I am wondering, does pyspark standalone (local) mode support multiple cores/executors?

Thanks,
Li

-----
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
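To make the master-URL semantics above concrete, here is a small illustrative sketch (plain Python, no Spark required — the helper name is invented) of how local, local[*], and local[n] encode the worker-thread count, versus a cluster URL where executors are sized separately:

```python
import os
import re


def local_threads(master):
    """Return the worker-thread count implied by a local-mode master URL,
    or None for cluster URLs such as spark://host:7077."""
    if master == "local":
        return 1  # single worker thread
    m = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if m:
        # local[*] means one thread per available core
        return os.cpu_count() if m.group(1) == "*" else int(m.group(1))
    return None  # standalone / YARN / Mesos master: not local mode


print(local_threads("local[4]"))  # → 4
```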
Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers
Congratulations, Hyukjin and Sameer!

> On Aug 8, 2017, at 00:01, 蒋星博 <jiangxb1...@gmail.com> wrote:
>
> Congratulations, Hyukjin and Sameer!
>
> 2017-08-07 23:57 GMT+08:00 <linguin@gmail.com>:
> Congrats!
>
> On Aug 8, 2017, at 0:55, Bai, Dave <dave.1@here.com> wrote:
> > Congrats, leveled up! =)
> >
> >> On 8/7/17, 10:53 AM, "Matei Zaharia" <matei.zaha...@gmail.com> wrote:
> >>
> >> Hi everyone,
> >>
> >> The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal as
> >> committers. Join me in congratulating both of them and thanking them for
> >> their contributions to the project!
> >>
> >> Matei
> >> -----
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19290#discussion_r139919707 --- Diff: R/pkg/R/mllib_tree.R --- @@ -352,10 +353,10 @@ setMethod("write.ml", signature(object = "GBTClassificationModel", path = "chara #' model, \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml} to #' save/load fitted models. #' For more details, see -#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-regression}{ -#' Random Forest Regression} and -#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier}{ -#' Random Forest Classification} +#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest- +#' regression}{Random Forest Regression} and +#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest- +#' classifier}{Random Forest Classification} --- End diff -- links were checked --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19290#discussion_r139919715 --- Diff: R/pkg/R/mllib_tree.R --- @@ -132,10 +132,10 @@ print.summary.decisionTree <- function(x) { #' Gradient Boosted Tree model, \code{predict} to make predictions on new data, and #' \code{write.ml}/\code{read.ml} to save/load fitted models. #' For more details, see -#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-regression}{ -#' GBT Regression} and -#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-classifier}{ -#' GBT Classification} +#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted- +#' tree-regression}{GBT Regression} and +#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted- +#' tree-classifier}{GBT Classification} --- End diff -- links were checked --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19290#discussion_r139919722 --- Diff: R/pkg/R/mllib_tree.R --- @@ -567,10 +569,10 @@ setMethod("write.ml", signature(object = "RandomForestClassificationModel", path #' model, \code{predict} to make predictions on new data, and \code{write.ml}/\code{read.ml} to #' save/load fitted models. #' For more details, see -#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-regression}{ -#' Decision Tree Regression} and -#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier}{ -#' Decision Tree Classification} +#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree- +#' regression}{Decision Tree Regression} and +#' \href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree- +#' classifier}{Decision Tree Classification} --- End diff -- links were checked --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485

Thanks for taking a look at this one. Actually, I thought we should add a chapter like http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets and add links to, for example, https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.csv for Python, http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame for Scala, and http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#csv-scala.collection.Seq- for Java to refer to the options, rather than duplicating the option list (which we would then have to update in multiple places whenever we fix or add options). Probably we should add some links to the JSON ones too.

---
-----
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
RE: Adding Custom finalize method to RDDs.
I want to delete some files which I created in my data source API as soon as the RDD is cleaned up.

Thanks,
Nasrulla

From: Vinoo Ganesh
Sent: Monday, June 10, 2019 1:32 PM
To: Nasrulla Khan Haris; dev@spark.apache.org
Subject: Re: Adding Custom finalize method to RDDs.

Generally, overriding the finalize() method is an antipattern (it was in fact deprecated in Java 11: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Object.html#finalize()). What's the use case here?

From: Nasrulla Khan Haris <nasrulla.k...@microsoft.com.INVALID>
Date: Monday, June 10, 2019 at 15:44
To: "dev@spark.apache.org" <dev@spark.apache.org>
Subject: RE: Adding Custom finalize method to RDDs.

Hello everyone, is there a way to do it from user code?

Thanks,
Nasrulla

From: Nasrulla Khan Haris <nasrulla.k...@microsoft.com.INVALID>
Sent: Sunday, June 9, 2019 5:30 PM
To: dev@spark.apache.org
Subject: Adding Custom finalize method to RDDs.

Hi all, is there a way to add a custom finalize method to RDD objects, to run custom logic when RDDs are destructed by the JVM?

Thanks,
Nasrulla
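Since finalizers are discouraged on the JVM, the usual alternative for "delete my temp files when the object goes away" is an explicit close/cleanup call, optionally backed by a lifetime hook. For illustration only (this is not Spark API; the wrapper class is hypothetical), here is that pattern with Python's standard-library weakref.finalize, which gives deterministic cleanup without relying on a destructor:

```python
import os
import tempfile
import weakref


class TempBackedDataset:
    """Hypothetical wrapper that owns a scratch file and registers a
    finalizer so the file is removed when the wrapper is collected."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp(prefix="rdd-scratch-")
        os.close(fd)
        # The callback must not reference self (that would keep it alive),
        # so capture only the path.
        self._finalizer = weakref.finalize(self, os.remove, self.path)

    def close(self):
        # Deterministic cleanup; weakref.finalize runs at most once,
        # so calling close() twice is safe.
        self._finalizer()


ds = TempBackedDataset()
scratch = ds.path
ds.close()
print(os.path.exists(scratch))  # → False
```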
[GitHub] [spark] wangyum commented on issue #25542: [SPARK-28840][SQL] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive
wangyum commented on issue #25542: [SPARK-28840][SQL] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive URL: https://github.com/apache/spark/pull/25542#issuecomment-523528243 Our example always uses `--jars one.jar,two.jar`. It seems `--jars=one.jar,two.jar` is not a standard usage. http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications http://spark.apache.org/docs/latest/running-on-yarn.html#adding-other-jars http://spark.apache.org/docs/latest/rdd-programming-guide.html#using-the-shell http://spark.apache.org/docs/latest/sql-data-sources-jdbc.html This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] aof00 opened a new pull request #30376: change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' #SPARK-33451
aof00 opened a new pull request #30376:
URL: https://github.com/apache/spark/pull/30376

JIRA issue: https://issues.apache.org/jira/browse/SPARK-33451

In the 'Optimizing Skew Join' section of the following two pages:
1. https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html
2. https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html

the configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' should be changed to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'. The former is missing the 'skewJoin' component.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-----
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] gengliangwang opened a new pull request #31525: [3.1][INFRA][DOC] Change the facetFilters of Docsearch to 3.1.1
gengliangwang opened a new pull request #31525:
URL: https://github.com/apache/spark/pull/31525

### What changes were proposed in this pull request?
As https://github.com/algolia/docsearch-configs/pull/3391 is merged, this PR changes the facetFilters of Docsearch to 3.1.1.

### Why are the changes needed?
So that the search results of the published Spark site will point to https://spark.apache.org/docs/3.1.1 instead of https://spark.apache.org/docs/latest/. This is useful for searching the docs of 3.1.1 after there are new Spark releases.

### Does this PR introduce _any_ user-facing change?
Yes, the search results of the 3.1.1 Spark doc site are based on https://spark.apache.org/docs/3.1.1 instead of https://spark.apache.org/docs/latest/.

### How was this patch tested?
Just configuration changes.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-----
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid
[ https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17441494#comment-17441494 ] dch nguyen edited comment on SPARK-37260 at 11/10/21, 4:03 AM: --- ping [~hyukjin.kwon] , is this issue resolved by [#34475|https://github.com/apache/spark/pull/34475]? was (Author: dchvn): [~hyukjin.kwon] , is this issue resolved by [#34475|https://github.com/apache/spark/pull/34475]? > PYSPARK Arrow 3.2.0 docs link invalid > - > > Key: SPARK-37260 > URL: https://issues.apache.org/jira/browse/SPARK-37260 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Thomas Graves > Priority: Major > > [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html] > links to: > [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html] > which links to: > [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst] > But that is an invalid link. > I assume its supposed to point to: > https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid
[ https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17441494#comment-17441494 ] dch nguyen commented on SPARK-37260: [~hyukjin.kwon] , is this issue resolved by [#34475|https://github.com/apache/spark/pull/34475]? > PYSPARK Arrow 3.2.0 docs link invalid > - > > Key: SPARK-37260 > URL: https://issues.apache.org/jira/browse/SPARK-37260 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Thomas Graves > Priority: Major > > [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html] > links to: > [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html] > which links to: > [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst] > But that is an invalid link. > I assume its supposed to point to: > https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid
[ https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17442044#comment-17442044 ] Hyukjin Kwon commented on SPARK-37260: -- oh yeah. that's fixed via #34475. There are some more ongoing issues on the docs. I will fix them up and probably we could initiate spark 3.2.1. > PYSPARK Arrow 3.2.0 docs link invalid > - > > Key: SPARK-37260 > URL: https://issues.apache.org/jira/browse/SPARK-37260 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Thomas Graves > Priority: Major > > [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html] > links to: > [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html] > which links to: > [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst] > But that is an invalid link. > I assume its supposed to point to: > https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid
[ https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37260. -- Resolution: Fixed > PYSPARK Arrow 3.2.0 docs link invalid > - > > Key: SPARK-37260 > URL: https://issues.apache.org/jira/browse/SPARK-37260 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Thomas Graves > Priority: Major > > [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html] > links to: > [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html] > which links to: > [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst] > But that is an invalid link. > I assume its supposed to point to: > https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid
[ https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37260: - Fix Version/s: 3.2.1 > PYSPARK Arrow 3.2.0 docs link invalid > - > > Key: SPARK-37260 > URL: https://issues.apache.org/jira/browse/SPARK-37260 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Thomas Graves >Priority: Major > Fix For: 3.2.1 > > > [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html] > links to: > [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html] > which links to: > [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst] > But that is an invalid link. > I assume its supposed to point to: > https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html -- This message was sent by Atlassian Jira (v8.20.1#820001) ----- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid
Thomas Graves created SPARK-37260: - Summary: PYSPARK Arrow 3.2.0 docs link invalid Key: SPARK-37260 URL: https://issues.apache.org/jira/browse/SPARK-37260 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 3.2.0 Reporter: Thomas Graves [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html] links to: [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html] which links to: [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst] But that is an invalid link. I assume its supposed to point to: https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on pull request #43011: [WIP][SPARK-45232][DOCS] Add missing function groups to SQL references
zhengruifeng commented on PR #43011:
URL: https://github.com/apache/spark/pull/43011#issuecomment-1728581850

@allisonwang-db I am not sure; I don't see documentation for the FROM clause. You may check 3 places:
- https://spark.apache.org/docs/latest/api/sql/index.html#explode
- https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html#generator-functions
- https://spark.apache.org/docs/latest/sql-ref-syntax.html

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [spark] zr-msft commented on pull request #35561: [MINOR][DOCS] Fixed closing tags in running-on-kubernetes.md
zr-msft commented on PR #35561:
URL: https://github.com/apache/spark/pull/35561#issuecomment-1148923071

@dongjoon-hyun I've periodically checked the docs site and I'm not seeing any changes show up based on commits I've added from this PR:
* https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration
* https://spark.apache.org/docs/3.2.1/running-on-kubernetes.html#configuration

I'm also not seeing earlier commits show up:
* https://github.com/apache/spark/commit/302cb2257b66642cd3de0f61a700293b8ac7b000
* https://github.com/apache/spark/commit/476214bc1cc813f0a2332bee53dfc7248ebd2a66

The most recent commit that shows up on this page is from Jul 18, 2021:
* https://github.com/apache/spark/commit/eea69c122f20577956c4a87a6d8eb59943c1c6f0 -- https://spark.apache.org/docs/latest/running-on-kubernetes.html#prerequisites

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [spark] HyukjinKwon commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String
HyukjinKwon commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1299418917

Maybe it's better to have a JIRA. BTW, I wonder if we have an end-to-end example that users can copy and paste to try (e.g., like most of the docs in https://spark.apache.org/docs/latest/index.html). Another decision to make is whether we should document it in the PySpark docs (https://spark.apache.org/docs/latest/api/python/getting_started/index.html) or the Spark main page (https://spark.apache.org/docs/latest/index.html).

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (CALCITE-6241) Add a few existing functions to Spark library
[ https://issues.apache.org/jira/browse/CALCITE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] EveyWu updated CALCITE-6241: Description: Add Spark as a supported library for functions that have already been implemented for other libraries. Spark Functions Link:[https://spark.apache.org/docs/latest/api/sql/index.html|https://spark.apache.org/docs/latest/api/sql/index.html#rtrim] Add function List: * DECODE was: Add Spark as a supported library for functions that have already been implemented for other libraries. Spark Functions Link:[https://spark.apache.org/docs/latest/api/sql/index.html|https://spark.apache.org/docs/latest/api/sql/index.html#rtrim] > Add a few existing functions to Spark library > - > > Key: CALCITE-6241 > URL: https://issues.apache.org/jira/browse/CALCITE-6241 > Project: Calcite > Issue Type: Improvement >Reporter: EveyWu >Priority: Minor > > Add Spark as a supported library for functions that have already been > implemented for other libraries. > Spark Functions > Link:[https://spark.apache.org/docs/latest/api/sql/index.html|https://spark.apache.org/docs/latest/api/sql/index.html#rtrim] > Add function List: > * DECODE > > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: acquire and give back resources dynamically
http://spark.apache.org/docs/latest/running-on-yarn.html

Spark is just a YARN application.

> On Aug 14, 2014, at 11:12, 牛兆捷 <nzjem...@gmail.com> wrote:
> Dear all: Can Spark acquire resources from, and give back resources to, YARN dynamically?
> --
> *Regards,*
> *Zhaojie*

-----
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
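For readers finding this thread later: Spark releases after this exchange added dynamic allocation on YARN, which does exactly this (executors are requested and released based on workload). A sketch of the relevant properties, with illustrative values; the external shuffle service must also be enabled on the NodeManagers:

```properties
spark.dynamicAllocation.enabled        true
spark.shuffle.service.enabled          true
spark.dynamicAllocation.minExecutors   1
spark.dynamicAllocation.maxExecutors   20
```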
Re: spark won't build with maven
You are running a Continuous Compilation. AFAIK, it runs in an infinite loop and will compile only the modified files. For compiling with maven, have a look at these steps - https://spark.apache.org/docs/latest/building-with-maven.html Thanks, Visakh -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-won-t-build-with-maven-tp12173p12176.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
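The one-off (non-continuous) build described in that guide is along these lines; profile flags vary by Spark and Hadoop version, so treat this as an example rather than the exact incantation:

```shell
# Give Maven enough heap/permgen, as the building guide recommends
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

# Single full build, skipping tests
mvn -DskipTests clean package
```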
Re: LDA example?
You can check out this pull request: https://github.com/apache/spark/pull/476 LDA is on the roadmap for the 1.2 release, hopefully we will officially support it then! Best, Burak - Original Message - From: Denny Lee denny.g@gmail.com To: user@spark.apache.org Sent: Thursday, August 21, 2014 10:10:35 PM Subject: LDA example? Quick question - is there a handy sample / example of how to use the LDA algorithm within Spark MLLib? Thanks! Denny - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
RE: resize memory size for caching RDD
AFAIK, no. Best Regards, Raymond Liu From: 牛兆捷 [mailto:nzjem...@gmail.com] Sent: Thursday, September 04, 2014 11:30 AM To: user@spark.apache.org Subject: resize memory size for caching RDD Dear all: Spark uses memory to cache RDDs, and the memory size is specified by spark.storage.memoryFraction. Once the Executor starts, does Spark support adjusting/resizing the memory size of this part dynamically? Thanks. -- Regards, Zhaojie - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark SQL - Exception only when using cacheTable
I am using the python api. Unfortunately, I cannot find the isCached method equivalent in the documentation: https://spark.apache.org/docs/1.1.0/api/python/index.html in the SQLContext section. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Exception-only-when-using-cacheTable-tp16031p16137.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
How to calculate percentiles with Spark?
Hi, What would be the best way to get percentiles from a Spark RDD? I can see JavaDoubleRDD or MLlib's MultivariateStatisticalSummary https://spark.apache.org/docs/latest/mllib-statistics.html provide the mean() but not percentiles. Thank you! Horace -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-calculate-percentiles-with-Spark-tp16937.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
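For what it's worth, without a built-in percentile on plain RDDs a common workaround was sort-then-index: sort the values (e.g. with sortBy), pair them with positions (zipWithIndex), and look up the element at the desired rank. A minimal pure-Python sketch of just the nearest-rank index math (the Spark side of fetching the value is assumed, not shown):

```python
import math

def nearest_rank_index(n, p):
    """0-based index of the p-th percentile (nearest-rank rule)
    in a sorted collection of n elements, for 0 < p <= 100."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    return max(0, math.ceil(p / 100.0 * n) - 1)

# Driver-side illustration on a small sorted sample.
data = sorted([15, 20, 35, 40, 50])
median = data[nearest_rank_index(len(data), 50)]  # -> 35
```

In Spark one would apply the same index via a sorted, indexed RDD (sortBy + zipWithIndex) and filter for that position.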
Re: Rdd of Rdds
On Wednesday, October 22, 2014 9:06 AM, Sean Owen so...@cloudera.com wrote: No, there's no such thing as an RDD of RDDs in Spark. Here though, why not just operate on an RDD of Lists? or a List of RDDs? Usually one of these two is the right approach whenever you feel inclined to operate on an RDD of RDDs. Depending on one's needs, one could also consider the matrix (RDD[Vector]) operations provided by MLLib, such as https://spark.apache.org/docs/latest/mllib-statistics.html - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Use RDD like a Iterator
Call RDD.toLocalIterator()? https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html On Wed, Oct 29, 2014 at 4:15 AM, Dai, Kevin yun...@ebay.com wrote: Hi, all, I have an RDD[T]; can I use it like an iterator, i.e., compute every element of this RDD lazily? Best Regards, Kevin. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
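For context on the suggestion above: unlike collect(), which materializes the whole RDD on the driver at once, toLocalIterator fetches one partition at a time. A rough pure-Python sketch of that lazy, partition-at-a-time behavior (no Spark involved; make_partition is a made-up stand-in for computing one partition):

```python
from itertools import chain

computed = []

def make_partition(i):
    """Stand-in for computing one RDD partition; records when it runs."""
    computed.append(i)
    return [i * 10, i * 10 + 1]

def to_local_iterator(partitions):
    # Partitions are materialized only as the iterator reaches them,
    # mirroring RDD.toLocalIterator (vs collect(), which computes and
    # transfers everything up front).
    return chain.from_iterable(partitions)

# A lazy generator of partitions: nothing is computed yet.
it = to_local_iterator(make_partition(i) for i in range(3))
first = next(it)  # forces only partition 0
```

At this point only the first partition has been built (computed == [0]); the rest are produced as iteration continues.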
StreamingLinearRegressionWithSGD
Hi Gurus, I did not look at the code yet. I wonder if StreamingLinearRegressionWithSGD http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/StreamingLinearRegressionWithSGD.html is equivalent to LinearRegressionWithSGD http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html with the starting weights of the current batch set to the ending weights of the last batch? Since RidgeRegressionModel http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/RidgeRegressionModel.html does not seem to have a streaming version, I just wonder if this approach will suffice. Thanks! J
[jira] [Created] (SPARK-5409) Broken link in documentation
Mauro Pirrone created SPARK-5409: Summary: Broken link in documentation Key: SPARK-5409 URL: https://issues.apache.org/jira/browse/SPARK-5409 Project: Spark Issue Type: Documentation Reporter: Mauro Pirrone Priority: Minor https://spark.apache.org/docs/1.2.0/streaming-kafka-integration.html See the API docs and the example. Link to example is broken. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Upgrade to Spark 1.2.1 using Guava
Maybe but any time the work around is to use spark-submit --conf spark.executor.extraClassPath=/guava.jar blah” that means that standalone apps must have hard coded paths that are honored on every worker. And as you know a lib is pretty much blocked from use of this version of Spark—hence the blocker severity. I could easily be wrong but userClassPathFirst doesn’t seem to be the issue. There is no class conflict. On Feb 27, 2015, at 7:13 PM, Sean Owen so...@cloudera.com wrote: This seems like a job for userClassPathFirst. Or could be. It's definitely an issue of visibility between where the serializer is and where the user class is. At the top you said Pat that you didn't try this, but why not? On Fri, Feb 27, 2015 at 10:11 PM, Pat Ferrel p...@occamsmachete.com wrote: I’ll try to find a Jira for it. I hope a fix is in 1.3 On Feb 27, 2015, at 1:59 PM, Pat Ferrel p...@occamsmachete.com wrote: Thanks! that worked. On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote: I don’t use spark-submit I have a standalone app. So I guess you want me to add that key/value to the conf in my code and make sure it exists on workers. On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote: On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote: I changed in the spark master conf, which is also the only worker. I added a path to the jar that has guava in it. Still can’t find the class. Sorry, I'm still confused about what config you're changing. 
I'm suggesting using: spark-submit --conf spark.executor.extraClassPath=/guava.jar blah -- Marcelo - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: SparkStream saveAsTextFiles()
The structure seems fine. You only need to add at the end of your program: ssc.start(); ssc.awaitTermination(); Also check the method arguments. I advise you to check the Spark Java streaming API: https://spark.apache.org/docs/1.3.0/api/java/ Regards. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SparkStream-saveAsTextFiles-tp22719p22755.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: spark sql, creating literal columns in java.
This should work from java too: http://spark.apache.org/docs/1.3.1/api/java/index.html#org.apache.spark.sql.functions$ On Tue, May 5, 2015 at 4:15 AM, Jan-Paul Bultmann janpaulbultm...@me.com wrote: Hey, What is the recommended way to create literal columns in java? Scala has the `lit` function from `org.apache.spark.sql.functions`. Should it be called from java as well? Cheers jan - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Performance tuning in Spark SQL.
Please see the link below for the available options: https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#performance-tuning For example, reducing spark.sql.shuffle.partitions from 200 to 10 could improve performance significantly. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Performance-tuning-in-Spark-SQL-tp21871p23576.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
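For instance (the value 10 is only illustrative, echoing the message above), the setting can be placed in spark-defaults.conf:

```
spark.sql.shuffle.partitions  10
```

It can equally be set per session, e.g. sqlContext.setConf("spark.sql.shuffle.partitions", "10") in the 1.3.x API.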
Re: How to set environment of worker applications
Check for spark.driver.extraJavaOptions and spark.executor.extraJavaOptions in the following article. I think you can use -D to pass system vars: spark.apache.org/docs/latest/configuration.html#runtime-environment Hi, I am starting a spark streaming job in standalone mode with spark-submit. Is there a way to make the UNIX environment variables with which spark-submit is started available to the processes started on the worker nodes? Jan - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
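A sketch of the suggestion above (the property names are from the linked configuration page; my.setting is a made-up example) in spark-defaults.conf:

```
spark.driver.extraJavaOptions    -Dmy.setting=value
spark.executor.extraJavaOptions  -Dmy.setting=value
```

The job would then read the value with System.getProperty("my.setting"). For real environment variables on the executors, the same page also lists spark.executorEnv.[EnvironmentVariableName].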
Re: Create RDD from output of unix command
You may want to look into using the pipe command .. http://blog.madhukaraphatak.com/pipe-in-spark/ http://spark.apache.org/docs/0.6.0/api/core/spark/rdd/PipedRDD.html -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Create-RDD-from-output-of-unix-command-tp23723p23895.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Python Kafka support?
Hi Darren, Functionality like messageHandler is missing from the Python API; it is still not included in version 1.5.1. Thanks Jerry On Wed, Nov 11, 2015 at 7:37 AM, Darren Govoni <dar...@ontrenet.com> wrote: > Hi, > I read on this page > http://spark.apache.org/docs/latest/streaming-kafka-integration.html > about python support for "receiverless" kafka integration (Approach 2) but > it says its incomplete as of version 1.4. > > Has this been updated in version 1.5.1? > > Darren > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > >
Re: Ranger-like Security on Spark
Even simple Spark-on-YARN should run as the user that submitted the job, yes, so HDFS ACLs should be enforced. Not sure how it plays with the rest of Ranger. Matei > On Sep 3, 2015, at 4:57 PM, Jörn Franke <jornfra...@gmail.com> wrote: > > Well if it needs to read from hdfs then it will adhere to the permissions > defined there And/or in ranger. However, I am not aware that you can protect > dataframes, tables or streams in general in Spark. > > Le jeu. 3 sept. 2015 à 21:47, Daniel Schulz <danielschulz2...@hotmail.com > <mailto:danielschulz2...@hotmail.com>> a écrit : > Hi Matei, > > Thanks for your answer. > > My question is regarding simple authenticated Spark-on-YARN only, without > Kerberos. So when I run Spark on YARN and HDFS, Spark will pass through my > HDFS user and only be able to access files I am entitled to read/write? Will > it enforce HDFS ACLs and Ranger policies as well? > > Best regards, Daniel. > > > On 03 Sep 2015, at 21:16, Matei Zaharia <matei.zaha...@gmail.com > > <mailto:matei.zaha...@gmail.com>> wrote: > > > > If you run on YARN, you can use Kerberos, be authenticated as the right > > user, etc in the same way as MapReduce jobs. > > > > Matei > > > >> On Sep 3, 2015, at 1:37 PM, Daniel Schulz <danielschulz2...@hotmail.com > >> <mailto:danielschulz2...@hotmail.com>> wrote: > >> > >> Hi, > >> > >> I really enjoy using Spark. An obstacle to sell it to our clients > >> currently is the missing Kerberos-like security on a Hadoop with simple > >> authentication. Are there plans, a proposal, or a project to deliver a > >> Ranger plugin or something similar to Spark. The target is to > >> differentiate users and their privileges when reading and writing data to > >> HDFS? Is Kerberos my only option then? > >> > >> Kind regards, Daniel. 
> >> ----- > >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > >> <mailto:user-unsubscr...@spark.apache.org> > >> For additional commands, e-mail: user-h...@spark.apache.org > >> <mailto:user-h...@spark.apache.org> > > > > > > - > > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > > <mailto:user-unsubscr...@spark.apache.org> > > For additional commands, e-mail: user-h...@spark.apache.org > > <mailto:user-h...@spark.apache.org> > > > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > <mailto:user-unsubscr...@spark.apache.org> > For additional commands, e-mail: user-h...@spark.apache.org > <mailto:user-h...@spark.apache.org> >
[jira] [Created] (SPARK-12661) Drop Python 2.6 support in PySpark
Davies Liu created SPARK-12661: -- Summary: Drop Python 2.6 support in PySpark Key: SPARK-12661 URL: https://issues.apache.org/jira/browse/SPARK-12661 Project: Spark Issue Type: Task Reporter: Davies Liu 1. stop testing with 2.6 2. remove the code for python 2.6 see discussion : https://www.mail-archive.com/user@spark.apache.org/msg43423.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15966) Fix markdown for Spark Monitoring
Dhruve Ashar created SPARK-15966: Summary: Fix markdown for Spark Monitoring Key: SPARK-15966 URL: https://issues.apache.org/jira/browse/SPARK-15966 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.0.0 Reporter: Dhruve Ashar Priority: Trivial The markdown for Spark monitoring needs to be fixed. http://spark.apache.org/docs/2.0.0-preview/monitoring.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
Re: Does filter on an RDD scan every data item ?
Looks like this has been supported from 1.4 release :) https://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.rdd.OrderedRDDFunctions -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item-tp20170p26049.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: udf StructField to JSON String
Have you looked at DataFrame.write.json( path )? https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter > On Mar 11, 2016, at 7:15 AM, Caires Vinicius <caire...@gmail.com> wrote: > > I have one DataFrame with nested StructField and I want to convert to JSON > String. There is anyway to accomplish this? - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Spark ML Interaction
Could you create a JIRA to add an example and documentation? Thanks On Tue, 8 Mar 2016 at 16:18, amarouni <amaro...@talend.com> wrote: > Hi, > > Did anyone here manage to write an example of the following ML feature > transformer > > http://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/feature/Interaction.html > ? > It's not documented on the official Spark ML features pages but it can > be found in the package API javadocs. > > Thanks, > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > >
Re: please add Christchurch Apache Spark Meetup Group
(I have the site's svn repo handy, so I just added it.) On Wed, Mar 2, 2016 at 5:16 PM, Raazesh Sainudiin <raazesh.sainud...@gmail.com> wrote: > Hi, > > Please add Christchurch Apache Spark Meetup Group to the community list > here: > http://spark.apache.org/community.html > > Our Meetup URI is: > http://www.meetup.com/Christchurch-Apache-Spark-Meetup/ > > Thanks, > Raaz - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
Re: Where to set properties for the retainedJobs/Stages?
You can set them in spark-defaults.conf See also https://spark.apache.org/docs/latest/configuration.html#spark-ui On Fri, Apr 1, 2016 at 8:26 AM, Max Schmidt <m...@datapath.io> wrote: > Can somebody tell me the interaction between the properties: > > spark.ui.retainedJobs > spark.ui.retainedStages > spark.history.retainedApplications > > I know from the bugtracker, that the last one describes the number of > applications the history-server holds in memory. > > Can I set the properties in the spark-env.sh? And where? > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > >
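As a sketch of where each knob takes effect (the values are only illustrative): the spark.ui.* limits apply to a running application's driver UI, while spark.history.retainedApplications is read by the history server daemon:

```
# spark-defaults.conf -- picked up by spark-submit; a standalone Java
# app can instead set the same keys on its SparkConf before creating
# the SparkContext, e.g. new SparkConf().set("spark.ui.retainedJobs", "500")
spark.ui.retainedJobs    500
spark.ui.retainedStages  500

# read by the history server daemon at startup
spark.history.retainedApplications  30
```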
Re: yarn-cluster
Hi, this is a good place to start for Spark on YARN: https://spark.apache.org/docs/1.5.0/running-on-yarn.html The docs are specific to the version you are on; you can toggle between the pages for each version. - Neelesh S. Salian Cloudera -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/yarn-cluster-tp26846p26882.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
[jira] [Resolved] (SPARK-15228) pyspark.RDD.toLocalIterator Documentation
[ https://issues.apache.org/jira/browse/SPARK-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15228. --- Resolution: Not A Problem > pyspark.RDD.toLocalIterator Documentation > - > > Key: SPARK-15228 > URL: https://issues.apache.org/jira/browse/SPARK-15228 > Project: Spark > Issue Type: Documentation >Reporter: Ignacio Tartavull >Priority: Trivial > > There is a little bug in the parsing of the documentation of > http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.toLocalIterator -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-15228) pyspark.RDD.toLocalIterator Documentation
Ignacio Tartavull created SPARK-15228: - Summary: pyspark.RDD.toLocalIterator Documentation Key: SPARK-15228 URL: https://issues.apache.org/jira/browse/SPARK-15228 Project: Spark Issue Type: Documentation Reporter: Ignacio Tartavull Priority: Trivial There is a little bug in the parsing of the documentation of http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.toLocalIterator -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[GitHub] spark issue #16816: Code style improvement
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16816 @zhoucen please close this PR and read http://spark.apache.org/contributing.html --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16863: Swamidass & Baldi Approximations
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16863 Please review http://spark.apache.org/contributing.html before opening a pull request. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16676: delete useless var “j”
Github user srowen commented on the issue: https://github.com/apache/spark/pull/16676 Merged to master. Please read http://spark.apache.org/contributing.html for next time. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #8632: Update README.md
Github user packtpartner commented on the issue: https://github.com/apache/spark/pull/8632 Hi @srowen , where is the Github repository to feature books on http://spark.apache.org/documentation.html ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16638: spark-19115
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16638 Could you follow the title requirement in http://spark.apache.org/contributing.html? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: CSV escaping not working
I’d think quoting is only necessary if you are not escaping delimiters in data. But we can only share our opinions; it would be good to see something documented. This may be the cause of the issue?: https://issues.apache.org/jira/browse/CSV-135 From: Koert Kuipers <ko...@tresata.com> Date: Thursday, October 27, 2016 at 12:49 PM To: "Jain, Nishit" <nja...@underarmour.com> Cc: "user@spark.apache.org" <user@spark.apache.org> Subject: Re: CSV escaping not working Well, my expectation would be that if you have delimiters in your data you need to quote your values. If you then have quotes within your data you need to escape them. So escaping is only necessary if quoted. On Thu, Oct 27, 2016 at 1:45 PM, Jain, Nishit <nja...@underarmour.com> wrote: Do you mind sharing why escaping should not work without quotes? From: Koert Kuipers <ko...@tresata.com> Date: Thursday, October 27, 2016 at 12:40 PM To: "Jain, Nishit" <nja...@underarmour.com> Cc: "user@spark.apache.org" <user@spark.apache.org> Subject: Re: CSV escaping not working That is what I would expect: escaping only works if quoted. On Thu, Oct 27, 2016 at 1:24 PM, Jain, Nishit <nja...@underarmour.com> wrote: Interesting finding: escaping works if data is quoted, but not otherwise. From: "Jain, Nishit" <nja...@underarmour.com> Date: Thursday, October 27, 2016 at 10:54 AM To: "user@spark.apache.org" <user@spark.apache.org> Subject: CSV escaping not working I am using spark-core version 2.0.1 with Scala 2.11. I have simple code to read a csv file which has \ escapes.
val myDA = spark.read
  .option("quote", null)
  .schema(mySchema)
  .csv(filePath)
As per the documentation, \ is the default escape for the CSV reader, but it does not work: Spark reads \ as part of my data. For example, the City column in the csv file is north rocks\,au. I am expecting the city column to read as north rocks,au, but instead Spark reads it as north rocks\ and moves au to the next column. I have tried the following, but none worked: * Explicitly defining the escape with .option("escape", "\\") * Changing the escape to | or : in the file and in the code * Using the spark-csv library Anyone facing the same issue? Am I missing something? Thanks
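As a point of comparison only (this is Python's stdlib csv parser, not Spark's), the two regimes debated in the thread can be seen side by side: with QUOTE_NONE an escape character alone protects a delimiter, while the default quote-aware mode with no escapechar treats the backslash as plain data:

```python
import csv
import io

raw = "north rocks\\,au,extra\n"  # backslash-escaped comma, no quotes

# Escape-without-quotes: the escaped comma stays inside the field.
unquoted = next(csv.reader(io.StringIO(raw),
                           quoting=csv.QUOTE_NONE, escapechar="\\"))
# -> ['north rocks,au', 'extra']

# Default quote-aware mode with no escapechar: the comma still splits.
default = next(csv.reader(io.StringIO(raw)))
# -> ['north rocks\', 'au', 'extra']
```

So escaping-without-quoting is a coherent parsing mode in principle; whether a given CSV reader supports it depends on its options.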
RE: Spark SQL join and subquery
unsubscribe -Original Message- From: neil90 [mailto:neilp1...@icloud.com] Sent: Thursday, November 17, 2016 8:26 AM To: user@spark.apache.org Subject: Re: Spark SQL join and subquery What version of Spark are you using? I believe this was fixed in 2.0 -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-join-and-subquery-tp28093p28097.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe e-mail: user-unsubscr...@spark.apache.org - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
[GitHub] spark issue #17309: same rdd rule testcase
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17309 See http://spark.apache.org/contributing.html I'm not clear this adds any value? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/17556 http://spark.apache.org/docs/latest/building-spark.html --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
RE: RDD functions using GUI
Ping... I wonder why there aren't any such drag-n-drop GUI tools for creating batch query scripts? Thanks From: Ke Yang (Conan) Sent: Monday, April 17, 2017 5:31 PM To: 'dev@spark.apache.org' <dev@spark.apache.org> Subject: RDD functions using GUI Hi, Are there drag-and-drop (code-free) GUIs for RDD functions available, i.e., a GUI that generates code based on drag-n-drops? http://spark.apache.org/docs/latest/programming-guide.html#resilient-distributed-datasets-rdds Thanks for brainstorming.
[GitHub] spark issue #18836: Update SortMergeJoinExec.scala
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18836 You didn't read the link above, I take it? http://spark.apache.org/contributing.html --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #18414: Update status of application to RUNNING if executors are...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18414 please fix up the PR title: http://spark.apache.org/contributing.html --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
user-unsubscr...@spark.apache.org
user-unsubscr...@spark.apache.org From: 萝卜丝炒饭 [mailto:1427357...@qq.com] Sent: Sunday, May 21, 2017 8:15 PM To: user <user@spark.apache.org> Subject: Are tachyon and akka removed from 2.1.1 please Hi all, I read some papers about the source code; the papers are based on version 1.2 and refer to Tachyon and Akka. When I read the 2.1 code, I cannot find the code for Akka or Tachyon. Were Tachyon and Akka removed in 2.1.1?
[GitHub] spark issue #19238: [SPARK-22016][SQL] Add HiveDialect for JDBC connection t...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19238 I can see the value, but it does not perform well in most cases if we use a JDBC connection. Instead of adding the extra dialect upstream, could you please add Hive as a separate data source? Thanks! https://spark.apache.org/third-party-projects.html --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19433: [SPARK-3162] [MLlib][WIP] Add local tree training for de...
Github user smurching commented on the issue: https://github.com/apache/spark/pull/19433 Thanks! I'll remove the WIP. To clear things up for the future, I'd thought [WIP] was the appropriate tag for a PR that's ready for review but not ready to be merged (based on https://spark.apache.org/contributing.html) -- have we stopped using the WIP tag? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19429 When I opened the JIRA, I thought of a chapter such as https://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets. The chapter `Manually Specifying Options` looks like it describes how to specify options, BTW. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19154: Fix DiskBlockManager crashing when a root local folder h...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19154 I don't think it's reasonable to handle the case where people arbitrarily delete data from under Spark. This case may be easy to fix; others won't be. This also isn't how changes are proposed: http://spark.apache.org/contributing.html --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19343: [SPARK-22121][SQL] Correct database location for namenod...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19343 @squito Thank you! Instead of changing the source codes, could we just update the document https://spark.apache.org/docs/2.2.0/sql-programming-guide.html#hive-tables ? This might be enough for this issue. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: Welcoming Tejas Patil as a Spark committer
Congratulations, Tejas! Kazuaki Ishizaki From: Matei Zaharia <matei.zaha...@gmail.com> To: "dev@spark.apache.org" <dev@spark.apache.org> Date: 2017/09/30 04:58 Subject: Welcoming Tejas Patil as a Spark committer Hi all, The Spark PMC recently added Tejas Patil as a committer on the project. Tejas has been contributing across several areas of Spark for a while, focusing especially on scalability issues and SQL. Please join me in welcoming Tejas! Matei - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19485 @jomach and @HyukjinKwon I did not generate the doc. I think we should follow what we did for JDBC: http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases lists all the public options for each built-in data source. Thus, it makes sense to add a new chapter for CSV.
[GitHub] spark issue #19714: [SPARK-22489][SQL] Shouldn't change broadcast join build...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19714 LGTM Thanks! Merged to master. Could you submit a follow-up PR to document the behavior changes in the migration section of the Spark SQL guide? https://spark.apache.org/docs/latest/sql-programming-guide.html#migration-guide
Re: [01/51] [partial] spark-website git commit: 2.2.1 generated doc
/latest does not point to 2.2.1 yet. Not all the pieces are released yet, as I understand? On Sun, Dec 17, 2017 at 8:12 AM Jacek Laskowski <ja...@japila.pl> wrote: > Hi, > > I saw the following commit, but I can't seem to see 2.2.1 as the version > in the header of the documentation pages under > http://spark.apache.org/docs/latest/ (that is still 2.2.0). Is this being > worked on? > > http://spark.apache.org/docs/2.2.1 is available and shows the proper > version, but not http://spark.apache.org/docs/latest :( > > Pozdrawiam, > Jacek Laskowski > > >
[GitHub] spark issue #19996: [MINOR][DOC] Fix the link of 'Getting Started'
Github user mcavdar commented on the issue: https://github.com/apache/spark/pull/19996 @srowen [Here](https://github.com/mcavdar/NLP/blob/master/Broken%20Links/spark/spark_404links.txt) is a list of all the broken links; it may be useful. Each line contains a broken link and its parent page (separated by a tab). About 75-100 of the broken links are related to "http(s)://spark.apache.org/docs/latest".
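The report format described above (broken link, then the parent page it appears on, separated by a tab) is easy to consume programmatically. A minimal Python sketch, using illustrative sample lines rather than the real file:

```python
# Parse a broken-links report in the format described above:
# each line is "<broken-url>\t<parent-page>". The sample data
# here is illustrative, not taken from the actual report.
sample = "\n".join([
    "https://spark.apache.org/docs/latest/missing.html\thttps://spark.apache.org/docs/latest/index.html",
    "https://spark.apache.org/docs/latest/gone.html\thttps://spark.apache.org/docs/latest/ml-guide.html",
])

broken = []
for line in sample.splitlines():
    # Split on the first tab only, in case a URL ever contains one.
    link, parent = line.split("\t", 1)
    broken.append((link, parent))

for link, parent in broken:
    print(f"{link} (linked from {parent})")
```

A real script would read the downloaded file line by line instead of the inline sample.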
[GitHub] spark issue #20219: [SPARK-23025][SQL] Support Null type in scala reflection
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20219 `NullType` is not well supported by most data sources. We did not mention it in our doc https://spark.apache.org/docs/latest/sql-programming-guide.html cc @cloud-fan @marmbrus @rxin @sameeragarwal Any comments about this support?
[GitHub] spark issue #19290: [SPARK-22063][R] Fixes lint check failures in R by lates...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19290 BTW, I believe we are testing it with R 3.4.1 via AppVeyor too. I have been thinking it's good to test both old and new versions ... I think we have a weak promise for `R 3.1+` - http://spark.apache.org/docs/latest/index.html#downloading
[GitHub] spark issue #20334: How to check registered table name.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20334 Hey @AtulKumVerma, questions should usually go to the mailing list. See http://spark.apache.org/community.html. I believe you can get a better answer there. A pull request from one branch to another actually causes a slight visual problem. Would you mind closing this pull request, please?
[GitHub] spark issue #20254: [SPARK-23062][SQL] Improve EXCEPT documentation
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20254 @henryr Since Spark 2.3, Spark SQL documents all the behavior changes in [Migration Guides](https://spark.apache.org/docs/latest/sql-programming-guide.html#migration-guide). Hopefully, this can help our end users.
[GitHub] spark issue #21961: Spark 20597
Github user mahmoudmahdi24 commented on the issue: https://github.com/apache/spark/pull/21961 Hello @Satyajitv, please rename the title of this PR properly. The PR title should be of the form [SPARK-xxxx][COMPONENT] Title, where SPARK-xxxx is the relevant JIRA number, COMPONENT is one of the PR categories shown at spark-prs.appspot.com, and Title may be the JIRA's title or a more specific title describing the PR itself. Take a look at this helpful document: https://spark.apache.org/contributing.html
[GitHub] spark issue #22177: stages in wrong order within job page DAG chart
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22177 Please change the title to "[SPARK-25199][Web UI] XXX" as described in http://spark.apache.org/contributing.html. ``` check the DAG chart in job page. ``` Could you also post a screenshot of the DAG chart after your fix?
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21589 AFAIK, we always have the number of executors and then the number of cores per executor, right? https://spark.apache.org/docs/latest/configuration.html#execution-behavior Maybe we should have the getter factored the same way, and probably named similarly.
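For reference, the two knobs discussed (executor count, then cores per executor) can be set together in `spark-defaults.conf`. The values below are illustrative only, and note that `spark.executor.instances` takes effect on cluster managers such as YARN or Kubernetes (or via dynamic allocation), not in local mode:

```properties
# Illustrative values: 4 executors, each with 2 cores.
spark.executor.instances   4
spark.executor.cores       2
```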
[GitHub] spark issue #22339: SPARK-17159 Significant speed up for running spark strea...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22339 Hi, @ScrapCodes . Could you do the following? - Update the title to `[SPARK-17159][SS]...` - Remove `Please review http://spark.apache.org/contributing.html ` from the PR description - Share the numbers, because the PR title says `Significant speed up`
[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22367 Usually we merge into master and backport to other branches when it's needed. https://spark.apache.org/contributing.html > 5. Open a pull request against the master branch of apache/spark. (Only in special cases would the PR be opened against other branches.)
[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...
Github user misutoth commented on the issue: https://github.com/apache/spark/pull/20618 @felixcheung, I have started a mail thread on d...@spark.apache.org with title _Help needed in R documentation generation_ because I did not feel it is directly related to this PR. Thanks for your reply on this thread already.
[GitHub] spark issue #17466: [SPARK-14681][ML] Added getter for impurityStats
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/17466 @shaynativ Sorry for the inactivity here. Btw, for the JIRA & PR title question above, I'd recommend checking out http://spark.apache.org/contributing.html Since @WeichenXu123 opened a fresh PR for this, would you mind working with him on it? We can close this issue / PR for now. Thank you!
[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21057 It would be helpful if you open a JIRA and describe the issue. It could help others think of a better way to test, or give clearer ideas on whether it's really difficult to add a test. Usually, the JIRA is made first. See also https://spark.apache.org/contributing.html.
[GitHub] spark pull request #20897: [MINOR][DOC] Fix a few markdown typos
Github user Lemonjing commented on a diff in the pull request: https://github.com/apache/spark/pull/20897#discussion_r178481268 --- Diff: docs/mllib-pmml-model-export.md --- @@ -7,15 +7,15 @@ displayTitle: PMML model export - RDD-based API * Table of contents {:toc} -## `spark.mllib` supported models --- End diff -- Backquotes in the markdown files cause display problems (see http://spark.apache.org/docs/latest/mllib-pmml-model-export.html)
[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...
Github user sahilTakiar commented on the issue: https://github.com/apache/spark/pull/20893 Ok, I'll work on writing a test for `SparkLauncherSuite`. The test added here was meant to cover the race condition mentioned [here](https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#bucketing-sorting-and-partitioning)
[GitHub] spark issue #20889: [MINOR][DOC] Fix ml-guide markdown typos
Github user Lemonjing commented on the issue: https://github.com/apache/spark/pull/20889 @felixcheung Yes, I read the entire Spark MLlib docs. There is no problem with the other markdown files (I may not have seen it, or other contributors solved it). This error is obvious, and it affects a link later in the page, which is how I found it: http://spark.apache.org/docs/latest/ml-guide.html#breaking-changes
[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22852 I think these are good changes. In a separate PR for the versions-specific docs, we could add a similar note to https://spark.apache.org/docs/latest/spark-standalone.html as much of the security concern is around the standalone master.
[GitHub] spark issue #22840: [SPARK-25840][BUILD] `make-distribution.sh` should not f...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22840 @srowen . It's a documented feature. - http://spark.apache.org/docs/latest/building-spark.html#building-a-runnable-distribution I know that you're not against it, but Spark 2.4.0 had better respect the document. Or are we going to document that, starting with the Spark 2.4.0 source distribution, a runnable distribution cannot be built from source?
[jira] [Created] (SPARK-25933) Fix pstats reference for spark.python.profile.dump in configuration.md
Alex Hagerman created SPARK-25933: - Summary: Fix pstats reference for spark.python.profile.dump in configuration.md Key: SPARK-25933 URL: https://issues.apache.org/jira/browse/SPARK-25933 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.3.2 Reporter: Alex Hagerman Fix For: 2.3.2 ptats.Stats() should be pstats.Stats() in https://spark.apache.org/docs/latest/configuration.html for spark.python.profile.dump. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
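The fix here is just the module name: profile dumps written via `spark.python.profile.dump` are in the standard `cProfile`/`pstats` format, so they load with `pstats.Stats` (there is no `ptats` module). A minimal stdlib-only sketch; the workload function and file path below are illustrative, not Spark code:

```python
import cProfile
import os
import pstats
import tempfile

def busy_work(n):
    # Simple workload so the profile has measurable entries.
    return sum(i * i for i in range(n))

# Profile the workload and dump raw stats to a file, mirroring the
# kind of per-RDD dump file spark.python.profile.dump produces.
dump_path = os.path.join(tempfile.gettempdir(), "profile_dump.stats")
profiler = cProfile.Profile()
profiler.enable()
busy_work(100_000)
profiler.disable()
profiler.dump_stats(dump_path)

# Load the dump back with pstats.Stats -- the corrected class name --
# and print the five most expensive calls by cumulative time.
stats = pstats.Stats(dump_path)
stats.sort_stats("cumulative").print_stats(5)
```

The same `pstats.Stats(path)` call works on the files Spark writes to the directory given by `spark.python.profile.dump`.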
[GitHub] spark issue #22822: [SPARK-25678] Requesting feedback regarding a prototype ...
Github user UtkarshMe commented on the issue: https://github.com/apache/spark/pull/22822 I did send the proposal on d...@spark.apache.org mailing list (twice). But unfortunately, I got no response so I opened a JIRA ticket about it about 20 days back and now opened a pull request for feedback.
[jira] [Created] (SPARK-25991) Update binary for 2.4.0 release
Vladimir Tsvetkov created SPARK-25991: - Summary: Update binary for 2.4.0 release Key: SPARK-25991 URL: https://issues.apache.org/jira/browse/SPARK-25991 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 2.4.0 Reporter: Vladimir Tsvetkov The archive for the 2.4.0 release contains old binaries: https://spark.apache.org/downloads.html
[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22606 You mean http://spark.apache.org/versioning-policy.html and the reference to 2.4? I think that's still valid. When 2.4 is released, I'd propose to change that to refer to 3.0 being released .. I dunno .. around Mar 2019?
[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...
Github user niofire commented on the issue: https://github.com/apache/spark/pull/22593 From https://spark.apache.org/docs/2.3.2/api/java/org/apache/spark/sql/streaming/DataStreamWriter.html ![image](https://user-images.githubusercontent.com/2295469/46749482-b3351400-cc6a-11e8-834d-7eb53b70ddc0.png) I see java in that URL; is that actually referring to the Java API?
[GitHub] spark issue #21755: Doc fix: The Imputer is an Estimator
Github user zoltanctoth commented on the issue: https://github.com/apache/spark/pull/21755 @srowen I'm just about to submit another doc-related pull request. Wondering if your `PS see https://spark.apache.org/contributing.html` line referred to anything specific about how I should submit these PRs differently?
[GitHub] spark issue #22321: [DOC] Update the 'Specifying the Hadoop Version' link in...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22321 Good catch. IIUC, the following files also have a similar problem regarding `http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn`. Would it be possible to address them? ``` R/WINDOWS.md R/README.md ```