Mailing lists matching spark.apache.org

commits@spark.apache.org
dev@spark.apache.org
issues@spark.apache.org
reviews@spark.apache.org
user@spark.apache.org


Re: Is storage resources counted during the scheduling

2016-04-11 Thread Jialin Liu
Thanks Ted,
but that page seems to cover scheduling policy; it doesn't say which
resources are considered during scheduling.

Also, I'm wondering: in the case of just one application, is there
still a scheduling process?
Otherwise, why do I see some launch delay in the tasks? (Well, this might be
another question.) Thanks.

Best,
Jialin
> On Apr 11, 2016, at 3:18 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> 
> See 
> https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application
>  
> <https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application>
> 
> On Mon, Apr 11, 2016 at 3:15 PM, Jialin Liu <jaln...@lbl.gov 
> <mailto:jaln...@lbl.gov>> wrote:
> Hi Spark users/experts,
> 
> I’m wondering how the Spark scheduler works.
> What kinds of resources are considered during scheduling? Does that
> include disk or I/O resources, e.g., the number of I/O ports?
> Are network resources considered?
> 
> My understanding is that only CPU is considered. Is that right?
> 
> Best,
> Jialin
> -----
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
> <mailto:user-unsubscr...@spark.apache.org>
> For additional commands, e-mail: user-h...@spark.apache.org 
> <mailto:user-h...@spark.apache.org>
> 
> 
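
For reference, the resource dimensions the standalone scheduler actually works with are CPU cores and memory; disk, I/O, and network are not scheduled resources. A sketch of the relevant settings in spark-defaults.conf (values are illustrative):

```properties
# Executor placement is driven by cores and memory only.
spark.executor.cores   4     # cores granted to each executor
spark.executor.memory  8g    # heap granted to each executor
spark.task.cpus        1     # CPU slots a single task consumes
```

Task launch delay within a single application typically comes from scheduling overhead and locality waits rather than resource contention.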



Re: Where to set properties for the retainedJobs/Stages?

2016-04-01 Thread Max Schmidt
Yes, but the doc doesn't say which process each config applies to, so do
I have to set them for the history server? The daemon? The
workers?


And what if I use the Java API instead of spark-submit for the jobs?

I guess spark-defaults.conf is ignored when using the Java API?


Am 2016-04-01 18:58, schrieb Ted Yu:

You can set them in spark-defaults.conf

See 
also https://spark.apache.org/docs/latest/configuration.html#spark-ui 
[1]


On Fri, Apr 1, 2016 at 8:26 AM, Max Schmidt <m...@datapath.io> wrote:


Can somebody tell me the interaction between the properties:

spark.ui.retainedJobs
spark.ui.retainedStages
spark.history.retainedApplications

I know from the bug tracker that the last one describes the number of
applications the history server holds in memory.

Can I set the properties in the spark-env.sh? And where?






Links:
--
[1] https://spark.apache.org/docs/latest/configuration.html#spark-ui





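
The properties in question are ordinary Spark configs, so they can be set in spark-defaults.conf: the spark.ui.* ones are read by each running application's driver, while spark.history.retainedApplications is read by the history-server process. A sketch (values are illustrative):

```properties
# Read by the driver of each application (live UI retention)
spark.ui.retainedJobs    1000
spark.ui.retainedStages  1000

# Read by the history server (how many apps it caches in memory)
spark.history.retainedApplications  50
```

When building a SparkConf programmatically instead of using spark-submit, the same keys can be passed via SparkConf.set(...).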



Re: SPARK-13843 and future of streaming backends

2016-03-28 Thread Cody Koeninger
Are you talking about group/identifier name, or contained classes?

Because there are plenty of org.apache.* classes distributed via maven
with non-apache group / identifiers.

On Fri, Mar 25, 2016 at 6:54 PM, David Nalley <ke4...@apache.org> wrote:
>
>> As far as group / artifact name compatibility, at least in the case of
>> Kafka we need different artifact names anyway, and people are going to
>> have to make changes to their build files for spark 2.0 anyway.   As
>> far as keeping the actual classes in org.apache.spark to not break
>> code despite the group name being different, I don't know whether that
>> would be enforced by maven central, just looked at as poor taste, or
>> ASF suing for trademark violation :)
>
>
> Sonatype has strict instructions to only permit org.apache.* to originate 
> from repository.apache.org. Exceptions to that must be approved by the VP, 
> Infrastructure.
> --
> Sent via Pony Mail for dev@spark.apache.org.
> View this email online at:
> https://pony-poc.apache.org/list.html?dev@spark.apache.org
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Commented] (SPARK-17560) SQLContext tables returns table names in lower case only

2016-09-16 Thread Aseem Bansal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15495906#comment-15495906
 ] 

Aseem Bansal commented on SPARK-17560:
--

Looked through 
https://spark.apache.org/docs/2.0.0/sql-programming-guide.html
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/Dataset.html
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/sql/SparkSession.html
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/SparkConf.html

and none of them say anything about this parameter

> SQLContext tables returns table names in lower case only
> 
>
> Key: SPARK-17560
> URL: https://issues.apache.org/jira/browse/SPARK-17560
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Aseem Bansal
>
> I registered a table using
> dataSet.createOrReplaceTempView("TestTable");
> Then I tried to get the list of tables using 
> sparkSession.sqlContext().tableNames()
> but the name that I got was testtable. It used to give table names in their 
> original case in Spark 1.4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




Re: Spark streaming completed batches statistics

2016-12-07 Thread Richard Startin
Ok it looks like I could reconstruct the logic in the Spark UI from the /jobs 
resource. Thanks.


https://richardstartin.com/



From: map reduced <k3t.gi...@gmail.com>
Sent: 07 December 2016 19:49
To: Richard Startin
Cc: user@spark.apache.org
Subject: Re: Spark streaming completed batches statistics

Have you checked http://spark.apache.org/docs/latest/monitoring.html#rest-api ?

KP

On Wed, Dec 7, 2016 at 11:43 AM, Richard Startin 
<richardstar...@outlook.com<mailto:richardstar...@outlook.com>> wrote:

Is there any way to get this information as CSV/JSON?


https://docs.databricks.com/_images/CompletedBatches.png



https://richardstartin.com/



From: Richard Startin 
<richardstar...@outlook.com<mailto:richardstar...@outlook.com>>
Sent: 05 December 2016 15:55
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Spark streaming completed batches statistics

Is there any way to get a more computer-friendly version of the completed 
batches section of the streaming page of the application master? I am very 
interested in the statistics and am currently screen-scraping...

https://richardstartin.com
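
The /jobs resource returns plain JSON, so the screen-scraping above can be replaced with a small parser. A Python sketch; the payload here is made up, with field names following the shape of /api/v1/applications/[app-id]/jobs responses in the monitoring docs:

```python
import json

# Made-up payload shaped like /api/v1/applications/[app-id]/jobs output
# (field names follow Spark's monitoring docs; values are illustrative).
payload = """
[
  {"jobId": 2, "status": "SUCCEEDED", "numCompletedTasks": 200, "numFailedTasks": 0},
  {"jobId": 1, "status": "SUCCEEDED", "numCompletedTasks": 150, "numFailedTasks": 3},
  {"jobId": 0, "status": "FAILED",    "numCompletedTasks": 10,  "numFailedTasks": 5}
]
"""

jobs = json.loads(payload)
# Aggregate the same statistics the UI's completed-batches table shows.
succeeded = [j for j in jobs if j["status"] == "SUCCEEDED"]
failed_tasks = sum(j["numFailedTasks"] for j in jobs)
print(f"{len(succeeded)}/{len(jobs)} jobs succeeded, {failed_tasks} failed tasks")
```

In a live cluster the payload would come from the driver's UI port (e.g. http://driver-host:4040/api/v1/...) via any HTTP client.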




Re: EXT: Multiple cores/executors in Pyspark standalone mode

2017-03-24 Thread Kadam, Gangadhar (GE Aviation, Non-GE)
In local mode, all processes are executed inside a single JVM.
An application is started in local mode by setting the master to local, local[*], or 
local[n].
Executor settings such as spark.executor.cores are not applicable in local 
mode because there is only one embedded executor.


In standalone mode, you need a standalone Spark 
cluster<https://spark.apache.org/docs/latest/spark-standalone.html>.

It requires a master node (can be started using the SPARK_HOME/sbin/start-master.sh 
script) and at least one worker node (can be started using the 
SPARK_HOME/sbin/start-slave.sh script). The SparkConf should be created with the 
master node's address (spark://host:port).

Thanks!

Gangadhar
From: Li Jin <ice.xell...@gmail.com<mailto:ice.xell...@gmail.com>>
Date: Friday, March 24, 2017 at 3:43 PM
To: "user@spark.apache.org<mailto:user@spark.apache.org>" 
<user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: EXT: Multiple cores/executors in Pyspark standalone mode

Hi,

I am wondering, does PySpark standalone (local) mode support multiple 
cores/executors?

Thanks,
Li

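
As a side note on the master-URL notation discussed above: local, local[n], and local[*] encode how many worker threads the single embedded executor uses. A toy parser (not Spark's actual implementation) illustrating the accepted forms:

```python
import re

def local_cores(master: str):
    """Return the worker-thread count implied by a local master URL.

    'local'    -> 1 thread
    'local[n]' -> n threads
    'local[*]' -> None here, standing for "all available cores"
    (Toy illustration of the notation, not Spark's parser.)
    """
    if master == "local":
        return 1
    m = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if not m:
        raise ValueError(f"not a local master URL: {master}")
    return None if m.group(1) == "*" else int(m.group(1))

print(local_cores("local"), local_cores("local[4]"), local_cores("local[*]"))
```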



Re: Welcoming Hyukjin Kwon and Sameer Agarwal as committers

2017-08-07 Thread 王长捷
Congratulations, Hyukjin and Sameer!

> On Aug 8, 2017, at 00:01, 蒋星博 <jiangxb1...@gmail.com> wrote:
> 
> Congratulation, Hyukjin and Sameer!
> 
> 2017-08-07 23:57 GMT+08:00 <linguin@gmail.com 
> <mailto:linguin@gmail.com>>:
> Congrats!
> 
> On 2017/08/08 at 0:55, Bai, Dave <dave.1@here.com <mailto:dave.1@here.com>> 
> wrote:
> 
> > Congrats, leveled up!=)
> >
> >> On 8/7/17, 10:53 AM, "Matei Zaharia" <matei.zaha...@gmail.com 
> >> <mailto:matei.zaha...@gmail.com>> wrote:
> >>
> >> Hi everyone,
> >>
> >> The Spark PMC recently voted to add Hyukjin Kwon and Sameer Agarwal as
> >> committers. Join me in congratulating both of them and thanking them for
> >> their contributions to the project!
> >>
> >> Matei
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
> >> <mailto:dev-unsubscr...@spark.apache.org>
> >>
> >
> >
> > -----
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
> > <mailto:dev-unsubscr...@spark.apache.org>
> >
> 
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org 
> <mailto:dev-unsubscr...@spark.apache.org>
> 
> 



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r139919707
  
--- Diff: R/pkg/R/mllib_tree.R ---
@@ -352,10 +353,10 @@ setMethod("write.ml", signature(object = 
"GBTClassificationModel", path = "chara
 #' model, \code{predict} to make predictions on new data, and 
\code{write.ml}/\code{read.ml} to
 #' save/load fitted models.
 #' For more details, see
-#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-regression}{
-#' Random Forest Regression} and
-#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-classifier}{
-#' Random Forest Classification}
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-
+#' regression}{Random Forest Regression} and
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#random-forest-
+#' classifier}{Random Forest Classification}
--- End diff --

links were checked


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r139919715
  
--- Diff: R/pkg/R/mllib_tree.R ---
@@ -132,10 +132,10 @@ print.summary.decisionTree <- function(x) {
 #' Gradient Boosted Tree model, \code{predict} to make predictions on new 
data, and
 #' \code{write.ml}/\code{read.ml} to save/load fitted models.
 #' For more details, see
-#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-regression}{
-#' GBT Regression} and
-#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-tree-classifier}{
-#' GBT Classification}
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-
+#' tree-regression}{GBT Regression} and
+#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#gradient-boosted-
+#' tree-classifier}{GBT Classification}
--- End diff --

links were checked


---




[GitHub] spark pull request #19290: [WIP][SPARK-22063][R] Upgrades lintr to latest co...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/19290#discussion_r139919722
  
--- Diff: R/pkg/R/mllib_tree.R ---
@@ -567,10 +569,10 @@ setMethod("write.ml", signature(object = 
"RandomForestClassificationModel", path
 #' model, \code{predict} to make predictions on new data, and 
\code{write.ml}/\code{read.ml} to
 #' save/load fitted models.
 #' For more details, see
    -#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-regression}{
-#' Decision Tree Regression} and
    -#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-classifier}{
-#' Decision Tree Classification}
    +#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-
+#' regression}{Decision Tree Regression} and
    +#' 
\href{http://spark.apache.org/docs/latest/ml-classification-regression.html#decision-tree-
+#' classifier}{Decision Tree Classification}
--- End diff --

links were checked


---




[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19485
  
Thanks for taking a look at this one. Actually, I thought we should add a 
chapter like 
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets

And, add a link to, for example, 
https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.csv
 for Python, 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader@csv(paths:String*):org.apache.spark.sql.DataFrame
 for Scala, and 
http://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameReader.html#csv-scala.collection.Seq-
 for Java to refer to the options, rather than duplicating the option list (which 
we would otherwise have to update in several places whenever we fix or add options).

Probably, we should add some links to JSON ones too.


---




RE: Adding Custom finalize method to RDDs.

2019-06-11 Thread Nasrulla Khan Haris
I want to delete some files that I created in my data source API as soon as 
the RDD is cleaned up.

Thanks,
Nasrulla

From: Vinoo Ganesh 
Sent: Monday, June 10, 2019 1:32 PM
To: Nasrulla Khan Haris ; 
dev@spark.apache.org
Subject: Re: Adding Custom finalize method to RDDs.

Generally, overriding the finalize() method is an antipattern (it was in fact 
deprecated in Java 9; see 
https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Object.html#finalize())
. What’s the use case here?

From: Nasrulla Khan Haris 
mailto:nasrulla.k...@microsoft.com.INVALID>>
Date: Monday, June 10, 2019 at 15:44
To: "dev@spark.apache.org<mailto:dev@spark.apache.org>" 
mailto:dev@spark.apache.org>>
Subject: RE: Adding Custom finalize method to RDDs.

Hello Everyone,
Is there a way to do it from user code?

Thanks,
Nasrulla

From: Nasrulla Khan Haris 
mailto:nasrulla.k...@microsoft.com.INVALID>>
Sent: Sunday, June 9, 2019 5:30 PM
To: dev@spark.apache.org<mailto:dev@spark.apache.org>
Subject: Adding Custom finalize method to RDDs.

Hi All,

Is there a way to add a custom finalize method to RDD objects, to run custom 
logic when RDDs are destroyed by the JVM?

Thanks,
Nasrulla
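
One finalize-free way to express "delete my files when this object is collected" is to register the cleanup externally instead of overriding a destructor; the JVM analogue would be java.lang.ref.Cleaner (Java 9+). A Python sketch of the pattern using weakref.finalize (TempFiles and cleanup are made-up names, and this is not a Spark API):

```python
import gc
import weakref

class TempFiles:
    """Stand-in for an object that owns on-disk scratch files."""
    def __init__(self, paths):
        self.paths = list(paths)

deleted = []

def cleanup(paths):
    # In real code this would os.remove() each path.
    deleted.extend(paths)

obj = TempFiles(["/tmp/a", "/tmp/b"])
# Register cleanup to run when obj is garbage collected (or at interpreter
# exit), without overriding __del__/finalize on the class itself.
weakref.finalize(obj, cleanup, obj.paths)

del obj
gc.collect()
print(deleted)
```

The finalizer holds only a weak reference to the object, so registering it does not keep the object alive.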



[GitHub] [spark] wangyum commented on issue #25542: [SPARK-28840][SQL] conf.getClassLoader in SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is created by Hive

2019-08-21 Thread GitBox
wangyum commented on issue #25542: [SPARK-28840][SQL] conf.getClassLoader in 
SparkSQLCLIDriver should be avoided as it returns the UDFClassLoader which is 
created by Hive
URL: https://github.com/apache/spark/pull/25542#issuecomment-523528243
 
 
   Our example always uses `--jars one.jar,two.jar`. It seems 
`--jars=one.jar,two.jar` is not standard usage.
   
   
http://spark.apache.org/docs/latest/spark-standalone.html#launching-spark-applications
   http://spark.apache.org/docs/latest/running-on-yarn.html#adding-other-jars
   
http://spark.apache.org/docs/latest/rdd-programming-guide.html#using-the-shell
   http://spark.apache.org/docs/latest/sql-data-sources-jdbc.html


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[GitHub] [spark] aof00 opened a new pull request #30376: change 'spark.sql.adaptive.skewedPartitionThresholdInBytes' to 'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes' #SPARK-33451

2020-11-14 Thread GitBox


aof00 opened a new pull request #30376:
URL: https://github.com/apache/spark/pull/30376


   JIRA Issue: https://issues.apache.org/jira/browse/SPARK-33451
   
   In the 'Optimizing Skew Join' section of the following two pages:
   1. 
[https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html](https://spark.apache.org/docs/3.0.0/sql-performance-tuning.html)
   2. 
[https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html](https://spark.apache.org/docs/3.0.1/sql-performance-tuning.html)
   
   The configuration 'spark.sql.adaptive.skewedPartitionThresholdInBytes' 
should be changed to 
'spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes'; the former is 
missing the 'skewJoin' segment.
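
For reference, the corrected property as it would appear in a config file (values are illustrative; 256MB is the documented default threshold):

```properties
spark.sql.adaptive.enabled                                    true
spark.sql.adaptive.skewJoin.enabled                           true
# Note the skewJoin segment that the 3.0.0/3.0.1 docs omitted:
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes   256MB
```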









[GitHub] [spark] gengliangwang opened a new pull request #31525: [3.1][INFRA][DOC] Change the facetFilters of Docsearch to 3.1.1

2021-02-08 Thread GitBox


gengliangwang opened a new pull request #31525:
URL: https://github.com/apache/spark/pull/31525


   
   
   ### What changes were proposed in this pull request?
   
   Now that https://github.com/algolia/docsearch-configs/pull/3391 is merged, this PR 
changes the facetFilters of Docsearch to 3.1.1.
   
   ### Why are the changes needed?
   
   So that search results on the published Spark site will point to 
https://spark.apache.org/docs/3.1.1 instead of 
https://spark.apache.org/docs/latest/. 
   This keeps searches of the 3.1.1 docs working after newer Spark releases 
come out.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, search results on the 3.1.1 Spark doc site are based on 
https://spark.apache.org/docs/3.1.1 instead of 
https://spark.apache.org/docs/latest/
   
   ### How was this patch tested?
   
   Just configuration changes.









[jira] [Comment Edited] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid

2021-11-09 Thread dch nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17441494#comment-17441494
 ] 

dch nguyen edited comment on SPARK-37260 at 11/10/21, 4:03 AM:
---

ping [~hyukjin.kwon] , is this issue resolved by 
[#34475|https://github.com/apache/spark/pull/34475]?


was (Author: dchvn):
[~hyukjin.kwon] , is this issue resolved by 
[#34475|https://github.com/apache/spark/pull/34475]?

> PYSPARK Arrow 3.2.0 docs link invalid
> -
>
> Key: SPARK-37260
> URL: https://issues.apache.org/jira/browse/SPARK-37260
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Thomas Graves
>    Priority: Major
>
> [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html]
> links to:
> [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]
> which links to:
> [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst]
> But that is an invalid link.
> I assume its supposed to point to:
> https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html







[jira] [Commented] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid

2021-11-09 Thread dch nguyen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17441494#comment-17441494
 ] 

dch nguyen commented on SPARK-37260:


[~hyukjin.kwon] , is this issue resolved by 
[#34475|https://github.com/apache/spark/pull/34475]?

> PYSPARK Arrow 3.2.0 docs link invalid
> -
>
> Key: SPARK-37260
> URL: https://issues.apache.org/jira/browse/SPARK-37260
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Thomas Graves
>    Priority: Major
>
> [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html]
> links to:
> [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]
> which links to:
> [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst]
> But that is an invalid link.
> I assume its supposed to point to:
> https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html







[jira] [Commented] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid

2021-11-10 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17442044#comment-17442044
 ] 

Hyukjin Kwon commented on SPARK-37260:
--

Oh yeah, that's fixed via #34475. There are some more ongoing issues in the 
docs. I will fix them up, and then we could probably initiate Spark 3.2.1.

> PYSPARK Arrow 3.2.0 docs link invalid
> -
>
> Key: SPARK-37260
> URL: https://issues.apache.org/jira/browse/SPARK-37260
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Thomas Graves
>    Priority: Major
>
> [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html]
> links to:
> [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]
> which links to:
> [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst]
> But that is an invalid link.
> I assume its supposed to point to:
> https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html







[jira] [Resolved] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37260.
--
Resolution: Fixed

> PYSPARK Arrow 3.2.0 docs link invalid
> -
>
> Key: SPARK-37260
> URL: https://issues.apache.org/jira/browse/SPARK-37260
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Thomas Graves
>    Priority: Major
>
> [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html]
> links to:
> [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]
> which links to:
> [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst]
> But that is an invalid link.
> I assume its supposed to point to:
> https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html







[jira] [Updated] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37260:
-
Fix Version/s: 3.2.1

> PYSPARK Arrow 3.2.0 docs link invalid
> -
>
> Key: SPARK-37260
> URL: https://issues.apache.org/jira/browse/SPARK-37260
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Thomas Graves
>Priority: Major
>     Fix For: 3.2.1
>
>
> [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html]
> links to:
> [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]
> which links to:
> [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst]
> But that is an invalid link.
> I assume its supposed to point to:
> https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html







[jira] [Created] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid

2021-11-09 Thread Thomas Graves (Jira)
Thomas Graves created SPARK-37260:
-

 Summary: PYSPARK Arrow 3.2.0 docs link invalid
 Key: SPARK-37260
 URL: https://issues.apache.org/jira/browse/SPARK-37260
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 3.2.0
Reporter: Thomas Graves


[http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html]

links to:

[https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]

which links to:

[https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst]

But that is an invalid link.

I assume its supposed to point to:

https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html







[GitHub] [spark] zhengruifeng commented on pull request #43011: [WIP][SPARK-45232][DOCS] Add missing function groups to SQL references

2023-09-20 Thread via GitHub


zhengruifeng commented on PR #43011:
URL: https://github.com/apache/spark/pull/43011#issuecomment-1728581850

   @allisonwang-db I am not sure; I don't see documentation for the FROM 
clause. You may check three places:
   
   - https://spark.apache.org/docs/latest/api/sql/index.html#explode
   - 
https://spark.apache.org/docs/latest/sql-ref-functions-builtin.html#generator-functions
   - https://spark.apache.org/docs/latest/sql-ref-syntax.html







[GitHub] [spark] zr-msft commented on pull request #35561: [MINOR][DOCS] Fixed closing tags in running-on-kubernetes.md

2022-06-07 Thread GitBox


zr-msft commented on PR #35561:
URL: https://github.com/apache/spark/pull/35561#issuecomment-1148923071

   @dongjoon-hyun I've periodically checked the docs site and I'm not seeing 
any changes show up based on commits I've added from this PR:
   * 
https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration
   * 
https://spark.apache.org/docs/3.2.1/running-on-kubernetes.html#configuration
   
   I'm also not seeing earlier commits show up:
   * 
https://github.com/apache/spark/commit/302cb2257b66642cd3de0f61a700293b8ac7b000
   * 
https://github.com/apache/spark/commit/476214bc1cc813f0a2332bee53dfc7248ebd2a66
   
   The most recent commit that shows up on this page is from Jul 18, 2021:  
   * 
https://github.com/apache/spark/commit/eea69c122f20577956c4a87a6d8eb59943c1c6f0 
-- https://spark.apache.org/docs/latest/running-on-kubernetes.html#prerequisites
   







[GitHub] [spark] HyukjinKwon commented on pull request #38470: [CONNECT] [DOC] Defining Spark Connect Client Connection String

2022-11-01 Thread GitBox


HyukjinKwon commented on PR #38470:
URL: https://github.com/apache/spark/pull/38470#issuecomment-1299418917

   Maybe it's better to have a JIRA. BTW, I wonder if we have an e2e example 
that users can copy and paste to try (e.g., like most of the docs in 
https://spark.apache.org/docs/latest/index.html). Another decision to make is 
whether we should document it in the PySpark docs 
(https://spark.apache.org/docs/latest/api/python/getting_started/index.html) or 
the Spark main page (https://spark.apache.org/docs/latest/index.html)







[jira] [Updated] (CALCITE-6241) Add a few existing functions to Spark library

2024-02-03 Thread EveyWu (Jira)


 [ 
https://issues.apache.org/jira/browse/CALCITE-6241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

EveyWu updated CALCITE-6241:

Description: 
Add Spark as a supported library for functions that have already been 
implemented for other libraries.

Spark Functions 
Link:[https://spark.apache.org/docs/latest/api/sql/index.html|https://spark.apache.org/docs/latest/api/sql/index.html#rtrim]


Add function List:
 * DECODE

 

 

 

  was:
Add Spark as a supported library for functions that have already been 
implemented for other libraries.

Spark Functions 
Link:[https://spark.apache.org/docs/latest/api/sql/index.html|https://spark.apache.org/docs/latest/api/sql/index.html#rtrim]

 

 

 


> Add a few existing functions to Spark library
> -
>
> Key: CALCITE-6241
> URL: https://issues.apache.org/jira/browse/CALCITE-6241
> Project: Calcite
>  Issue Type: Improvement
>Reporter:  EveyWu
>Priority: Minor
>
> Add Spark as a supported library for functions that have already been 
> implemented for other libraries.
> Spark Functions 
> Link:[https://spark.apache.org/docs/latest/api/sql/index.html|https://spark.apache.org/docs/latest/api/sql/index.html#rtrim]
> Add function List:
>  * DECODE
>  
>  
>  





Re: acquire and give back resources dynamically

2014-08-16 Thread fireflyc
http://spark.apache.org/docs/latest/running-on-yarn.html
Spark just a Yarn application


 On Aug 14, 2014, at 11:12, 牛兆捷 nzjem...@gmail.com wrote:
 
 Dear all:
 
 Does spark can acquire resources from and give back resources to
 YARN dynamically ?
 
 
 -- 
 *Regards,*
 *Zhaojie*


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: spark won't build with maven

2014-08-15 Thread visakh
You are running a continuous compilation. AFAIK, it runs in an infinite loop
and will compile only the modified files. For compiling with Maven, have a
look at these steps -
https://spark.apache.org/docs/latest/building-with-maven.html

Thanks,
Visakh



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-won-t-build-with-maven-tp12173p12176.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: LDA example?

2014-08-22 Thread Burak Yavuz
You can check out this pull request: https://github.com/apache/spark/pull/476

LDA is on the roadmap for the 1.2 release, hopefully we will officially support 
it then!

Best,
Burak

- Original Message -
From: Denny Lee denny.g@gmail.com
To: user@spark.apache.org
Sent: Thursday, August 21, 2014 10:10:35 PM
Subject: LDA example?

Quick question - is there a handy sample / example of how to use the LDA 
algorithm within Spark MLLib?  

Thanks!
Denny



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: resize memory size for caching RDD

2014-09-03 Thread Liu, Raymond
AFAIK, No.

Best Regards,
Raymond Liu

From: 牛兆捷 [mailto:nzjem...@gmail.com] 
Sent: Thursday, September 04, 2014 11:30 AM
To: user@spark.apache.org
Subject: resize memory size for caching RDD

Dear all:

Spark uses memory to cache RDD and the memory size is specified by 
spark.storage.memoryFraction.

One the Executor starts, does Spark support adjusting/resizing memory size of 
this part dynamically?

Thanks.

-- 
Regards,
Zhaojie


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark SQL - Exception only when using cacheTable

2014-10-10 Thread poiuytrez
I am using the python api. Unfortunately, I cannot find the isCached method
equivalent in the documentation:
https://spark.apache.org/docs/1.1.0/api/python/index.html in the SQLContext
section.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Exception-only-when-using-cacheTable-tp16031p16137.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



How to calculate percentiles with Spark?

2014-10-21 Thread sparkuser
Hi,

What would be the best way to get percentiles from a Spark RDD? I can see
JavaDoubleRDD or MLlib's  MultivariateStatisticalSummary
https://spark.apache.org/docs/latest/mllib-statistics.html   provide the
mean() but not percentiles.

Thank you!

Horace
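For what it's worth, a nearest-rank percentile is straightforward once the data is sorted; with an RDD one could sort and index first (e.g. rdd.sortBy(identity).zipWithIndex()) and then look up the computed rank. A minimal plain-Scala sketch — the percentile helper below is hypothetical, not a Spark API:

```scala
// Nearest-rank percentile over an already-sorted collection.
// With Spark, a sorted/indexed RDD lookup would replace the direct apply().
def percentile(sorted: Seq[Double], p: Double): Double = {
  require(sorted.nonEmpty && p >= 0 && p <= 100, "need data and 0 <= p <= 100")
  // nearest-rank: smallest index covering p percent of the data
  val rank = math.max(1, math.ceil(p / 100.0 * sorted.size).toInt) - 1
  sorted(rank)
}

val data = (1 to 10).map(_.toDouble)
println(percentile(data, 50))  // 5.0
println(percentile(data, 90))  // 9.0
```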



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-calculate-percentiles-with-Spark-tp16937.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Rdd of Rdds

2014-10-22 Thread Michael Malak
On Wednesday, October 22, 2014 9:06 AM, Sean Owen so...@cloudera.com wrote:

 No, there's no such thing as an RDD of RDDs in Spark.
 Here though, why not just operate on an RDD of Lists? or a List of RDDs?
 Usually one of these two is the right approach whenever you feel
 inclined to operate on an RDD of RDDs.


Depending on one's needs, one could also consider the matrix (RDD[Vector]) 
operations provided by MLLib, such as
https://spark.apache.org/docs/latest/mllib-statistics.html

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Use RDD like a Iterator

2014-10-29 Thread Sean Owen
Call RDD.toLocalIterator()?

https://spark.apache.org/docs/latest/api/java/org/apache/spark/rdd/RDD.html
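The lazy-consumption idea behind toLocalIterator can be illustrated with a plain Scala Iterator — a local stand-in, not Spark code; toLocalIterator additionally pulls one partition at a time to the driver:

```scala
// An Iterator computes elements on demand; nothing is materialized up front.
// rdd.toLocalIterator offers the same contract on the driver, fetching one
// partition at a time instead of collect()-ing everything.
val lazyElems = (1 to 1000000).iterator.map(_ * 2)
println(lazyElems.take(3).toList)  // List(2, 4, 6); only three elements computed
```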

On Wed, Oct 29, 2014 at 4:15 AM, Dai, Kevin yun...@ebay.com wrote:
 Hi, ALL



 I have an RDD[T]; can I use it like an iterator?

 That means I can compute every element of this RDD lazily.



 Best Regards,

 Kevin.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



StreamingLinearRegressionWithSGD

2014-12-01 Thread Joanne Contact
Hi Gurus,

I did not look at the code yet. I wonder if StreamingLinearRegressionWithSGD
http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/StreamingLinearRegressionWithSGD.html

is equivalent to
LinearRegressionWithSGD
http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/LinearRegressionWithSGD.html
with starting weights of the current batch as the ending weights of the last
batch?

Since RidgeRegressionModel
http://spark.apache.org/docs/latest/api/java/org/apache/spark/mllib/regression/RidgeRegressionModel.html
does not seem to have a streaming version, I just wonder if this approach will suffice.


Thanks!

J


[jira] [Created] (SPARK-5409) Broken link in documentation

2015-01-26 Thread Mauro Pirrone (JIRA)
Mauro Pirrone created SPARK-5409:


 Summary: Broken link in documentation
 Key: SPARK-5409
 URL: https://issues.apache.org/jira/browse/SPARK-5409
 Project: Spark
  Issue Type: Documentation
Reporter: Mauro Pirrone
Priority: Minor


https://spark.apache.org/docs/1.2.0/streaming-kafka-integration.html

See the API docs and the example.

Link to example is broken.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Upgrade to Spark 1.2.1 using Guava

2015-02-28 Thread Pat Ferrel
Maybe, but any time the workaround is to use spark-submit --conf 
spark.executor.extraClassPath=/guava.jar blah, that means that standalone apps 
must have hard coded paths that are honored on every worker. And as you know a 
lib is pretty much blocked from use of this version of Spark—hence the blocker 
severity.

I could easily be wrong but userClassPathFirst doesn’t seem to be the issue. 
There is no class conflict.

On Feb 27, 2015, at 7:13 PM, Sean Owen so...@cloudera.com wrote:

This seems like a job for userClassPathFirst. Or could be. It's
definitely an issue of visibility between where the serializer is and
where the user class is.

At the top you said Pat that you didn't try this, but why not?

On Fri, Feb 27, 2015 at 10:11 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I’ll try to find a Jira for it. I hope a fix is in 1.3
 
 
 On Feb 27, 2015, at 1:59 PM, Pat Ferrel p...@occamsmachete.com wrote:
 
 Thanks! that worked.
 
 On Feb 27, 2015, at 1:50 PM, Pat Ferrel p...@occamsmachete.com wrote:
 
 I don’t use spark-submit I have a standalone app.
 
 So I guess you want me to add that key/value to the conf in my code and make 
 sure it exists on workers.
 
 
 On Feb 27, 2015, at 1:47 PM, Marcelo Vanzin van...@cloudera.com wrote:
 
 On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
 I changed in the spark master conf, which is also the only worker. I added a 
 path to the jar that has guava in it. Still can’t find the class.
 
 Sorry, I'm still confused about what config you're changing. I'm
 suggesting using:
 
 spark-submit --conf spark.executor.extraClassPath=/guava.jar blah
 
 
 --
 Marcelo
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 
 
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 
 
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 
 
 
 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org
 

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: SparkStream saveAsTextFiles()

2015-05-04 Thread anavidad
Structure seems fine. You only need to add the following at the end of your program:

ssc.start();
ssc.awaitTermination();

Check the method arguments. I advise you to consult the Spark Java streaming API:

https://spark.apache.org/docs/1.3.0/api/java/

Regards.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkStream-saveAsTextFiles-tp22719p22755.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: spark sql, creating literal columns in java.

2015-05-05 Thread Michael Armbrust
This should work from java too:
http://spark.apache.org/docs/1.3.1/api/java/index.html#org.apache.spark.sql.functions$

On Tue, May 5, 2015 at 4:15 AM, Jan-Paul Bultmann janpaulbultm...@me.com
wrote:

 Hey,
 What is the recommended way to create literal columns in java?
 Scala has the `lit` function from  `org.apache.spark.sql.functions`.
 Should it be called from java as well?

 Cheers jan

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




Re: Performance tuning in Spark SQL.

2015-07-02 Thread prosp4300
Please see below link for the ways available
https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#performance-tuning

For example, reduce spark.sql.shuffle.partitions from 200 to 10 could
improve the performance significantly
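If it helps, the setting can also live in spark-defaults.conf; the value 10 below is just the illustrative number from this thread, and the right value depends on your data volume:

```
# spark-defaults.conf (illustrative value from this thread, not a recommendation)
spark.sql.shuffle.partitions   10
```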



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Performance-tuning-in-Spark-SQL-tp21871p23576.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: How to set environment of worker applications

2015-08-23 Thread Hemant Bhanawat
Check for spark.driver.extraJavaOptions and spark.executor.extraJavaOptions
in the following article. I think you can use -D to pass system vars:

spark.apache.org/docs/latest/configuration.html#runtime-environment
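For instance, a hedged sketch of the spark-submit invocation — the property name my.app.env and the jar name are placeholders:

```shell
# Pass values as Java system properties to both driver and executors;
# read them in application code with System.getProperty("my.app.env").
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dmy.app.env=staging" \
  --conf "spark.executor.extraJavaOptions=-Dmy.app.env=staging" \
  my-app.jar
```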
Hi,

I am starting a spark streaming job in standalone mode with spark-submit.

Is there a way to make the UNIX environment variables with which
spark-submit is started available to the processes started on the worker
nodes?

Jan
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org


Re: Create RDD from output of unix command

2015-07-18 Thread Gylfi
You may want to look into using the pipe command .. 
http://blog.madhukaraphatak.com/pipe-in-spark/
http://spark.apache.org/docs/0.6.0/api/core/spark/rdd/PipedRDD.html




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Create-RDD-from-output-of-unix-command-tp23723p23895.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Python Kafka support?

2015-11-10 Thread Saisai Shao
Hi Darren,

Functionality like messageHandler is missing in python API, still not
included in version 1.5.1.

Thanks
Jerry

On Wed, Nov 11, 2015 at 7:37 AM, Darren Govoni <dar...@ontrenet.com> wrote:

> Hi,
>  I read on this page
> http://spark.apache.org/docs/latest/streaming-kafka-integration.html
> about python support for "receiverless" kafka integration (Approach 2) but
> it says its incomplete as of version 1.4.
>
> Has this been updated in version 1.5.1?
>
> Darren
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Ranger-like Security on Spark

2015-09-03 Thread Matei Zaharia
Even simple Spark-on-YARN should run as the user that submitted the job, yes, 
so HDFS ACLs should be enforced. Not sure how it plays with the rest of Ranger.

Matei

> On Sep 3, 2015, at 4:57 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> 
> Well if it needs to read from hdfs then it will adhere to the permissions 
> defined there And/or in ranger. However, I am not aware that you can protect 
> dataframes, tables or streams in general in Spark.
> 
> Le jeu. 3 sept. 2015 à 21:47, Daniel Schulz <danielschulz2...@hotmail.com 
> <mailto:danielschulz2...@hotmail.com>> a écrit :
> Hi Matei,
> 
> Thanks for your answer.
> 
> My question is regarding simple authenticated Spark-on-YARN only, without 
> Kerberos. So when I run Spark on YARN and HDFS, Spark will pass through my 
> HDFS user and only be able to access files I am entitled to read/write? Will 
> it enforce HDFS ACLs and Ranger policies as well?
> 
> Best regards, Daniel.
> 
> > On 03 Sep 2015, at 21:16, Matei Zaharia <matei.zaha...@gmail.com 
> > <mailto:matei.zaha...@gmail.com>> wrote:
> >
> > If you run on YARN, you can use Kerberos, be authenticated as the right 
> > user, etc in the same way as MapReduce jobs.
> >
> > Matei
> >
> >> On Sep 3, 2015, at 1:37 PM, Daniel Schulz <danielschulz2...@hotmail.com 
> >> <mailto:danielschulz2...@hotmail.com>> wrote:
> >>
> >> Hi,
> >>
> >> I really enjoy using Spark. An obstacle to sell it to our clients 
> >> currently is the missing Kerberos-like security on a Hadoop with simple 
> >> authentication. Are there plans, a proposal, or a project to deliver a 
> >> Ranger plugin or something similar to Spark. The target is to 
> >> differentiate users and their privileges when reading and writing data to 
> >> HDFS? Is Kerberos my only option then?
> >>
> >> Kind regards, Daniel.
> >> -----
> >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
> >> <mailto:user-unsubscr...@spark.apache.org>
> >> For additional commands, e-mail: user-h...@spark.apache.org 
> >> <mailto:user-h...@spark.apache.org>
> >
> >
> > -
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
> > <mailto:user-unsubscr...@spark.apache.org>
> > For additional commands, e-mail: user-h...@spark.apache.org 
> > <mailto:user-h...@spark.apache.org>
> >
> 
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org 
> <mailto:user-unsubscr...@spark.apache.org>
> For additional commands, e-mail: user-h...@spark.apache.org 
> <mailto:user-h...@spark.apache.org>
> 



[jira] [Created] (SPARK-12661) Drop Python 2.6 support in PySpark

2016-01-05 Thread Davies Liu (JIRA)
Davies Liu created SPARK-12661:
--

 Summary: Drop Python 2.6 support in PySpark
 Key: SPARK-12661
 URL: https://issues.apache.org/jira/browse/SPARK-12661
 Project: Spark
  Issue Type: Task
Reporter: Davies Liu


1. stop testing with 2.6
2. remove the code for python 2.6


see discussion : 
https://www.mail-archive.com/user@spark.apache.org/msg43423.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15966) Fix markdown for Spark Monitoring

2016-06-15 Thread Dhruve Ashar (JIRA)
Dhruve Ashar created SPARK-15966:


 Summary: Fix markdown for Spark Monitoring
 Key: SPARK-15966
 URL: https://issues.apache.org/jira/browse/SPARK-15966
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 2.0.0
Reporter: Dhruve Ashar
Priority: Trivial


The markdown for Spark monitoring needs to be fixed. 
http://spark.apache.org/docs/2.0.0-preview/monitoring.html




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Does filter on an RDD scan every data item ?

2016-01-23 Thread nir
Looks like this has been supported from 1.4 release :) 
https://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.rdd.OrderedRDDFunctions



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item-tp20170p26049.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: udf StructField to JSON String

2016-03-11 Thread Tristan Nixon
Have you looked at DataFrame.write.json( path )?
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameWriter

> On Mar 11, 2016, at 7:15 AM, Caires Vinicius <caire...@gmail.com> wrote:
> 
> I have one DataFrame with nested StructField and I want to convert to JSON 
> String. There is anyway to accomplish this?


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Spark ML Interaction

2016-03-08 Thread Nick Pentreath
Could you create a JIRA to add an example and documentation?

Thanks

On Tue, 8 Mar 2016 at 16:18, amarouni <amaro...@talend.com> wrote:

> Hi,
>
> Did anyone here manage to write an example of the following ML feature
> transformer
>
> http://spark.apache.org/docs/latest/api/java/org/apache/spark/ml/feature/Interaction.html
> ?
> It's not documented on the official Spark ML features pages but it can
> be found in the package API javadocs.
>
> Thanks,
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: please add Christchurch Apache Spark Meetup Group

2016-03-02 Thread Sean Owen
(I have the site's svn repo handy, so I just added it.)


On Wed, Mar 2, 2016 at 5:16 PM, Raazesh Sainudiin
<raazesh.sainud...@gmail.com> wrote:
> Hi,
>
> Please add Christchurch Apache Spark Meetup Group to the community list
> here:
> http://spark.apache.org/community.html
>
> Our Meetup URI is:
> http://www.meetup.com/Christchurch-Apache-Spark-Meetup/
>
> Thanks,
> Raaz

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Where to set properties for the retainedJobs/Stages?

2016-04-01 Thread Ted Yu
You can set them in spark-defaults.conf

See also https://spark.apache.org/docs/latest/configuration.html#spark-ui
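For reference, a hedged example of where these could live (the values shown are the documented defaults, not recommendations):

```
# spark-defaults.conf -- read by spark-submit and the history server
spark.ui.retainedJobs                1000
spark.ui.retainedStages              1000
spark.history.retainedApplications   50
```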

On Fri, Apr 1, 2016 at 8:26 AM, Max Schmidt <m...@datapath.io> wrote:

> Can somebody tell me the interaction between the properties:
>
> spark.ui.retainedJobs
> spark.ui.retainedStages
> spark.history.retainedApplications
>
> I know from the bugtracker, that the last one describes the number of
> applications the history-server holds in memory.
>
> Can I set the properties in the spark-env.sh? And where?
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: yarn-cluster

2016-05-04 Thread nsalian
Hi,

this is a good spot to start for Spark and YARN.
https://spark.apache.org/docs/1.5.0/running-on-yarn.html

specific to the version you are on, you can toggle between pages.



-
Neelesh S. Salian
Cloudera
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/yarn-cluster-tp26846p26882.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



[jira] [Resolved] (SPARK-15228) pyspark.RDD.toLocalIterator Documentation

2016-05-10 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-15228.
---
Resolution: Not A Problem

> pyspark.RDD.toLocalIterator Documentation
> -
>
> Key: SPARK-15228
> URL: https://issues.apache.org/jira/browse/SPARK-15228
> Project: Spark
>  Issue Type: Documentation
>Reporter: Ignacio Tartavull
>Priority: Trivial
>
> There is a little bug in the parsing of the documentation of 
> http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.toLocalIterator



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15228) pyspark.RDD.toLocalIterator Documentation

2016-05-09 Thread Ignacio Tartavull (JIRA)
Ignacio Tartavull created SPARK-15228:
-

 Summary: pyspark.RDD.toLocalIterator Documentation
 Key: SPARK-15228
 URL: https://issues.apache.org/jira/browse/SPARK-15228
 Project: Spark
  Issue Type: Documentation
Reporter: Ignacio Tartavull
Priority: Trivial


There is a little bug in the parsing of the documentation of 
http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.toLocalIterator





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[GitHub] spark issue #16816: Code style improvement

2017-02-06 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16816
  
@zhoucen  please close this PR and read 
http://spark.apache.org/contributing.html


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16863: Swamidass & Baldi Approximations

2017-02-08 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16863
  
Please review http://spark.apache.org/contributing.html before opening a 
pull request.



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16676: delete useless var “j”

2017-01-24 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16676
  
Merged to master. Please read http://spark.apache.org/contributing.html for 
next time.



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #8632: Update README.md

2017-02-16 Thread packtpartner
Github user packtpartner commented on the issue:

https://github.com/apache/spark/pull/8632
  
Hi @srowen , where is the Github repository to feature books on 
http://spark.apache.org/documentation.html ?



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16638: spark-19115

2017-01-19 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/16638
  
Could you follow the title requirement in 
http://spark.apache.org/contributing.html?



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: CSV escaping not working

2016-10-27 Thread Jain, Nishit
I’d think quoting is only necessary if you are not escaping delimiters in data. 
But we can only share our opinions. It would be good to see something 
documented.
This may be the cause of the issue?: 
https://issues.apache.org/jira/browse/CSV-135

From: Koert Kuipers <ko...@tresata.com<mailto:ko...@tresata.com>>
Date: Thursday, October 27, 2016 at 12:49 PM
To: "Jain, Nishit" <nja...@underarmour.com<mailto:nja...@underarmour.com>>
Cc: "user@spark.apache.org<mailto:user@spark.apache.org>" 
<user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: Re: CSV escaping not working

well my expectation would be that if you have delimiters in your data you need 
to quote your values. If you now have quotes within your data, you need to 
escape them.

so escaping is only necessary if quoted.

On Thu, Oct 27, 2016 at 1:45 PM, Jain, Nishit 
<nja...@underarmour.com<mailto:nja...@underarmour.com>> wrote:
Do you mind sharing why should escaping not work without quotes?

From: Koert Kuipers <ko...@tresata.com<mailto:ko...@tresata.com>>
Date: Thursday, October 27, 2016 at 12:40 PM
To: "Jain, Nishit" <nja...@underarmour.com<mailto:nja...@underarmour.com>>
Cc: "user@spark.apache.org<mailto:user@spark.apache.org>" 
<user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: Re: CSV escaping not working

that is what i would expect: escaping only works if quoted

On Thu, Oct 27, 2016 at 1:24 PM, Jain, Nishit 
<nja...@underarmour.com<mailto:nja...@underarmour.com>> wrote:
Interesting finding: Escaping works if data is quoted but not otherwise.

From: "Jain, Nishit" <nja...@underarmour.com<mailto:nja...@underarmour.com>>
Date: Thursday, October 27, 2016 at 10:54 AM
To: "user@spark.apache.org<mailto:user@spark.apache.org>" 
<user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: CSV escaping not working


I am using spark-core version 2.0.1 with Scala 2.11. I have simple code to read 
a csv file which has \ escapes.

val myDA = spark.read
  .option("quote", null)
  .schema(mySchema)
  .csv(filePath)


As per the documentation, \ is the default escape character for the csv reader, but 
it does not work: Spark reads \ as part of my data. For example, the City column in 
the csv file is north rocks\,au . I expect the city column to be read in code as 
northrocks,au, but instead Spark reads it as northrocks\ and moves au to the next column.

I have tried following but did not work:

  *   Explicitly defined the escape character: .option("escape", "\\")
  *   Changed escape to | or : in file and in code
  *   I have tried using spark-csv library

Any one facing same issue? Am I missing something?

Thanks
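To make the disagreement concrete, here is a toy line splitter — a hypothetical helper, not Spark's parser — implementing the behavior the report expected: a backslash escaping the delimiter even outside quoted fields. Per the replies above, Spark's CSV reader only honors the escape character inside quoted values.

```scala
import scala.collection.mutable.Buffer

// Split on the delimiter, honoring backslash escapes outside quotes --
// the semantics the reporter expected from .option("escape", "\\").
def splitEscaped(line: String, delim: Char = ','): List[String] = {
  val fields = Buffer("")
  var escaped = false
  for (c <- line) {
    if (escaped) { fields(fields.size - 1) += c; escaped = false }
    else if (c == '\\') escaped = true            // next char taken literally
    else if (c == delim) fields += ""             // unescaped delimiter: new field
    else fields(fields.size - 1) += c
  }
  fields.toList
}

println(splitEscaped("north rocks\\,au,AU"))  // List(north rocks,au, AU)
```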




RE: Spark SQL join and subquery

2016-11-17 Thread Sood, Anjali
unsubscribe

-Original Message-
From: neil90 [mailto:neilp1...@icloud.com] 
Sent: Thursday, November 17, 2016 8:26 AM
To: user@spark.apache.org
Subject: Re: Spark SQL join and subquery

What version of Spark are you using? I believe this was fixed in 2.0



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-join-and-subquery-tp28093p28097.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org


-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[GitHub] spark issue #17309: same rdd rule testcase

2017-03-16 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17309
  
See http://spark.apache.org/contributing.html
I'm not clear this adds any value?



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17556: [SPARK-16957][MLlib] Use weighted midpoints for split va...

2017-04-11 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/17556
  
http://spark.apache.org/docs/latest/building-spark.html



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



RE: RDD functions using GUI

2017-04-18 Thread Ke Yang (Conan)
Ping... I wonder why there aren't any such drag-and-drop GUI tools for creating 
batch query scripts.
Thanks

From: Ke Yang (Conan)
Sent: Monday, April 17, 2017 5:31 PM
To: 'dev@spark.apache.org' <dev@spark.apache.org>
Subject: RDD functions using GUI

Hi,
  Are there drag-and-drop (code-free) GUIs for RDD functions available, i.e. a 
GUI that generates code from drag-and-drop actions?
http://spark.apache.org/docs/latest/programming-guide.html#resilient-distributed-datasets-rdds

thanks for brainstorming



[GitHub] spark issue #18836: Update SortMergeJoinExec.scala

2017-08-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18836
  
You didn't read the link above, I take it?
 http://spark.apache.org/contributing.html



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18414: Update status of application to RUNNING if executors are...

2017-06-25 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/18414
  
please fix up the PR title: http://spark.apache.org/contributing.html




-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



user-unsubscr...@spark.apache.org

2017-05-23 Thread williamtellme123
user-unsubscr...@spark.apache.org

 

From: 萝卜丝炒饭 [mailto:1427357...@qq.com] 
Sent: Sunday, May 21, 2017 8:15 PM
To: user <user@spark.apache.org>
Subject: Are tachyon and akka removed from 2.1.1 please

 

HI all,

I read some papers about the source code; the papers are based on version 1.2 
and refer to Tachyon and Akka. When I read the 2.1 code, I cannot find the 
code about Akka and Tachyon.

 

Are Tachyon and Akka removed from 2.1.1, please?



[GitHub] spark issue #19238: [SPARK-22016][SQL] Add HiveDialect for JDBC connection t...

2017-09-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19238
  
I can see the value, but it does not perform well in most cases when we use a 
JDBC connection. Instead of adding the extra dialect upstream, could you 
please add Hive as a separate data source? Thanks!

https://spark.apache.org/third-party-projects.html





[GitHub] spark issue #19433: [SPARK-3162] [MLlib][WIP] Add local tree training for de...

2017-10-09 Thread smurching
Github user smurching commented on the issue:

https://github.com/apache/spark/pull/19433
  
Thanks! I'll remove the WIP. To clear things up for the future, I'd thought 
[WIP] was the appropriate tag for a PR that's ready for review but not ready to 
be merged (based on https://spark.apache.org/contributing.html) -- have we 
stopped using the WIP tag?





[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19429
  
When I opened the JIRA, I had in mind a chapter such as 
https://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets. 
The `Manually Specifying Options` chapter, BTW, looks like it describes how to 
specify options.





[GitHub] spark issue #19154: Fix DiskBlockManager crashing when a root local folder h...

2017-09-07 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19154
  
I don't think it's reasonable to handle the case where people arbitrarily 
delete data from under Spark. This case may be easy to fix; others won't be. This 
also isn't how changes are proposed: http://spark.apache.org/contributing.html





[GitHub] spark issue #19343: [SPARK-22121][SQL] Correct database location for namenod...

2017-09-26 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19343
  
@squito Thank you! 

Instead of changing the source code, could we just update the document 
https://spark.apache.org/docs/2.2.0/sql-programming-guide.html#hive-tables ? 
This might be enough for this issue. 





Re: Welcoming Tejas Patil as a Spark committer

2017-09-30 Thread Kazuaki Ishizaki
Congratulation Tejas!

Kazuaki Ishizaki



From:   Matei Zaharia <matei.zaha...@gmail.com>
To: "dev@spark.apache.org" <dev@spark.apache.org>
Date:   2017/09/30 04:58
Subject:Welcoming Tejas Patil as a Spark committer



Hi all,

The Spark PMC recently added Tejas Patil as a committer on the
project. Tejas has been contributing across several areas of Spark for
a while, focusing especially on scalability issues and SQL. Please
join me in welcoming Tejas!

Matei

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org






[GitHub] spark issue #19485: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19485
  
@jomach and @HyukjinKwon 

I did not generate the doc. I think we should follow what we did for JDBC. 
http://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases

It lists all the public options for each built-in data source. Thus, it makes 
sense to add a new chapter for CSV.






[GitHub] spark issue #19714: [SPARK-22489][SQL] Shouldn't change broadcast join build...

2017-11-30 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19714
  
LGTM 

Thanks! Merged to master. 

Could you submit a follow-up PR to document the behavior changes in the 
migration section of the Spark SQL guide? 
https://spark.apache.org/docs/latest/sql-programming-guide.html#migration-guide





Re: [01/51] [partial] spark-website git commit: 2.2.1 generated doc

2017-12-17 Thread Sean Owen
/latest does not point to 2.2.1 yet. Not all the pieces are released yet,
as I understand?

On Sun, Dec 17, 2017 at 8:12 AM Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> I saw the following commit, but I can't seem to see 2.2.1 as the version
> in the header of the documentation pages under
> http://spark.apache.org/docs/latest/ (that is still 2.2.0). Is this being
> worked on?
>
> http://spark.apache.org/docs/2.2.1 is available and shows the proper
> version, but not http://spark.apache.org/docs/latest :(
>
> Pozdrawiam,
> Jacek Laskowski
> 
>
>


[GitHub] spark issue #19996: [MINOR][DOC] Fix the link of 'Getting Started'

2017-12-17 Thread mcavdar
Github user mcavdar commented on the issue:

https://github.com/apache/spark/pull/19996
  
@srowen 
[Here](https://github.com/mcavdar/NLP/blob/master/Broken%20Links/spark/spark_404links.txt)
 are all the broken links; the list may be useful. Each line contains a broken link and 
its parent page (separated by a tab). About 75-100 of the broken links are related to 
"http(s)://spark.apache.org/docs/latest".





[GitHub] spark issue #20219: [SPARK-23025][SQL] Support Null type in scala reflection

2018-01-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20219
  
`NullType` is not well supported by almost any of the data sources. We did not 
mention it in our doc 
https://spark.apache.org/docs/latest/sql-programming-guide.html 

cc @cloud-fan @marmbrus @rxin @sameeragarwal Any comment about this support?





[GitHub] spark issue #19290: [SPARK-22063][R] Fixes lint check failures in R by lates...

2018-01-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19290
  
BTW, I believe we are testing it with R 3.4.1 via AppVeyor too. I have been 
thinking it's good to test both old and new versions ...  I think we have a 
weak promise for `R 3.1+`  - 
http://spark.apache.org/docs/latest/index.html#downloading





[GitHub] spark issue #20334: How to check registered table name.

2018-01-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20334
  
Hey @AtulKumVerma, questions should usually go to the mailing list. See 
http://spark.apache.org/community.html. I believe you can get a better answer 
there.

Opening a pull request from one branch to another branch actually causes a 
slight visual problem.

Would you mind closing this pull request, please?





[GitHub] spark issue #20254: [SPARK-23062][SQL] Improve EXCEPT documentation

2018-01-16 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20254
  
@henryr Since Spark 2.3, Spark SQL documents all the behavior changes in 
[Migration 
Guides](https://spark.apache.org/docs/latest/sql-programming-guide.html#migration-guide).
 Hopefully, this can help our end users.





[GitHub] spark issue #21961: Spark 20597

2018-08-02 Thread mahmoudmahdi24
Github user mahmoudmahdi24 commented on the issue:

https://github.com/apache/spark/pull/21961
  
Hello @Satyajitv, please rename the title of this PR properly. 
The PR title should be of the form [SPARK-][COMPONENT] Title, where 
SPARK- is the relevant JIRA number, COMPONENT is one of the PR categories 
shown at spark-prs.appspot.com, and Title may be the JIRA's title or a more 
specific title describing the PR itself.
Take a look at this helpful document: 
https://spark.apache.org/contributing.html





[GitHub] spark issue #22177: stages in wrong order within job page DAG chart

2018-08-22 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22177
  
Please change the title to "[SPARK-25199][Web UI] XXX" as described in 
http://spark.apache.org/contributing.html. 
```
check the DAG chart in job page.
```
Could you also post the DAG chart screenshot taken after your fix?





[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-15 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/21589
  
AFAIK, we always have the number of executors and then the number of cores per 
executor, right?
https://spark.apache.org/docs/latest/configuration.html#execution-behavior

maybe we should have the getter factored the same way and probably named 
similarly
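For reference, the two settings discussed above appear side by side in `spark-defaults.conf`; the keys are the documented configuration names, while the values below are placeholders:

```
# spark-defaults.conf (illustrative values)
spark.executor.instances   4
spark.executor.cores       2
```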





[GitHub] spark issue #22339: SPARK-17159 Significant speed up for running spark strea...

2018-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22339
  
Hi, @ScrapCodes . Could you do the followings?
- Update the title to `[SPARK-17159][SS]...`
- Remove `Please review http://spark.apache.org/contributing.html ` 
from PR description
- Share the numbers because the PR title has `Significant speed up`





[GitHub] spark issue #22367: [SPARK-17916][SPARK-25241][SQL][FOLLOWUP] Fix empty stri...

2018-09-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22367
  
Usually we merge into master and backport to other branches when it's 
needed.

https://spark.apache.org/contributing.html

> 5. Open a pull request against the master branch of apache/spark. (Only 
in special cases would the PR be opened against other branches.)





[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...

2018-02-27 Thread misutoth
Github user misutoth commented on the issue:

https://github.com/apache/spark/pull/20618
  
@felixcheung, I have started a mail thread on d...@spark.apache.org with the 
title _Help needed in R documentation generation_ because I felt it was not 
directly related to this PR. Thanks for your reply on that thread already.






[GitHub] spark issue #17466: [SPARK-14681][ML] Added getter for impurityStats

2018-03-08 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/17466
  
@shaynativ Sorry for the inactivity here.  Btw, for the JIRA & PR title 
question above, I'd recommend checking out 
http://spark.apache.org/contributing.html

Since @WeichenXu123 opened a fresh PR for this, would you mind working with 
him on it?
We can close this issue / PR for now.  Thank you!





[GitHub] spark issue #21057: [MINOR][PYTHON] 2 Improvements to Pyspark docs

2018-04-17 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21057
  
It would be helpful if you open a JIRA and describe the issue. It could 
help others think of a better way to test, or give clearer ideas about whether 
it's really difficult to add a test. Usually, the JIRA is made first. See also 
https://spark.apache.org/contributing.html.





[GitHub] spark pull request #20897: [MINOR][DOC] Fix a few markdown typos

2018-04-01 Thread Lemonjing
Github user Lemonjing commented on a diff in the pull request:

https://github.com/apache/spark/pull/20897#discussion_r178481268
  
--- Diff: docs/mllib-pmml-model-export.md ---
@@ -7,15 +7,15 @@ displayTitle: PMML model export - RDD-based API
 * Table of contents
 {:toc}
 
-## `spark.mllib` supported models
--- End diff --

backquotes in the markdown files cause display problems (see 
http://spark.apache.org/docs/latest/mllib-pmml-model-export.html)





[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...

2018-03-26 Thread sahilTakiar
Github user sahilTakiar commented on the issue:

https://github.com/apache/spark/pull/20893
  
Ok, I'll work on writing a test for `SparkLauncherSuite`.

The test added here was meant to cover the race condition mentioned 
[here|https://spark.apache.org/docs/2.3.0/sql-programming-guide.html#bucketing-sorting-and-partitioning]





[GitHub] spark issue #20889: [MINOR][DOC] Fix ml-guide markdown typos

2018-03-23 Thread Lemonjing
Github user Lemonjing commented on the issue:

https://github.com/apache/spark/pull/20889
  
@felixcheung Yes, I read the entire Spark MLlib docs. There is no problem 
with the other markdown files (or I may not have seen it, or other contributors 
solved it). This error is obvious, and it affects a link later in the page, 
which is how I found it.
[http://spark.apache.org/docs/latest/ml-guide.html#breaking-changes](url)





[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation

2018-10-30 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22852
  
I think these are good changes. In a separate PR for the versions-specific 
docs, we could add a similar note to 
https://spark.apache.org/docs/latest/spark-standalone.html as much of the 
security concern is around the standalone master.





[GitHub] spark issue #22840: [SPARK-25840][BUILD] `make-distribution.sh` should not f...

2018-10-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22840
  
@srowen . It's a documented feature.
- 
http://spark.apache.org/docs/latest/building-spark.html#building-a-runnable-distribution

I know that you're not against it, but Spark 2.4.0 had better respect the 
document.
Are we going to document it as: "From the Spark 2.4.0 source distribution, we 
cannot build it from source"?





[jira] [Created] (SPARK-25933) Fix pstats reference for spark.python.profile.dump in configuration.md

2018-11-03 Thread Alex Hagerman (JIRA)
Alex Hagerman created SPARK-25933:
-

 Summary: Fix pstats reference for spark.python.profile.dump in 
configuration.md
 Key: SPARK-25933
 URL: https://issues.apache.org/jira/browse/SPARK-25933
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 2.3.2
Reporter: Alex Hagerman
 Fix For: 2.3.2


ptats.Stats() should be pstats.Stats() in 
https://spark.apache.org/docs/latest/configuration.html for 
spark.python.profile.dump.
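For context, the corrected name is the standard-library `pstats` module. A minimal sketch of reading back a profile dump of the kind `spark.python.profile.dump` writes per RDD; here the dump file `rdd_0.pstats` is produced directly with `cProfile` purely for illustration:

```python
import cProfile
import io
import pstats

# Produce a pstats-compatible profile dump file. (spark.python.profile.dump
# writes per-RDD files in this same format; the name here is made up.)
cProfile.run("sum(range(1000))", "rdd_0.pstats")

# Read the dump back with pstats.Stats -- the class the docs should reference.
out = io.StringIO()
stats = pstats.Stats("rdd_0.pstats", stream=out)
stats.sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```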



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[GitHub] spark issue #22822: [SPARK-25678] Requesting feedback regarding a prototype ...

2018-10-25 Thread UtkarshMe
Github user UtkarshMe commented on the issue:

https://github.com/apache/spark/pull/22822
  
I did send the proposal to the d...@spark.apache.org mailing list (twice). But 
unfortunately, I got no response, so I opened a JIRA ticket about it about 20 
days ago and have now opened a pull request for feedback.





[jira] [Created] (SPARK-25991) Update binary for 2.4.0 release

2018-11-09 Thread Vladimir Tsvetkov (JIRA)
Vladimir Tsvetkov created SPARK-25991:
-

 Summary: Update binary for 2.4.0 release
 Key: SPARK-25991
 URL: https://issues.apache.org/jira/browse/SPARK-25991
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: Vladimir Tsvetkov


The archive for the 2.4.0 release at https://spark.apache.org/downloads.html 
contains old binaries.






[GitHub] spark issue #22606: [SPARK-25592] Setting version to 3.0.0-SNAPSHOT

2018-10-02 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22606
  
You mean http://spark.apache.org/versioning-policy.html and the reference 
to 2.4? I think that's still valid. When 2.4 is released, I'd propose to change 
that to refer to 3.0 being released .. I dunno .. around Mar 2019?





[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...

2018-10-10 Thread niofire
Github user niofire commented on the issue:

https://github.com/apache/spark/pull/22593
  
From 
https://spark.apache.org/docs/2.3.2/api/java/org/apache/spark/sql/streaming/DataStreamWriter.html

![image](https://user-images.githubusercontent.com/2295469/46749482-b3351400-cc6a-11e8-834d-7eb53b70ddc0.png)

I see java in that URL, is that actually referring to the java API?






[GitHub] spark issue #21755: Doc fix: The Imputer is an Estimator

2018-10-08 Thread zoltanctoth
Github user zoltanctoth commented on the issue:

https://github.com/apache/spark/pull/21755
  
@srowen I'm just about to submit a new doc-related pull request.
Wondering if your `PS see https://spark.apache.org/contributing.html` line 
referred to anything specific about how I should issue these PRs differently?





[GitHub] spark issue #22321: [DOC] Update the 'Specifying the Hadoop Version' link in...

2018-09-03 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22321
  
Good catch. IIUC, the following files also have a similar problem 
regarding 
`http://spark.apache.org/docs/latest/building-spark.html#building-with-buildmvn`.
 Would it be possible to address them?

```
R/WINDOWS.md
R/README.md
```




