[GitHub] [spark] yaooqinn commented on a diff in pull request #36052: [SPARK-38777][YARN] Add `bin/spark-submit --kill / --status` support for yarn

2022-04-04 Thread GitBox


yaooqinn commented on code in PR #36052:
URL: https://github.com/apache/spark/pull/36052#discussion_r841461428


##
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkSubmitOperation.scala:
##
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import scala.collection.Map
+
+import org.apache.hadoop.yarn.api.records.{ApplicationId, ApplicationReport, YarnApplicationState}
+import org.apache.hadoop.yarn.client.api.YarnClient
+import org.apache.hadoop.yarn.conf.YarnConfiguration
+
+import org.apache.spark.SparkConf
+import org.apache.spark.deploy.{SparkHadoopUtil, SparkSubmitOperation}
+import org.apache.spark.deploy.yarn.YarnSparkSubmitOperation._
+import org.apache.spark.util.CommandLineLoggingUtils
+
+class YarnSparkSubmitOperation
+  extends SparkSubmitOperation with CommandLineLoggingUtils {
+
+  private def withYarnClient(conf: SparkConf)(f: YarnClient => Unit): Unit = {
+    val yarnClient = YarnClient.createYarnClient
+    try {
+      val hadoopConf = new YarnConfiguration(SparkHadoopUtil.newConfiguration(conf))
+      yarnClient.init(hadoopConf)
+      yarnClient.start()
+      f(yarnClient)
+    } catch {
+      case e: Exception =>
+        printErrorAndExit(s"Failed to initialize yarn client due to ${e.getMessage}")
+    } finally {
+      yarnClient.stop()
+    }
+  }
+
+  override def kill(applicationId: String, conf: SparkConf): Unit = {
+    withYarnClient(conf) { yarnClient =>
+      try {
+        val appId = ApplicationId.fromString(applicationId)
+        val report = yarnClient.getApplicationReport(appId)
+        if (isTerminalState(report.getYarnApplicationState)) {
+          printMessage(s"WARN: Application $appId is already terminated")
+          printMessage(formatReportDetails(report))
+        } else {
+          yarnClient.killApplication(appId)
+          val report = yarnClient.getApplicationReport(appId)
+          printMessage(formatReportDetails(report))
+

Review Comment:
   ```suggestion
   ```
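   For context, a hedged usage sketch of what this PR enables overall (the application id below is illustrative):
   ```
   ./bin/spark-submit --master yarn --kill application_1648000000000_0001
   ./bin/spark-submit --master yarn --status application_1648000000000_0001
   ```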






[GitHub] [spark] monkeyboy123 commented on a diff in pull request #35984: [MINOR][SQL] Show debug log for `AnalysisException` in Analyzer

2022-04-04 Thread GitBox


monkeyboy123 commented on code in PR #35984:
URL: https://github.com/apache/spark/pull/35984#discussion_r841430645


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -1761,7 +1761,9 @@ class Analyzer(override val catalogManager: CatalogManager)
     try {
       innerResolve(expr, isTopLevel = true)
     } catch {
-      case _: AnalysisException if !throws => expr
+      case ae: AnalysisException if !throws =>
+        logWarning(ae.message)

Review Comment:
   It seems that the unit test errors in CI are not related to this PR.






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36038: [SPARK-38759][PYTHON][SS] Add StreamingQueryListener support in PySpark

2022-04-04 Thread GitBox


HyukjinKwon commented on code in PR #36038:
URL: https://github.com/apache/spark/pull/36038#discussion_r841351936


##
python/docs/source/reference/pyspark.ss.rst:
##
@@ -30,10 +30,10 @@ Core Classes
 
 DataStreamReader
 DataStreamWriter
-ForeachBatchFunction

Review Comment:
   This was removed because this isn't an API.






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #35856: [SPARK-38550][SQL][CORE] Use a disk-based store to save more debug information for live UI

2022-04-04 Thread GitBox


dongjoon-hyun commented on code in PR #35856:
URL: https://github.com/apache/spark/pull/35856#discussion_r841422549


##
sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala:
##
@@ -118,6 +119,12 @@ private[sql] class SharedState(
     statusStore
   }
 
+  sparkContext.statusStore.diskStore.foreach { kvStore =>
+    sparkContext.listenerBus.addToQueue(
+      new DiagnosticListener(conf, kvStore.asInstanceOf[ElementTrackingStore]),

Review Comment:
   Why do we need to share the same kvStore?






[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #35856: [SPARK-38550][SQL][CORE] Use a disk-based store to save more debug information for live UI

2022-04-04 Thread GitBox


dongjoon-hyun commented on code in PR #35856:
URL: https://github.com/apache/spark/pull/35856#discussion_r841421049


##
core/src/main/scala/org/apache/spark/internal/config/Status.scala:
##
@@ -70,4 +70,11 @@ private[spark] object Status {
       .version("3.0.0")
      .booleanConf
      .createWithDefault(false)
+
+  val DISK_STORE_DIR_FOR_STATUS =
+    ConfigBuilder("spark.appStatusStore.diskStore.dir")

Review Comment:
   If there is no other config under this prefix, the Apache Spark community's configuration naming guideline is not to introduce a new namespace; drop the `.` instead. In this case,
   ```
   - spark.appStatusStore.diskStore.dir
   + spark.appStatusStore.diskStoreDir
   ```
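   A minimal sketch of the suggested name in `Status.scala` (the `.doc`, `.version` and value type below are assumptions for illustration only):
   ```scala
     val DISK_STORE_DIR_FOR_STATUS =
       ConfigBuilder("spark.appStatusStore.diskStoreDir")
         .doc("Local directory where the disk-based KVStore for the live UI is kept.")
         .version("3.4.0")
         .stringConf
         .createOptional
   ```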






[GitHub] [spark] yaooqinn commented on a diff in pull request #36053: [SPARK-38778][INFRA][BUILD] Replace http with https for project url in pom

2022-04-03 Thread GitBox


yaooqinn commented on code in PR #36053:
URL: https://github.com/apache/spark/pull/36053#discussion_r841373575


##
pom.xml:
##
@@ -29,7 +29,7 @@
   3.4.0-SNAPSHOT
   pom
   Spark Project Parent POM
-  http://spark.apache.org/

Review Comment:
   updated, thanks.









[GitHub] [spark] wangyum commented on a diff in pull request #36053: [SPARK-38778][INFRA][BUILD] Replace http with https for project url in pom

2022-04-03 Thread GitBox


wangyum commented on code in PR #36053:
URL: https://github.com/apache/spark/pull/36053#discussion_r841345386


##
pom.xml:
##
@@ -29,7 +29,7 @@
   3.4.0-SNAPSHOT
   pom
   Spark Project Parent POM
-  http://spark.apache.org/

Review Comment:
   Line 53?
   
https://github.com/apache/spark/blob/629187e767d84f55fd849b0e8f09ad05a6f3f139/pom.xml#L53






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36038: [SPARK-38759][PYTHON][SS] Add StreamingQueryListener support in PySpark

2022-04-03 Thread GitBox


HyukjinKwon commented on code in PR #36038:
URL: https://github.com/apache/spark/pull/36038#discussion_r84177


##
python/pyspark/sql/streaming/listener.py:
##
@@ -0,0 +1,666 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+import pickle
+import uuid
+from typing import Optional, Dict, List
+from abc import ABC, abstractmethod
+
+from py4j.java_gateway import JavaObject
+
+from pyspark.sql import Row
+
+
+__all__ = ["StreamingQueryListener"]

Review Comment:
   So I intend to expose only `StreamingQueryListener` as a public API. The other event and progress classes are NOT exposed as APIs in PySpark for now; they are only meant to be accessed as instances passed to the listener callbacks.
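   For illustration, a hedged sketch of how an end user would subclass the one exposed class (callback and event-field names follow the Scala-side listener API; the exact PySpark signatures are what this PR defines):
   ```python
   from pyspark.sql.streaming.listener import StreamingQueryListener

   class MyListener(StreamingQueryListener):
       def onQueryStarted(self, event):
           print("started:", event.id)

       def onQueryProgress(self, event):
           print("progress:", event.progress)

       def onQueryTerminated(self, event):
           print("terminated:", event.id)

   # registered via spark.streams.addListener(MyListener())
   ```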






[GitHub] [spark] monkeyboy123 commented on a diff in pull request #35984: [MINOR] Log AnalysisException output for debug and tracing

2022-04-03 Thread GitBox


monkeyboy123 commented on code in PR #35984:
URL: https://github.com/apache/spark/pull/35984#discussion_r841321778


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##
@@ -1761,7 +1761,9 @@ class Analyzer(override val catalogManager: CatalogManager)
     try {
       innerResolve(expr, isTopLevel = true)
     } catch {
-      case _: AnalysisException if !throws => expr
+      case ae: AnalysisException if !throws =>
+        logWarning(ae.message)

Review Comment:
   Thanks for review, updated






[GitHub] [spark] HyukjinKwon commented on a diff in pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path

2022-04-03 Thread GitBox


HyukjinKwon commented on code in PR #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r841300801


##
core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala:
##
@@ -221,5 +221,5 @@ private class CompactedEventLogFileWriter(
     hadoopConf: Configuration)
   extends SingleEventLogFileWriter(appId, appAttemptId, logBaseDir, sparkConf, hadoopConf) {
 
-  override val logPath: String = originalFilePath.toUri.toString + EventLogFileWriter.COMPACTED

Review Comment:
   Is it because the string representation of the path can omit the scheme of the URI, or because it does not percent-encode the special characters? Actually, we should always use Hadoop's `Path` when working with Hadoop paths so that `fs.default.name` is respected.
   
   The only exception would be when this path is stored somewhere external. In that case, the URI has to be stored fully qualified so that a different `fs.default.name` does not affect the original path.
   
   This path/URI handling has many weird holes, so we're just delegating to Hadoop's `Path` implementation and leveraging their bug fixes.
   
   I am fine to drop this if this change does not handle all of the cases, but I wanted to point out that this direction is better than handling it manually in Spark.
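   A minimal sketch of that direction (illustrative only; ".compact" stands in for `EventLogFileWriter.COMPACTED`):
   ```scala
   import org.apache.hadoop.fs.Path

   // Let Hadoop's Path handle scheme, encoding and fs.default.name resolution
   // instead of concatenating URI strings in Spark.
   val original = new Path("hdfs:///history/app-2022 with space")
   val compacted = new Path(original.getParent, original.getName + ".compact")
   ```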






[GitHub] [spark] zhengruifeng commented on a diff in pull request #36048: [SPARK-38774][PYTHON] Implement Series.autocorr

2022-04-03 Thread GitBox


zhengruifeng commented on code in PR #36048:
URL: https://github.com/apache/spark/pull/36048#discussion_r841305123


##
python/pyspark/pandas/series.py:
##
@@ -2937,6 +2937,73 @@ def add_suffix(self, suffix: str) -> "Series":
             DataFrame(internal.with_new_sdf(sdf, index_fields=([None] * internal.index_level)))
         )
 
+    def autocorr(self, periods: int = 1) -> float:
+        """
+        Compute the lag-N autocorrelation.
+
+        This method computes the Pearson correlation between
+        the Series and its shifted self.
+
+        Parameters
+        ----------
+        periods : int, default 1
+            Number of lags to apply before performing autocorrelation.
+
+        Returns
+        -------
+        float
+            The Pearson correlation between self and self.shift(lag).
+
+        See Also
+        --------
+        Series.corr : Compute the correlation between two Series.
+        Series.shift : Shift index by desired number of periods.
+        DataFrame.corr : Compute pairwise correlation of columns.
+
+        Notes
+        -----
+        If the Pearson correlation is not well defined return 'NaN'.
+
+        Examples
+        --------
+        >>> s = ps.Series([.2, .0, .6, .2, np.nan, .5, .6])
+        >>> s.autocorr()  # doctest: +ELLIPSIS
+        -0.141219...
+        >>> s.autocorr(0)  # doctest: +ELLIPSIS
+        1.0...
+        >>> s.autocorr(2)  # doctest: +ELLIPSIS
+        0.970725...
+        >>> s.autocorr(-3)  # doctest: +ELLIPSIS
+        0.277350...
+        >>> s.autocorr(5)  # doctest: +ELLIPSIS
+        -1.00...
+        >>> s.autocorr(6)  # doctest: +ELLIPSIS
+        nan
+
+        If the Pearson correlation is not well defined, then 'NaN' is returned.
+
+        >>> s = ps.Series([1, 0, 0, 0])
+        >>> s.autocorr()
+        nan
+        """
+        # This implementation is suboptimal because it moves all data to a single partition,
+        # global sort should be used instead of window, but it should be a start
+        if not isinstance(periods, int):
+            raise TypeError("periods should be an int; however, got [%s]" % type(periods).__name__)
+
+        scol = self.spark.column.alias("__tmp_col__")
+        if periods == 0:
+            lag_col = scol.alias("__tmp_lag_col__")
+        else:
+            window = Window.orderBy(NATURAL_ORDER_COLUMN_NAME)
+            lag_col = F.lag(scol, periods).over(window).alias("__tmp_lag_col__")
+
+        return (
+            self._internal.spark_frame.select([scol, lag_col])
+            .dropna("any")
+            .corr("__tmp_col__", "__tmp_lag_col__")

Review Comment:
   good point, will update soon
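   For reference, a hedged standalone sketch of the same lag-N computation written directly against a Spark DataFrame (names and data are illustrative, not part of the patch):
   ```python
   from pyspark.sql import SparkSession, Window, functions as F

   spark = SparkSession.builder.getOrCreate()
   df = spark.createDataFrame(
       [(0, 0.2), (1, 0.0), (2, 0.6), (3, 0.2), (4, 0.5), (5, 0.6)], ["idx", "v"])

   periods = 1
   w = Window.orderBy("idx")  # a global sort, like the window used above
   lagged = df.select(F.col("v").alias("x"),
                      F.lag("v", periods).over(w).alias("y")).dropna()
   print(lagged.agg(F.corr("x", "y")).first()[0])
   ```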






[GitHub] [spark] zhengruifeng commented on a diff in pull request #36049: [SPARK-38775][ML] cleanup validation functions

2022-04-03 Thread GitBox


zhengruifeng commented on code in PR #36049:
URL: https://github.com/apache/spark/pull/36049#discussion_r841303261


##
mllib/src/main/scala/org/apache/spark/ml/util/DatasetUtils.scala:
##
@@ -138,4 +140,61 @@ private[spark] object DatasetUtils {
       case Row(point: Vector) => OldVectors.fromML(point)
     }
   }
+
+  /**
+   * Get the number of classes.  This looks in column metadata first, and if that is missing,
+   * then this assumes classes are indexed 0,1,...,numClasses-1 and computes numClasses
+   * by finding the maximum label value.
+   *
+   * Label validation (ensuring all labels are integers >= 0) needs to be handled elsewhere,
+   * such as in `extractLabeledPoints()`.
+   *
+   * @param dataset  Dataset which contains a column [[labelCol]]
+   * @param maxNumClasses  Maximum number of classes allowed when inferred from data.  If numClasses
+   *                       is specified in the metadata, then maxNumClasses is ignored.
+   * @return  number of classes
+   * @throws IllegalArgumentException  if metadata does not specify numClasses, and the
+   *                                   actual numClasses exceeds maxNumClasses
+   */
+  private[ml] def getNumClasses(
+      dataset: Dataset[_],
+      labelCol: String,
+      maxNumClasses: Int = 100): Int = {
+    MetadataUtils.getNumClasses(dataset.schema(labelCol)) match {
+      case Some(n: Int) => n
+      case None =>
+        // Get number of classes from dataset itself.
+        val maxLabelRow: Array[Row] = dataset
+          .select(max(checkClassificationLabels(labelCol, Some(maxNumClasses))))
+          .take(1)
+        if (maxLabelRow.isEmpty || maxLabelRow(0).get(0) == null) {
+          throw new SparkException("ML algorithm was given empty dataset.")
+        }
+        val maxDoubleLabel: Double = maxLabelRow.head.getDouble(0)
+        require((maxDoubleLabel + 1).isValidInt, s"Classifier found max label value =" +
+          s" $maxDoubleLabel but requires integers in range [0, ... ${Int.MaxValue})")
+        val numClasses = maxDoubleLabel.toInt + 1
+        require(numClasses <= maxNumClasses, s"Classifier inferred $numClasses from label values" +
+          s" in column $labelCol, but this exceeded the max numClasses ($maxNumClasses) allowed" +
+          s" to be inferred from values.  To avoid this error for labels with > $maxNumClasses" +
+          s" classes, specify numClasses explicitly in the metadata; this can be done by applying" +
+          s" StringIndexer to the label column.")
+        logInfo(this.getClass.getCanonicalName + s" inferred $numClasses classes for" +
+          s" labelCol=$labelCol since numClasses was not specified in the column metadata.")
+        numClasses
+    }
+  }
+
+  /**
+   * Obtain the number of features in a vector column.
+   * If no metadata is available, extract it from the dataset.
+   */
+  private[ml] def getNumFeatures(dataset: Dataset[_], vectorCol: String): Int = {
+    MetadataUtils.getNumFeatures(dataset.schema(vectorCol)) match {
Review Comment:
   ok, will switch back to getOrElse
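   For reference, a hedged sketch of the `getOrElse` shape being agreed on (the fallback body is an assumption for illustration and relies on the existing `MetadataUtils`/`columnToVector` helpers in this file):
   ```scala
     private[ml] def getNumFeatures(dataset: Dataset[_], vectorCol: String): Int = {
       MetadataUtils.getNumFeatures(dataset.schema(vectorCol)).getOrElse {
         // No size recorded in the column metadata: inspect the first vector instead.
         dataset.select(columnToVector(dataset, vectorCol)).head.getAs[Vector](0).size
       }
     }
   ```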









[GitHub] [spark] HyukjinKwon commented on a diff in pull request #35979: [SPARK-38664][CORE] Support compact EventLog when there are illegal characters in the path

2022-04-03 Thread GitBox


HyukjinKwon commented on code in PR #35979:
URL: https://github.com/apache/spark/pull/35979#discussion_r841300801


##
core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala:
##
@@ -221,5 +221,5 @@ private class CompactedEventLogFileWriter(
     hadoopConf: Configuration)
   extends SingleEventLogFileWriter(appId, appAttemptId, logBaseDir, sparkConf, hadoopConf) {
 
-  override val logPath: String = originalFilePath.toUri.toString + EventLogFileWriter.COMPACTED

Review Comment:
   Is it because the string representation of the path can omit the scheme of the URI, or because it does not percent-encode the special characters? Actually, we should always use Hadoop's `Path` when working with Hadoop paths so that `fs.default.name` is respected.
   
   The only exception would be when this path is stored somewhere external. In that case, the URI has to be stored fully qualified so that a different `fs.default.name` does not affect the original path.
   
   This path/URI handling has many weird holes, so we're just delegating to Hadoop's `Path` implementation and leveraging their bug fixes.
   
   I am fine if this change does not handle all of the cases, but I wanted to point out that this direction is better than handling it manually in Spark.






[GitHub] [spark] huaxingao commented on a diff in pull request #36043: [SPARK-38768][SQL] Remove `Limit` from plan if complete push down limit to data source.

2022-04-03 Thread GitBox


huaxingao commented on code in PR #36043:
URL: https://github.com/apache/spark/pull/36043#discussion_r841285841


##
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala:
##
@@ -380,27 +380,32 @@ object V2ScanRelationPushDown extends Rule[LogicalPlan] with PredicateHelper wit
         sHolder.pushedLimit = Some(limit)
         sHolder.sortOrders = orders
         if (isPartiallyPushed) {
-          s
+          (s, isPartiallyPushed)
         } else {
-          operation
+          (operation, isPartiallyPushed)
         }
       } else {
-        s
+        (s, true)
       }
     } else {
-      s
+      (s, true)
     }
     case p: Project =>
-      val newChild = pushDownLimit(p.child, limit)
-      p.withNewChildren(Seq(newChild))
-    case other => other
+      val (newChild, isPartiallyPushed) = pushDownLimit(p.child, limit)
+      (p.withNewChildren(Seq(newChild)), isPartiallyPushed)
+    case other => (other, true)
   }
 
   def pushDownLimits(plan: LogicalPlan): LogicalPlan = plan.transform {
     case globalLimit @ Limit(IntegerLiteral(limitValue), child) =>
-      val newChild = pushDownLimit(child, limitValue)
-      val newLocalLimit = globalLimit.child.asInstanceOf[LocalLimit].withNewChildren(Seq(newChild))
-      globalLimit.withNewChildren(Seq(newLocalLimit))
+      val (newChild, isPartiallyPushed) = pushDownLimit(child, limitValue)
+      if (isPartiallyPushed) {
+        val newLocalLimit =
+          globalLimit.child.asInstanceOf[LocalLimit].withNewChildren(Seq(newChild))
+        globalLimit.withNewChildren(Seq(newLocalLimit))
+      } else {
+        newChild

Review Comment:
   I think there is a problem here. If `isPartiallyPushed` is false, it is assumed that `Limit` has been completely pushed down, so Spark no longer applies `Limit` itself. However, an `isPartiallyPushed` value of false could also come from the default case in `PushDownUtils.pushLimit`:
   
   ```
  def pushLimit(scanBuilder: ScanBuilder, limit: Int): (Boolean, Boolean) = {
    scanBuilder match {
      case s: SupportsPushDownLimit if s.pushLimit(limit) =>
        (true, s.isPartiallyPushed)
      case _ => (false, false)
    }
  }
   ```
   In this case, the `Limit` at the Spark side is removed wrongly.
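   A minimal sketch of the distinction being raised (illustrative only, not the final patch): Spark may drop its own `Limit` only when the limit was pushed and the source reports it as fully, not partially, handled.
   ```scala
   // (isPushed = false) must not be conflated with "fully pushed".
   def canRemoveSparkLimit(isPushed: Boolean, isPartiallyPushed: Boolean): Boolean =
     isPushed && !isPartiallyPushed
   ```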
   






[GitHub] srowen closed pull request #166: Further expand and update the merge and commit process for committers

2019-01-10 Thread GitBox
srowen closed pull request #166: Further expand and update the merge and commit 
process for committers
URL: https://github.com/apache/spark-website/pull/166
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/committers.md b/committers.md
index 0eaad06e0..c3daf10fd 100644
--- a/committers.md
+++ b/committers.md
@@ -127,13 +127,41 @@ Git history for that code to see who reviewed patches before. You can do this us
 Changes pushed to the master branch on Apache cannot be removed; that is, we can't force-push to 
 it. So please don't add any test commits or anything like that, only real patches.
 
-All merges should be done using the 
-[dev/merge_spark_pr.py](https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py) 
-script, which squashes the pull request's changes into one commit. To use this script, you 
+Setting up Remotes
+
+To use the `merge_spark_pr.py` script described below, you 
 will need to add a git remote called `apache` at `https://github.com/apache/spark`, 
-as well as one called "apache-github" at `git://github.com/apache/spark`. For the `apache` repo, 
-you can authenticate using your ASF username and password. Ask `dev@spark.apache.org` if you have trouble with 
-this or want help doing your first merge.
+as well as one called `apache-github` at `git://github.com/apache/spark`.
+
+You will likely also have a remote `origin` pointing to your fork of Spark, and
+`upstream` pointing to the `apache/spark` GitHub repo. 
+
+If correct, your `git remote -v` should look like:
+
+```
+apache https://github.com/apache/spark.git (fetch)
+apache https://github.com/apache/spark.git (push)
+apache-github  git://github.com/apache/spark (fetch)
+apache-github  git://github.com/apache/spark (push)
+origin https://github.com/[your username]/spark.git (fetch)
+origin https://github.com/[your username]/spark.git (push)
+upstream   https://github.com/apache/spark.git (fetch)
+upstream   https://github.com/apache/spark.git (push)
+```
+
+For the `apache` repo, you will need to set up command-line authentication to GitHub. This may
+include setting up an SSH key and/or personal access token. See:
+
+- https://help.github.com/articles/connecting-to-github-with-ssh/
+- https://help.github.com/articles/creating-a-personal-access-token-for-the-command-line/
+
+Ask `dev@spark.apache.org` if you have trouble with these steps, or want help doing your first merge.
+
+Merge Script
+
+All merges should be done using the 
+[dev/merge_spark_pr.py](https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py),
+which squashes the pull request's changes into one commit.
 
 The script is fairly self explanatory and walks you through steps and options interactively.
 
@@ -144,29 +172,12 @@ Then, in a separate window, modify the code and push a commit. Run `git rebase -
 You can verify the result is one change with `git log`. Then resume the script in the other window.
 
 Also, please remember to set Assignee on JIRAs where applicable when they are resolved. The script 
-can't do this automatically.
-Once a PR is merged please leave a comment on the PR stating which branch(es) it has been merged with.
+can do this automatically in most cases. However where the contributor is not yet a part of the
+Contributors group for the Spark project in ASF JIRA, it won't work until they are added. Ask
+an admin to add the person to Contributors at 
+https://issues.apache.org/jira/plugins/servlet/project-config/SPARK/roles .
 
-
+Once a PR is merged please leave a comment on the PR stating which branch(es) it has been merged with.
 
 Policy on Backporting Bug Fixes
 
diff --git a/site/committers.html b/site/committers.html
index 3771ba93a..3b3f47112 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -532,13 +532,42 @@ How to Merge a Pull Request
 Changes pushed to the master branch on Apache cannot be removed; that is, 
we can’t force-push to 
 it. So please don’t add any test commits or anything like that, only 
real patches.
 
-All merges should be done using the 
-https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py";>dev/merge_spark_pr.py
 
-script, which squashes the pull request’s changes into one commit. To 
use this script, you 
+Setting up Remotes
+
+To use the merge_spark_pr.py script described below, you 
 will need to add a git remote called apache at 
https://github.com/apache/spark, 
-as well as one called “apache-github” at 
git://github.com/apache/spark. For the apache repo, 
-you can authenticate using your ASF username and password. Ask 
dev@spark.apache.org if you have trouble with 
-this or want help doing your first merge.
+as well as one called apache-github at 
git://github.com/apache/spark.
+
+You will likely also have

[GitHub] dongjoon-hyun commented on issue #166: Further expand and update the merge and commit process for committers

2019-01-10 Thread GitBox
dongjoon-hyun commented on issue #166: Further expand and update the merge and 
commit process for committers
URL: https://github.com/apache/spark-website/pull/166#issuecomment-453167761
 
 
   Great! Thank you for updating, @srowen .





[GitHub] dongjoon-hyun commented on a change in pull request #166: Further expand and update the merge and commit process for committers

2019-01-10 Thread GitBox
dongjoon-hyun commented on a change in pull request #166: Further expand and 
update the merge and commit process for committers
URL: https://github.com/apache/spark-website/pull/166#discussion_r246836939
 
 

 ##
 File path: committers.md
 ##
 @@ -127,13 +127,41 @@ Git history for that code to see who reviewed patches before. You can do this us
 Changes pushed to the master branch on Apache cannot be removed; that is, we can't force-push to 
 it. So please don't add any test commits or anything like that, only real patches.
 
-All merges should be done using the 
-[dev/merge_spark_pr.py](https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py) 
-script, which squashes the pull request's changes into one commit. To use this script, you 
+Setting up Remotes
+
+To use the `merge_spark_pr.py` script described below, you 
 will need to add a git remote called `apache` at `https://github.com/apache/spark`, 
-as well as one called "apache-github" at `git://github.com/apache/spark`. For the `apache` repo, 
-you can authenticate using your ASF username and password. Ask `dev@spark.apache.org` if you have trouble with 
-this or want help doing your first merge.
+as well as one called `apache-github` at `git://github.com/apache/spark`.
+
+You will likely also have a remote `origin` pointing to your fork of Spark, and
+`upstream` pointing to the `apache/spark` GitHub repo. 
+
+If correct, your `git remote -v` should look like:
+
+```
+apache https://github.com/apache/spark-website.git (fetch)
+apache https://github.com/apache/spark-website.git (push)
+apache-github  git://github.com/apache/spark-website (fetch)
+apache-github  git://github.com/apache/spark-website (push)
+origin https://github.com/[your username]/spark-website.git (fetch)
+origin https://github.com/[your username]/spark-website.git (push)
+upstream   https://github.com/apache/spark-website.git (fetch)
+upstream   https://github.com/apache/spark-website.git (push)
 
 Review comment:
   In this context, these should be `spark.git` instead of `spark-website.git`.
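   For reference, the corrected block would then read:
   ```
   apache https://github.com/apache/spark.git (fetch)
   apache https://github.com/apache/spark.git (push)
   apache-github  git://github.com/apache/spark (fetch)
   apache-github  git://github.com/apache/spark (push)
   origin https://github.com/[your username]/spark.git (fetch)
   origin https://github.com/[your username]/spark.git (push)
   upstream   https://github.com/apache/spark.git (fetch)
   upstream   https://github.com/apache/spark.git (push)
   ```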





[GitHub] srowen opened a new pull request #166: Further expand and update the merge and commit process for committers

2019-01-10 Thread GitBox
srowen opened a new pull request #166: Further expand and update the merge and 
commit process for committers
URL: https://github.com/apache/spark-website/pull/166
 
 
   Following up on 
https://github.com/apache/spark-website/commit/eb0aa14df472cff092b35ea1b894a0d880185561#r31886611
 with additional changes.





[GitHub] srowen closed pull request #165: Update 2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md

2019-01-10 Thread GitBox
srowen closed pull request #165: Update 
2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md
URL: https://github.com/apache/spark-website/pull/165
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/news/_posts/2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md 
b/news/_posts/2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md
index b4c967c05..615a08cb4 100644
--- a/news/_posts/2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md
+++ b/news/_posts/2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md
@@ -1,6 +1,6 @@
 ---
 layout: post
-title: Spark+AI Summit (April 23-25th, 2018, San Francisco) agenda posted
+title: Spark+AI Summit (April 23-25th, 2019, San Francisco) agenda posted
 categories:
 - News
 tags: []
diff --git a/site/committers.html b/site/committers.html
index 3771ba93a..f41921844 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -162,7 +162,7 @@
   Latest News
   
 
-  Spark+AI Summit (April 
23-25th, 2018, San Francisco) agenda posted
+  Spark+AI Summit (April 
23-25th, 2019, San Francisco) agenda posted
   (Dec 19, 2018)
 
   Spark 2.4.0 
released
diff --git a/site/community.html b/site/community.html
index 2548315ea..db666cec7 100644
--- a/site/community.html
+++ b/site/community.html
@@ -162,7 +162,7 @@
   Latest News
   
 
-  Spark+AI Summit (April 
23-25th, 2018, San Francisco) agenda posted
+  Spark+AI Summit (April 
23-25th, 2019, San Francisco) agenda posted
   (Dec 19, 2018)
 
   Spark 2.4.0 
released
diff --git a/site/contributing.html b/site/contributing.html
index 018d1bb75..05cd6306a 100644
--- a/site/contributing.html
+++ b/site/contributing.html
@@ -162,7 +162,7 @@
   Latest News
   
 
-  Spark+AI Summit (April 
23-25th, 2018, San Francisco) agenda posted
+  Spark+AI Summit (April 
23-25th, 2019, San Francisco) agenda posted
   (Dec 19, 2018)
 
   Spark 2.4.0 
released
diff --git a/site/developer-tools.html b/site/developer-tools.html
index b043c75c7..5fc463fcc 100644
--- a/site/developer-tools.html
+++ b/site/developer-tools.html
@@ -162,7 +162,7 @@
   Latest News
   
 
-  Spark+AI Summit (April 
23-25th, 2018, San Francisco) agenda posted
+  Spark+AI Summit (April 
23-25th, 2019, San Francisco) agenda posted
   (Dec 19, 2018)
 
   Spark 2.4.0 
released
diff --git a/site/documentation.html b/site/documentation.html
index 4d54ac622..3ac0bb799 100644
--- a/site/documentation.html
+++ b/site/documentation.html
@@ -162,7 +162,7 @@
   Latest News
   
 
-  Spark+AI Summit (April 
23-25th, 2018, San Francisco) agenda posted
+  Spark+AI Summit (April 
23-25th, 2019, San Francisco) agenda posted
   (Dec 19, 2018)
 
   Spark 2.4.0 
released
diff --git a/site/downloads.html b/site/downloads.html
index 68b9a048c..f9daa8010 100644
--- a/site/downloads.html
+++ b/site/downloads.html
@@ -162,7 +162,7 @@
   Latest News
   
 
-  Spark+AI Summit (April 
23-25th, 2018, San Francisco) agenda posted
+  Spark+AI Summit (April 
23-25th, 2019, San Francisco) agenda posted
   (Dec 19, 2018)
 
   Spark 2.4.0 
released
diff --git a/site/examples.html b/site/examples.html
index ba71cdc7f..006c2cae2 100644
--- a/site/examples.html
+++ b/site/examples.html
@@ -162,7 +162,7 @@
   Latest News
   
 
-  Spark+AI Summit (April 
23-25th, 2018, San Francisco) agenda posted
+  Spark+AI Summit (April 
23-25th, 2019, San Francisco) agenda posted
   (Dec 19, 2018)
 
   Spark 2.4.0 
released
diff --git a/site/faq.html b/site/faq.html
index 58c693b9a..0afcee05b 100644
--- a/site/faq.html
+++ b/site/faq.html
@@ -162,7 +162,7 @@
   Latest News
   
 
-  Spark+AI Summit (April 
23-25th, 2018, San Francisco) agenda posted
+  Spark+AI Summit (April 
23-25th, 2019, San Francisco) agenda posted
   (Dec 19, 2018)
 
   Spark 2.4.0 
released
diff --git a/site/graphx/index.html b/site/graphx/index.html
index 675b46511..2364dcf62 100644
--- a/site/graphx/index.html
+++ b/site/graphx/index.html
@@ -165,7 +165,7 @@
   Latest News
   
 
-  Spark+AI Summit (April 
23-25th, 2018, San Francisco) agenda posted
+  Spark+AI Summit (April 
23-25th, 2019, San Francisco) agenda posted
   (Dec 19, 2018)
 
   Spark 2.4.0 
released
diff --git a/site/history.html b/site/history.html
index b3b91c6ed..ff2f099df 100644
--- a/site/history.html
+++ b/site/history.html
@@ -162,7 +162,7 @@
   Latest News

[GitHub] jzhuge commented on issue #165: Update 2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md

2019-01-09 Thread GitBox
jzhuge commented on issue #165: Update 
2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md
URL: https://github.com/apache/spark-website/pull/165#issuecomment-452937118
 
 
   Sure.
   
   On Wed, Jan 9, 2019 at 5:18 PM Sean Owen  wrote:
   
   > Oops, good catch @jzhuge  . Can you run jekyll
   > build locally to also update the HTML? if it's any trouble I can do it in
   > a separate PR.
   >
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub
   > ,
   > or mute the thread
   > 

   > .
   >
   
   
   -- 
   John Zhuge
   





[GitHub] srowen commented on issue #165: Update 2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md

2019-01-09 Thread GitBox
srowen commented on issue #165: Update 
2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md
URL: https://github.com/apache/spark-website/pull/165#issuecomment-452934048
 
 
   Oops, good catch @jzhuge . Can you run `jekyll build` locally to also update 
the HTML? if it's any trouble I can do it in a separate PR.
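   For reference, the step being requested is roughly (assuming Jekyll is installed per the spark-website README):
   ```
   jekyll build   # regenerates the HTML under site/ from the Markdown sources
   ```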





[GitHub] jzhuge opened a new pull request #165: Update 2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md

2019-01-09 Thread GitBox
jzhuge opened a new pull request #165: Update 
2018-12-19-spark-ai-summit-apr-2019-agenda-posted.md
URL: https://github.com/apache/spark-website/pull/165
 
 
   





[GitHub] srowen closed pull request #164: Suggest new Apache repo for committers

2019-01-07 Thread GitBox
srowen closed pull request #164: Suggest new Apache repo for committers
URL: https://github.com/apache/spark-website/pull/164
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/committers.md b/committers.md
index 0824b26cb..0eaad06e0 100644
--- a/committers.md
+++ b/committers.md
@@ -130,9 +130,9 @@ it. So please don't add any test commits or anything like that, only real patche
 All merges should be done using the 
 [dev/merge_spark_pr.py](https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py) 
 script, which squashes the pull request's changes into one commit. To use this script, you 
-will need to add a git remote called "apache" at https://git-wip-us.apache.org/repos/asf/spark.git, 
+will need to add a git remote called `apache` at `https://github.com/apache/spark`, 
 as well as one called "apache-github" at `git://github.com/apache/spark`. For the `apache` repo, 
-you can authenticate using your ASF username and password. Ask Patrick if you have trouble with 
+you can authenticate using your ASF username and password. Ask `dev@spark.apache.org` if you have trouble with 
 this or want help doing your first merge.
 
 The script is fairly self explanatory and walks you through steps and options interactively.
diff --git a/site/committers.html b/site/committers.html
index 5843c0da8..3771ba93a 100644
--- a/site/committers.html
+++ b/site/committers.html
@@ -535,9 +535,9 @@ How to Merge a Pull Request
 All merges should be done using the 
 https://github.com/apache/spark/blob/master/dev/merge_spark_pr.py";>dev/merge_spark_pr.py
 
 script, which squashes the pull request’s changes into one commit. To 
use this script, you 
-will need to add a git remote called “apache” at 
https://git-wip-us.apache.org/repos/asf/spark.git, 
+will need to add a git remote called apache at 
https://github.com/apache/spark, 
 as well as one called “apache-github” at 
git://github.com/apache/spark. For the apache repo, 
-you can authenticate using your ASF username and password. Ask Patrick if you 
have trouble with 
+you can authenticate using your ASF username and password. Ask 
dev@spark.apache.org if you have trouble with 
 this or want help doing your first merge.
 
 The script is fairly self explanatory and walks you through steps and 
options interactively.


 





[GitHub] rxin commented on issue #164: Suggest new Apache repo for committers

2019-01-07 Thread GitBox
rxin commented on issue #164: Suggest new Apache repo for committers
URL: https://github.com/apache/spark-website/pull/164#issuecomment-452047166
 
 
   LGTM





[GitHub] srowen opened a new pull request #164: Suggest new Apache repo for committers

2019-01-07 Thread GitBox
srowen opened a new pull request #164: Suggest new Apache repo for committers
URL: https://github.com/apache/spark-website/pull/164
 
 
   This suggests to committers that they should use the new github remote to 
push to Apache.





[GitHub] srowen commented on a change in pull request #163: Announce the schedule of 2019 Spark+AI summit at SF

2018-12-20 Thread GitBox
srowen commented on a change in pull request #163: Announce the schedule of 
2019 Spark+AI summit at SF
URL: https://github.com/apache/spark-website/pull/163#discussion_r243353800
 
 

 ##
 File path: site/sitemap.xml
 ##
 @@ -139,657 +139,661 @@
 
 
 
-  https://spark.apache.org/releases/spark-release-2-4-0.html
+  
http://localhost:4000/news/spark-ai-summit-apr-2019-agenda-posted.html
 
 Review comment:
   I just pushed a fix directly, it's done





[GitHub] gatorsmile commented on a change in pull request #163: Announce the schedule of 2019 Spark+AI summit at SF

2018-12-19 Thread GitBox
gatorsmile commented on a change in pull request #163: Announce the schedule of 
2019 Spark+AI summit at SF
URL: https://github.com/apache/spark-website/pull/163#discussion_r243158425
 
 

 ##
 File path: site/sitemap.xml
 ##
 @@ -139,657 +139,661 @@
 
 
 
-  https://spark.apache.org/releases/spark-release-2-4-0.html
+  
http://localhost:4000/news/spark-ai-summit-apr-2019-agenda-posted.html
 
 Review comment:
   Will update them later. 





[GitHub] ueshin commented on a change in pull request #163: Announce the schedule of 2019 Spark+AI summit at SF

2018-12-19 Thread GitBox
ueshin commented on a change in pull request #163: Announce the schedule of 
2019 Spark+AI summit at SF
URL: https://github.com/apache/spark-website/pull/163#discussion_r243132369
 
 

 ##
 File path: site/sitemap.xml
 ##
 @@ -139,657 +139,661 @@
 
 
 
-  https://spark.apache.org/releases/spark-release-2-4-0.html
+  
http://localhost:4000/news/spark-ai-summit-apr-2019-agenda-posted.html
 
 Review comment:
   @gatorsmile oops, I mean, there are a lot of `localhost:4000` in this file.





[GitHub] gatorsmile commented on issue #163: Announce the schedule of 2019 Spark+AI summit at SF

2018-12-19 Thread GitBox
gatorsmile commented on issue #163: Announce the schedule of 2019 Spark+AI 
summit at SF
URL: https://github.com/apache/spark-website/pull/163#issuecomment-448825575
 
 
   Thanks! Merged to master.





[GitHub] ueshin commented on a change in pull request #163: Announce the schedule of 2019 Spark+AI summit at SF

2018-12-19 Thread GitBox
ueshin commented on a change in pull request #163: Announce the schedule of 
2019 Spark+AI summit at SF
URL: https://github.com/apache/spark-website/pull/163#discussion_r243130975
 
 

 ##
 File path: site/sitemap.xml
 ##
 @@ -139,657 +139,661 @@
 
 
 
-  https://spark.apache.org/releases/spark-release-2-4-0.html
+  
http://localhost:4000/news/spark-ai-summit-apr-2019-agenda-posted.html
 
 Review comment:
   Still remaining `localhost:4000` in this file.





[GitHub] gatorsmile commented on a change in pull request #163: Announce the schedule of 2019 Spark+AI summit at SF

2018-12-19 Thread GitBox
gatorsmile commented on a change in pull request #163: Announce the schedule of 
2019 Spark+AI summit at SF
URL: https://github.com/apache/spark-website/pull/163#discussion_r243128948
 
 

 ##
 File path: site/mailing-lists.html
 ##
 @@ -12,7 +12,7 @@
 
   
 
-https://spark.apache.org/community.html"; />
+http://localhost:4000/community.html"; />
 
 Review comment:
   Need to fix this.





[GitHub] gatorsmile commented on issue #163: Announce the schedule of 2019 Spark+AI summit at SF

2018-12-19 Thread GitBox
gatorsmile commented on issue #163: Announce the schedule of 2019 Spark+AI 
summit at SF
URL: https://github.com/apache/spark-website/pull/163#issuecomment-448815820
 
 
   cc @rxin @yhuai @cloud-fan @srowen 





[GitHub] gatorsmile opened a new pull request #163: Announce the schedule of Spark+AI summit at SF 2019

2018-12-19 Thread GitBox
gatorsmile opened a new pull request #163: Announce the schedule of Spark+AI 
summit at SF 2019
URL: https://github.com/apache/spark-website/pull/163
 
 
   ![screen shot 2018-12-19 at 4 59 12 pm](https://user-images.githubusercontent.com/11567269/50257364-d76e4900-03af-11e9-9690-3de0a87917ef.png)
   ![screen shot 2018-12-19 at 4 59 02 pm](https://user-images.githubusercontent.com/11567269/50257365-d806df80-03af-11e9-9dff-fabc08bb64b5.png)
   





[GitHub] HyukjinKwon closed pull request #162: Add a note about Spark build requirement at PySpark testing guide in Developer Tools

2018-12-18 Thread GitBox
HyukjinKwon closed pull request #162: Add a note about Spark build requirement 
at PySpark testing guide in Developer Tools
URL: https://github.com/apache/spark-website/pull/162
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/developer-tools.md b/developer-tools.md
index ebe6905fa..43ad445d6 100644
--- a/developer-tools.md
+++ b/developer-tools.md
@@ -131,6 +131,8 @@ build/mvn test -DwildcardSuites=none -Dtest=org.apache.spark.streaming.JavaAPISu
 Testing PySpark
 
 To run individual PySpark tests, you can use `run-tests` script under `python` directory. Test cases are located at `tests` package under each PySpark packages.
+Note that, if you add some changes into Scala or Python side in Apache Spark, you need to manually build Apache Spark again before running PySpark tests in order to apply the changes.
+Running PySpark testing script does not automatically build it.
 
 To run test cases in a specific module:
 
diff --git a/site/developer-tools.html b/site/developer-tools.html
index 82dab671a..710f6f53e 100644
--- a/site/developer-tools.html
+++ b/site/developer-tools.html
@@ -313,7 +313,9 @@ Testing with Maven
 
 Testing PySpark
 
-To run individual PySpark tests, you can use run-tests script 
under python directory. Test cases are located at 
tests package under each PySpark packages.
+To run individual PySpark tests, you can use run-tests script 
under python directory. Test cases are located at 
tests package under each PySpark packages.
+Note that, if you add some changes into Scala or Python side in Apache Spark, 
you need to manually build Apache Spark again before running PySpark tests in 
order to apply the changes.
+Running PySpark testing script does not automatically build it.
 
 To run test cases in a specific module:
 


 





[GitHub] HyukjinKwon commented on issue #162: Add a note about Spark build requirement at PySpark testing guide in Developer Tools

2018-12-18 Thread GitBox
HyukjinKwon commented on issue #162: Add a note about Spark build requirement 
at PySpark testing guide in Developer Tools
URL: https://github.com/apache/spark-website/pull/162#issuecomment-448164740
 
 
   Thanks guys!





[GitHub] HyukjinKwon commented on issue #162: Add a note about Spark build requirement at PySpark testing guide in Developer Tools

2018-12-17 Thread GitBox
HyukjinKwon commented on issue #162: Add a note about Spark build requirement 
at PySpark testing guide in Developer Tools
URL: https://github.com/apache/spark-website/pull/162#issuecomment-448075651
 
 
   adding @squito as well FYI





[GitHub] HyukjinKwon opened a new pull request #162: Add a note about Spark build requirement at PySpark testing guide in Developer Tools

2018-12-17 Thread GitBox
HyukjinKwon opened a new pull request #162: Add a note about Spark build 
requirement at PySpark testing guide in Developer Tools
URL: https://github.com/apache/spark-website/pull/162
 
 
   I received some feedback via private emails about running PySpark tests. Unlike SBT or Maven testing, the PySpark testing script requires Apache Spark to be built manually first.
   I also realised that this can be confusing for people used to SBT or Maven testing.
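   A hedged sketch of the resulting workflow (the exact build flags are illustrative):
   ```
   ./build/mvn -DskipTests clean package   # rebuild Spark after Scala/Python changes
   python/run-tests                        # then run the PySpark tests
   ```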
   





[GitHub] HyukjinKwon commented on issue #162: Add a note about Spark build requirement at PySpark testing guide in Developer Tools

2018-12-17 Thread GitBox
HyukjinKwon commented on issue #162: Add a note about Spark build requirement 
at PySpark testing guide in Developer Tools
URL: https://github.com/apache/spark-website/pull/162#issuecomment-448075198
 
 
   adding @cloud-fan and @srowen.

