[GitHub] [spark] wangyum commented on issue #25018: [SPARK-26321][SQL] Port HIVE-15297: Hive should not split semicolon within quoted string literals
URL: https://github.com/apache/spark/pull/25018#issuecomment-540369275

Thank you, @HyukjinKwon. I added another test case.
[GitHub] [spark] HyukjinKwon removed a comment on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-540358900

This issue exists in Spark 2.4 as well. Let me open a backporting PR.
[GitHub] [spark] HyukjinKwon commented on issue #25820: [SPARK-29101][SQL] Fix count API for csv file when DROPMALFORMED mode is selected
URL: https://github.com/apache/spark/pull/25820#issuecomment-540358900

This issue exists in Spark 2.4 as well. Let me open a backporting PR.
[GitHub] [spark] shivusondur commented on a change in pull request #25561: [SPARK-28810][DOC][SQL] Document SHOW TABLES in SQL Reference.
URL: https://github.com/apache/spark/pull/25561#discussion_r22463

## File path: docs/sql-ref-syntax-aux-show-tables.md

```diff
@@ -38,10 +38,13 @@ SHOW TABLES [{FROM|IN} database_name] [LIKE 'regex_pattern']
 LIKE 'regex_pattern'
-  Specifies the regex pattern that is used to filter out unwanted tables.
-  The pattern is a regex except `*` and `|` characters.
-  `*` matches 0 or more characters and `|` is used to provide more than one regex with an OR condition.
-  The leading and trailing blanks are trimmed in the input pattern before processing.
+  Specifies the regular expression pattern that is used to filter out unwanted tables.
+
+  Except for the `*` and `|` characters, the pattern follows the regular expression convention.
+  `*` matches 0 or more characters, and `|` is used to provide more than one regex with an OR condition.
```

Review comment: Comments handled. Thanks for reviewing.
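To make the documented `LIKE` semantics concrete, a small hedged example; the database and table names are made up:

```sql
-- `*` matches zero or more characters; `|` separates alternative patterns.
-- Lists any table in `userdb` whose name starts with 'sam' or is exactly 'suj'.
SHOW TABLES IN userdb LIKE 'sam*|suj';
```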
[GitHub] [spark] yaooqinn commented on issue #26068: [SPARK-29405][SQL] Alter table / Insert statements should not change a table's ownership
URL: https://github.com/apache/spark/pull/26068#issuecomment-540330063

test this please
[GitHub] [spark] yaooqinn commented on issue #25648: [SPARK-28947][K8S] Status logging not happens at an interval for liveness
URL: https://github.com/apache/spark/pull/25648#issuecomment-540321431

retest this please
[GitHub] [spark] wangyum commented on a change in pull request #26014: [SPARK-29349][SQL] Support FETCH_PRIOR in Thriftserver fetch request
URL: https://github.com/apache/spark/pull/26014#discussion_r10328

## File path: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala

```diff
@@ -684,6 +685,92 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest {
       assert(e.getMessage.contains("org.apache.spark.sql.catalyst.parser.ParseException"))
     }
   }
+
+  test("ThriftCLIService FetchResults FETCH_FIRST, FETCH_NEXT, FETCH_PRIOR") {
+    def checkResult(rows: RowSet, start: Long, end: Long): Unit = {
+      assert(rows.getStartOffset() == start)
+      assert(rows.numRows() == end - start)
+      rows.iterator.asScala.zip((start until end).iterator).foreach { case (row, v) =>
+        assert(row(0).asInstanceOf[Long] === v)
+      }
+    }
+
+    withCLIServiceClient { client =>
+      val user = System.getProperty("user.name")
+      val sessionHandle = client.openSession(user, "")
+
+      val confOverlay = new java.util.HashMap[java.lang.String, java.lang.String]
+      val operationHandle = client.executeStatement(
+        sessionHandle,
+        "SELECT * FROM range(10)",
+        confOverlay) // 10 rows result with sequence 0, 1, 2, ..., 9
+      var rows: RowSet = null
+
+      // Fetch 5 rows with FETCH_NEXT
+      rows = client.fetchResults(
+        operationHandle, FetchOrientation.FETCH_NEXT, 5, FetchType.QUERY_OUTPUT)
+      checkResult(rows, 0, 5) // fetched [0, 5)
+
+      // Fetch another 2 rows with FETCH_NEXT
+      rows = client.fetchResults(
+        operationHandle, FetchOrientation.FETCH_NEXT, 2, FetchType.QUERY_OUTPUT)
+      checkResult(rows, 5, 7) // fetched [5, 7)
+
+      // FETCH_PRIOR 3 rows
```

Review comment: The `startOffset` is `5` at this time. If we do FETCH_PRIOR one row, it becomes 4, but PostgreSQL returns `5`. Is this what we expect?

```sql
postgres=# begin;
BEGIN
postgres=# declare rf_cur scroll cursor for select * from generate_series(0, 9);
DECLARE CURSOR
postgres=# FETCH 5 from rf_cur;
 generate_series
-----------------
               0
               1
               2
               3
               4
(5 rows)

postgres=# FETCH 2 from rf_cur;
 generate_series
-----------------
               5
               6
(2 rows)

postgres=# FETCH PRIOR from rf_cur;
 generate_series
-----------------
               5
(1 row)

postgres=# commit;
COMMIT
```

https://www.postgresql.org/docs/11/sql-fetch.html
[GitHub] [spark] dongjoon-hyun edited a comment on issue #26075: [WIP][K8S] Spark operator
URL: https://github.com/apache/spark/pull/26075#issuecomment-540297020

Thanks, @nrchakradhar. To me, this seems to be dumped from that repo by the author. I took a look. The current status is difficult to consider for an Apache project contribution because there is no Apache license header and the package name still carries the `radanalytics` authorship. I'd like to recommend @jkremser reopen this after cleaning up the PR by adding licenses and removing the authorship package name. I'm closing this for now. You can use the `./dev/check-license` script to check for missing licenses.
[GitHub] [spark] dongjoon-hyun commented on issue #26075: [WIP][K8S] Spark operator
URL: https://github.com/apache/spark/pull/26075#issuecomment-540300821

Definitely, +1 for @holdenk's comment.
[GitHub] [spark] holdenk commented on issue #26075: [WIP][K8S] Spark operator
URL: https://github.com/apache/spark/pull/26075#issuecomment-540299867

I think that before putting in work on this PR, getting buy-in from the `dev@` list might be a good first step.
[GitHub] [spark] dongjoon-hyun closed pull request #26067: [SPARK-28989][SQL][FollowUp] Update ANSI mode related config names in comments
URL: https://github.com/apache/spark/pull/26067
[GitHub] [spark] beliefer commented on a change in pull request #25416: [SPARK-28330][SQL] Support ANSI SQL: result offset clause in query expression
URL: https://github.com/apache/spark/pull/25416#discussion_r332897164

## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4

```diff
@@ -399,6 +399,7 @@ queryOrganization
     (SORT BY sort+=sortItem (',' sort+=sortItem)*)?
     windowClause?
     (LIMIT (ALL | limit=expression))?
+    (OFFSET offset=expression)?
```

Review comment: Yes, the SQL standard defines the order of the `LIMIT` and `OFFSET` clauses: `LIMIT` comes in front of `OFFSET`. This behavior looks contrary to MySQL's direction; MySQL supports `LIMIT 10,10`.
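For illustration, a hedged sketch of the two syntax families being contrasted; the table `t` and column `id` are made up, and the MySQL form is shown only for comparison:

```sql
-- Standard-style order matching the grammar change above:
-- skip the first 10 rows, then return the next 10.
SELECT * FROM t ORDER BY id LIMIT 10 OFFSET 10;

-- MySQL's shorthand reverses the operands: LIMIT offset, count.
SELECT * FROM t ORDER BY id LIMIT 10, 10;
```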
[GitHub] [spark] dongjoon-hyun edited a comment on issue #26075: [WIP][K8S] Spark operator
URL: https://github.com/apache/spark/pull/26075#issuecomment-540297020

Thanks, @nrchakradhar. To me, this seems to be dumped from that repo by the author. I took a look. The current status is difficult to consider for an Apache project contribution because there is no Apache license header and the package name still carries the `radanalytics` authorship. I'd like to recommend @jkremser reopen this after cleaning up the PR by adding licenses and removing the authorship package name. I'm closing this for now. You can use the `./dev/check-license` script to check for missing licenses.
[GitHub] [spark] dongjoon-hyun closed pull request #26075: [WIP][K8S] Spark operator
URL: https://github.com/apache/spark/pull/26075
[GitHub] [spark] dongjoon-hyun commented on issue #26075: [WIP][K8S] Spark operator
URL: https://github.com/apache/spark/pull/26075#issuecomment-540297020

Thanks, @nrchakradhar. To me, this seems to be dumped from that repo by the author. I took a look. The current status is difficult to consider for an Apache project contribution because there is no Apache license header and the package name still carries the `radanalytics` authorship. I'd like to recommend @jkremser reopen this after cleaning up the PR by adding licenses and removing the authorship package name. I'm closing this for now.
[GitHub] [spark] cloud-fan commented on a change in pull request #26071: [SPARK-29412][SQL] refine the document of v2 session catalog config
URL: https://github.com/apache/spark/pull/26071#discussion_r02564

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

```diff
@@ -1976,11 +1977,19 @@ object SQLConf {
     .stringConf
     .createOptional

-  val V2_SESSION_CATALOG = buildConf("spark.sql.catalog.session")
-    .doc("A catalog implementation that will be used in place of the Spark built-in session " +
-      "catalog for v2 operations. The implementation may extend `CatalogExtension` to be " +
-      "passed the Spark built-in session catalog, so that it may delegate calls to the " +
-      "built-in session catalog.")
+  val V2_SESSION_CATALOG_IMPLEMENTATION =
+    buildConf(s"spark.sql.catalog.${CatalogManager.SESSION_CATALOG_NAME}")
+      .doc("A catalog implementation that will be used in place of the Spark Catalog for v2 " +
+        "operations (e.g. create table using a v2 source, alter a v2 table). The Spark Catalog " +
+        "is the current catalog by default, and supports all kinds of catalog operations like " +
+        "CREATE TABLE USING v1/v2 source, VIEW/FUNCTION related operations, etc. This config is " +
+        "used to extend the Spark Catalog and inject custom logic into v2 operations, while " +
+        "other operations still go through the Spark Catalog. The catalog implementation " +
+        "specified by this config should extend `CatalogExtension` to be passed the Spark " +
+        "Catalog, so that it can delegate calls to the Spark Catalog. Otherwise, the " +
+        "implementation should figure out a way to access the Spark Catalog or its underlying " +
+        "meta-store by itself. It's important to make the implementation share the underlying " +
+        "meta-store of the Spark Catalog and act as an extension, instead of a separate catalog.")
```

Review comment: Thanks for the suggestion!
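For context, a minimal sketch of how a user might register a session-catalog extension under this config. The literal config key and the extension class name are assumptions for illustration; the key resolves from `CatalogManager.SESSION_CATALOG_NAME` in the diff above, and a real implementation would extend `CatalogExtension` as the doc string requires:

```scala
// Hypothetical CatalogExtension implementation registered under the session
// catalog's config key. Spark instantiates the class and, because it extends
// CatalogExtension, hands it the built-in session catalog to delegate to.
spark.conf.set(
  "spark.sql.catalog.session",          // assumed value of SESSION_CATALOG_NAME
  "com.example.MyCatalogExtension")     // hypothetical class name
```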
[GitHub] [spark] liucht-inspur commented on issue #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page
URL: https://github.com/apache/spark/pull/25994#issuecomment-540284189

@srowen Thanks again, I have finished the change.
[GitHub] [spark] HyukjinKwon commented on issue #25018: [SPARK-26321][SQL] Port HIVE-15297: Hive should not split semicolon within quoted string literals
URL: https://github.com/apache/spark/pull/25018#issuecomment-540277064

retest this please
[GitHub] [spark] liucht-inspur commented on a change in pull request #25994: [SPARK-29323][WEBUI] Add tooltip for The Executors Tab's column names in the Spark history server Page
URL: https://github.com/apache/spark/pull/25994#discussion_r333292223

## File path: core/src/main/resources/org/apache/spark/ui/static/executorspage-template.html

```diff
@@ -37,8 +37,8 @@ Summary
     Disk Used
     Cores
-    Active Tasks
-    Failed Tasks
+    Active Tasks
+    Failed Tasks
```

Review comment: Yes, they are more suitable. Thank you!
[GitHub] [spark] HyukjinKwon commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'
URL: https://github.com/apache/spark/pull/26053#issuecomment-540272697

retest this please
[GitHub] [spark] HyukjinKwon commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#issuecomment-540272481

Seems fine.
[GitHub] [spark] nrchakradhar commented on issue #26075: [WIP][K8S] Spark operator
URL: https://github.com/apache/spark/pull/26075#issuecomment-540270821

This is the same as https://github.com/radanalyticsio/spark-operator.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26013: [SPARK-29347][SQL] Add JSON serialization for external Rows
URL: https://github.com/apache/spark/pull/26013#discussion_r333285090

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala

```diff
@@ -501,4 +513,88 @@ trait Row extends Serializable {
   private def getAnyValAs[T <: AnyVal](i: Int): T =
     if (isNullAt(i)) throw new NullPointerException(s"Value at index $i is null")
     else getAs[T](i)
+
+  /** The compact JSON representation of this row. */
+  def json: String = compact(jsonValue)
```

Review comment: There's one API case where we dropped a performance improvement in `Row`, as an example (see https://github.com/apache/spark/pull/23271).

```scala
@deprecated("This method is deprecated and will be removed in future versions.", "3.0.0")
def merge(rows: Row*): Row = {
  // TODO: Improve the performance of this if used in performance critical part.
  new GenericRow(rows.flatMap(_.toSeq).toArray)
}
```

Do you mind if I ask to add `@Unstable` or `@Private` to these new APIs instead, just in case of future improvement, with a `@since` tag? `Row` itself is marked as `@Stable`, so it might be better to explicitly note that this can be changed in the future. With this, LGTM.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26013: [SPARK-29347][SQL] Add JSON serialization for external Rows
URL: https://github.com/apache/spark/pull/26013#discussion_r333285090

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala

```diff
@@ -501,4 +513,88 @@ trait Row extends Serializable {
   private def getAnyValAs[T <: AnyVal](i: Int): T =
     if (isNullAt(i)) throw new NullPointerException(s"Value at index $i is null")
     else getAs[T](i)
+
+  /** The compact JSON representation of this row. */
+  def json: String = compact(jsonValue)
```

Review comment: There's one API case where we dropped a performance improvement in `Row`, as an example (see https://github.com/apache/spark/pull/23271).

```scala
@deprecated("This method is deprecated and will be removed in future versions.", "3.0.0")
def merge(rows: Row*): Row = {
  // TODO: Improve the performance of this if used in performance critical part.
  new GenericRow(rows.flatMap(_.toSeq).toArray)
}
```

Do you mind if I ask to add `@Unstable` or `@Private` to these new APIs instead, just in case of future improvement, with `@since` in the Scaladoc? `Row` itself is marked as `@Stable`, so it might be better to explicitly note that this can be changed in the future. With this, LGTM.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26013: [SPARK-29347][SQL] Add JSON serialization for external Rows
URL: https://github.com/apache/spark/pull/26013#discussion_r333285090

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala

```diff
@@ -501,4 +513,88 @@ trait Row extends Serializable {
   private def getAnyValAs[T <: AnyVal](i: Int): T =
     if (isNullAt(i)) throw new NullPointerException(s"Value at index $i is null")
     else getAs[T](i)
+
+  /** The compact JSON representation of this row. */
+  def json: String = compact(jsonValue)
```

Review comment: There's one API case where we dropped a performance improvement in `Row`, as an example.

```scala
@deprecated("This method is deprecated and will be removed in future versions.", "3.0.0")
def merge(rows: Row*): Row = {
  // TODO: Improve the performance of this if used in performance critical part.
  new GenericRow(rows.flatMap(_.toSeq).toArray)
}
```

Do you mind if I ask to add `@Unstable` or `@Private` to these new APIs instead, just in case of future improvement, with a `@since` tag? `Row` itself is marked as `@Stable`, so it might be better to explicitly note that this can be changed in the future. With this, LGTM.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26013: [SPARK-29347][SQL] Add JSON serialization for external Rows
URL: https://github.com/apache/spark/pull/26013#discussion_r333285090

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala

```diff
@@ -501,4 +513,88 @@ trait Row extends Serializable {
   private def getAnyValAs[T <: AnyVal](i: Int): T =
     if (isNullAt(i)) throw new NullPointerException(s"Value at index $i is null")
     else getAs[T](i)
+
+  /** The compact JSON representation of this row. */
+  def json: String = compact(jsonValue)
```

Review comment: There's one API case where we dropped a performance improvement in `Row`, as an example.

```scala
@deprecated("This method is deprecated and will be removed in future versions.", "3.0.0")
def merge(rows: Row*): Row = {
  // TODO: Improve the performance of this if used in performance critical part.
  new GenericRow(rows.flatMap(_.toSeq).toArray)
}
```

Do you mind if I ask to add `@Unstable` or `@Private` to these new APIs instead, just in case of future improvement, with a `@since` tag?
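As a concrete sketch of what the annotations requested above could look like on the new `json` method; the annotation placement and the `@since` value are assumptions, not the committed code:

```scala
import org.apache.spark.annotation.Unstable

/**
 * The compact JSON representation of this row.
 * @since 3.0.0
 */
@Unstable
def json: String = compact(jsonValue)
```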
[GitHub] [spark] cloud-fan commented on issue #26076: [SPARK-29419][SQL] Fix Encoder thread-safety bug in createDataset(Seq)
URL: https://github.com/apache/spark/pull/26076#issuecomment-540259061

> reducing the cost of .copy() by refactoring the ExpressionEncoder class

That sounds like a good idea to me. Can we do that first?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#discussion_r333283788

## File path: core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala

```diff
@@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext {
   def executors(): String = {
     val sb = new StringBuilder
     val store = uiRoot.asInstanceOf[SparkUI].store
-    val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_")
     store.executorList(true).foreach { executor =>
-      val prefix = s"metrics_${appId}_${executor.id}_executor_"
-      sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n")
-      sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n")
-      sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n")
-      sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n")
-      sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n")
-      sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n")
-      sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n")
-      sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n")
-      sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n")
-      sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n")
-      sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n")
-      sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n")
-      sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n")
-      sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n")
-      sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n")
+      val prefix = "metrics_executor_"
+      val labels = Seq(
+        "application_id" -> store.applicationInfo.id,
+        "application_name" -> store.applicationInfo.name,
+        "executor_id" -> executor.id
+      ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}")
+      sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n")
```

Review comment: For that Prometheus question, different labels mean different time series in Prometheus.

> Prometheus fundamentally stores all data as time series: streams of timestamped values belonging to the same metric and the same set of labeled dimensions.

Here are the references for the details~
- https://prometheus.io/docs/concepts/data_model/
- https://prometheus.io/docs/practices/naming/#labels
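For illustration, the exposition lines produced by the new code would look roughly like this; the application and executor identifiers below are made up:

```text
metrics_executor_rddBlocks_Count{application_id="app-20191009-0001", application_name="MyApp", executor_id="1"} 5
metrics_executor_memoryUsed_Count{application_id="app-20191009-0001", application_name="MyApp", executor_id="1"} 524288
```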
[GitHub] [spark] cloud-fan commented on issue #19424: [SPARK-22197][SQL] push down operators to data source before planning
URL: https://github.com/apache/spark/pull/19424#issuecomment-540257605

@rafaelkyrdan Limit pushdown is not supported yet.
[GitHub] [spark] HyukjinKwon closed pull request #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now
URL: https://github.com/apache/spark/pull/26041
[GitHub] [spark] HyukjinKwon commented on issue #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now
URL: https://github.com/apache/spark/pull/26041#issuecomment-540256081

I am merging this to keep our AppVeyor build alive for now.
[GitHub] [spark] dongjoon-hyun closed pull request #26073: [SPARK-29416][CORE][ML][SQL][MESOS][TESTS] Use .sameElements to compare arrays, instead of .deep (gone in 2.13)
URL: https://github.com/apache/spark/pull/26073
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now
URL: https://github.com/apache/spark/pull/26041#discussion_r333282310

## File path: appveyor.yml

```diff
@@ -57,6 +57,8 @@ build_script:

 environment:
   NOT_CRAN: true
+  # See SPARK-29378.
+  ARROW_PRE_0_15_IPC_FORMAT: 1
```

Review comment: I removed it in the latest commit :-).
[GitHub] [spark] viirya commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#discussion_r333281688

## File path: core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala

```diff
@@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext {
   def executors(): String = {
     val sb = new StringBuilder
     val store = uiRoot.asInstanceOf[SparkUI].store
-    val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_")
     store.executorList(true).foreach { executor =>
-      val prefix = s"metrics_${appId}_${executor.id}_executor_"
-      sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n")
-      sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n")
-      sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n")
-      sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n")
-      sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n")
-      sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n")
-      sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n")
-      sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n")
-      sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n")
-      sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n")
-      sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n")
-      sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n")
-      sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n")
-      sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n")
-      sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n")
+      val prefix = "metrics_executor_"
+      val labels = Seq(
+        "application_id" -> store.applicationInfo.id,
+        "application_name" -> store.applicationInfo.name,
+        "executor_id" -> executor.id
+      ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}")
+      sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n")
```

Review comment: I may misunderstand Prometheus's approach. If so, then this might not be a problem.
[GitHub] [spark] viirya commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#discussion_r333281562

## File path: core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala

```diff
@@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext {
   def executors(): String = {
     val sb = new StringBuilder
     val store = uiRoot.asInstanceOf[SparkUI].store
-    val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_")
     store.executorList(true).foreach { executor =>
-      val prefix = s"metrics_${appId}_${executor.id}_executor_"
-      sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n")
-      sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n")
-      sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n")
-      sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n")
-      sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n")
-      sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n")
-      sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n")
-      sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n")
-      sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n")
-      sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n")
-      sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n")
-      sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n")
-      sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n")
-      sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n")
-      sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n")
+      val prefix = "metrics_executor_"
+      val labels = Seq(
+        "application_id" -> store.applicationInfo.id,
+        "application_name" -> store.applicationInfo.name,
+        "executor_id" -> executor.id
+      ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}")
+      sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n")
```

Review comment:

> No. The Prometheus query language supports handling them individually.

Yes. But what I am wondering is: now all numbers from all applications are recorded under the same metric. To retrieve the number for a specified application, doesn't Prometheus need to search for it among all applications' metric numbers?
[GitHub] [spark] dongjoon-hyun commented on issue #25984: [SPARK-29308][BUILD] Update deps in dev/deps/spark-deps-hadoop-3.2 for hadoop-3.2
URL: https://github.com/apache/spark/pull/25984#issuecomment-540251763

Unfortunately, Jenkins is down. Let's wait until Jenkins is back.
[GitHub] [spark] dongjoon-hyun commented on issue #24851: [SPARK-27303][GRAPH] Add Spark Graph API
URL: https://github.com/apache/spark/pull/24851#issuecomment-540247560

Thank you for updating, @s1ck. I'll take a look tonight~
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26077: [SPARK-29413][CORE][FOLLOWUP] A better parmap method signature
URL: https://github.com/apache/spark/pull/26077#discussion_r333274284

## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala

```diff
@@ -274,12 +274,19 @@ private[spark] object ThreadUtils {
    * @return new collection in which each element was given from the input collection `in` by
    *         applying the lambda function `f`.
    */
-  def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = {
+  def parmap[I, O, Col[X] <: TraversableOnce[X]]
```

Review comment: Thank you for the investigation and the conclusion, @zsxwing and @srowen.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#discussion_r333270955

## File path: core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala

```diff
@@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext {
   def executors(): String = {
     val sb = new StringBuilder
     val store = uiRoot.asInstanceOf[SparkUI].store
-    val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_")
     store.executorList(true).foreach { executor =>
-      val prefix = s"metrics_${appId}_${executor.id}_executor_"
-      sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n")
-      sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n")
-      sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n")
-      sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n")
-      sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n")
-      sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n")
-      sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n")
-      sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n")
-      sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n")
-      sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n")
-      sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n")
-      sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n")
-      sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n")
-      sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n")
-      sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n")
+      val prefix = "metrics_executor_"
+      val labels = Seq(
+        "application_id" -> store.applicationInfo.id,
+        "application_name" -> store.applicationInfo.name,
+        "executor_id" -> executor.id
+      ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}")
+      sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n")
```

Review comment: Right. The redundant information is moved to labels.

> Actually prefix is fixed? So they are now the same metrics. And application id, executor id now are labels on them?

No. The Prometheus query language supports handling them individually.

> Does it have a bad impact on the metrics usage later? Because now all applications are recorded under the same metrics.
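For instance, a hedged PromQL sketch of how one application's series can be selected individually even though the metric name is now shared; the application ID below is made up:

```text
# Active tasks for a single application, summed across its executors.
sum by (application_name) (
  metrics_executor_activeTasks_Count{application_id="app-20191009-0001"}
)
```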
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
URL: https://github.com/apache/spark/pull/26060#discussion_r333270410

## File path: core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala

```diff
@@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext {
   def executors(): String = {
     val sb = new StringBuilder
     val store = uiRoot.asInstanceOf[SparkUI].store
-    val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_")
     store.executorList(true).foreach { executor =>
-      val prefix = s"metrics_${appId}_${executor.id}_executor_"
-      sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n")
-      sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n")
-      sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n")
-      sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n")
-      sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n")
-      sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n")
-      sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n")
-      sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n")
-      sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n")
-      sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n")
-      sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n")
-      sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n")
-      sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n")
-      sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n")
-      sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n")
+      val prefix = "metrics_executor_"
+      val labels = Seq(
+        "application_id" -> store.applicationInfo.id,
+        "application_name" -> store.applicationInfo.name,
```

Review comment: This is the only *human-readable* field for distinguishing the jobs.
[GitHub] [spark] zsxwing commented on issue #26077: [SPARK-29413][CORE][FOLLOWUP] A better parmap method signature
URL: https://github.com/apache/spark/pull/26077#issuecomment-540236633

Closing my PR.
[GitHub] [spark] zsxwing closed pull request #26077: [SPARK-29413][CORE][FOLLOWUP] A better parmap method signature
URL: https://github.com/apache/spark/pull/26077
[GitHub] [spark] zsxwing commented on a change in pull request #26077: [SPARK-29413][CORE][FOLLOWUP] A better parmap method signature
URL: https://github.com/apache/spark/pull/26077#discussion_r333264618

## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala

```diff
@@ -274,12 +274,19 @@ private[spark] object ThreadUtils {
    * @return new collection in which each element was given from the input collection `in` by
    *         applying the lambda function `f`.
    */
-  def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = {
+  def parmap[I, O, Col[X] <: TraversableOnce[X]]
```

Review comment: Hm, `CanBuildFrom` doesn't exist in 2.13 either, so we would have to write two versions for 2.12 and 2.13. Hence, I'm okay with just using `Seq` instead. Adding `asInstanceOf` where necessary is better than two versions of the code.
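A minimal sketch of the `Seq`-based alternative under discussion, which avoids `CanBuildFrom` and so compiles unchanged on Scala 2.12 and 2.13. The thread-naming and error-handling details of the real `ThreadUtils.parmap` are elided, so treat this as an assumption-laden outline rather than the committed code:

```scala
import java.util.concurrent.Executors
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, ExecutionContext, Future}

// Map `f` over `in` with at most `maxThreads` threads. Returning Seq[O]
// keeps one signature for both Scala versions; call sites that need a more
// specific collection type convert (or cast) the result themselves.
def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = {
  val pool = Executors.newFixedThreadPool(maxThreads) // real code names threads with `prefix`
  implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
  try {
    val futures = in.map(i => Future(f(i)))
    futures.map(Await.result(_, Duration.Inf))
  } finally {
    pool.shutdownNow()
  }
}
```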
[GitHub] [spark] PingHao commented on issue #24922: [SPARK-28120][SS] Rocksdb state storage implementation
URL: https://github.com/apache/spark/pull/24922#issuecomment-540235341

> > > @PingHao - Both the issues - stuck executor, core dump - might be due to the same reasons. Will debug and fix it.
> >
> > Would you have some code snippets that can help me to reproduce the problem?

My running code is difficult to isolate or share here, so here is a new test case (based on the existing "maintenance" test case) in your RocksDbStateStoreSuite.scala. It tries to simulate parallel Spark tasks operating on each partition's state store while, at the same time, the maintenance thread tries `destoryDB`. See the code here: https://gist.github.com/PingHao/c20846542adda742f27ff00459fafe29#file-rocksdbstatestoresuite-scala-L384. I can produce the core dump on my developer machine, but I'm not sure the logic is legit anyway. You can change N (the number of partitions) and LOOPS; I recommend N = number of CPU cores.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener
URL: https://github.com/apache/spark/pull/25943#discussion_r333262965

## File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala

```diff
@@ -103,6 +104,81 @@ private[spark] class AppStatusListener(
     }
   }

+  // visible for tests
+  private[spark] def recoverLiveEntities(): Unit = {
+    if (!live) {
+      kvstore.view(classOf[JobDataWrapper])
+        .asScala.filter(_.info.status == JobExecutionStatus.RUNNING)
+        .map(_.toLiveJob).foreach(job => liveJobs.put(job.jobId, job))
+
+      kvstore.view(classOf[StageDataWrapper]).asScala
+        .filter { stageData =>
+          stageData.info.status == v1.StageStatus.PENDING ||
+            stageData.info.status == v1.StageStatus.ACTIVE
+        }
+        .map { stageData =>
+          val stageId = stageData.info.stageId
+          val jobs = liveJobs.values.filter(_.stageIds.contains(stageId)).toSeq
+          stageData.toLiveStage(jobs)
+        }.foreach { stage =>
+          val stageId = stage.info.stageId
+          val stageAttempt = stage.info.attemptNumber()
+          liveStages.put((stageId, stageAttempt), stage)
```

Review comment: That would be nice; it would be beyond the scope of this PR, though.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener
HeartSaVioR commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener URL: https://github.com/apache/spark/pull/25943#discussion_r333262518 ## File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ## @@ -103,6 +104,81 @@ private[spark] class AppStatusListener( } } + // visible for tests + private[spark] def recoverLiveEntities(): Unit = { +if (!live) { + kvstore.view(classOf[JobDataWrapper]) +.asScala.filter(_.info.status == JobExecutionStatus.RUNNING) +.map(_.toLiveJob).foreach(job => liveJobs.put(job.jobId, job)) + + kvstore.view(classOf[StageDataWrapper]).asScala +.filter { stageData => + stageData.info.status == v1.StageStatus.PENDING || +stageData.info.status == v1.StageStatus.ACTIVE +} +.map { stageData => Review comment: Yes, see the foreach below. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HeartSaVioR commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener
HeartSaVioR commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener URL: https://github.com/apache/spark/pull/25943#discussion_r333262295 ## File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ## @@ -103,6 +104,81 @@ private[spark] class AppStatusListener( } } + // visible for tests + private[spark] def recoverLiveEntities(): Unit = { +if (!live) { Review comment: > I think our current usage of a live(true) AppStatusListener guarantees the empty KVStore at the initialization step. So I don't understand well for depends on the possibility ? That's a "context" we might eventually break, and as a side effect it would break here. I'm in favor of defensive programming: if there are preconditions, they should be mentioned somewhere or asserted. But I agree we don't have an isEmpty API for KVStore - let's leave it as it is. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
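For illustration, a toy sketch of that defensive-programming suggestion: assert the precondition up front instead of relying on the calling context (the class and message here are invented, not the real listener code):

```scala
// Invented class for illustration; the real AppStatusListener has more state.
class StatusListenerSketch(live: Boolean) {
  def recoverLiveEntities(): Unit = {
    // Make the precondition explicit so a future caller cannot silently break it.
    require(!live, "recoverLiveEntities() must only be called when replaying a finished app")
    // ... recovery work would go here ...
  }
}
```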
[GitHub] [spark] viirya commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
viirya commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#discussion_r333261862 ## File path: core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala ## @@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext { def executors(): String = { val sb = new StringBuilder val store = uiRoot.asInstanceOf[SparkUI].store -val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_") store.executorList(true).foreach { executor => - val prefix = s"metrics_${appId}_${executor.id}_executor_" - sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n") - sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n") - sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n") - sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n") - sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n") - sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n") - sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n") - sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n") - sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n") - sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n") - sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n") - sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n") - sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n") - sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n") - sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n") + val prefix = "metrics_executor_" + val labels = Seq( +"application_id" -> store.applicationInfo.id, +"application_name" -> store.applicationInfo.name, +"executor_id" -> executor.id + ).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}") + sb.append(s"${prefix}rddBlocks_Count$labels ${executor.rddBlocks}\n") Review comment: Actually, `prefix` is fixed now? So they are all the same metrics, and application id and executor id are now labels on them? Does this have a bad impact on metric usage later? Because now all applications are recorded under the same metrics. I am not sure how Prometheus processes this, but naturally I'd think Prometheus would need to search for the specified application id among the metrics of all applications. Previously you had the appId and executor id in the metric name. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
viirya commented on a change in pull request #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#discussion_r333259908 ## File path: core/src/main/scala/org/apache/spark/status/api/v1/PrometheusResource.scala ## @@ -40,30 +40,35 @@ private[v1] class PrometheusResource extends ApiRequestContext { def executors(): String = { val sb = new StringBuilder val store = uiRoot.asInstanceOf[SparkUI].store -val appId = store.applicationInfo.id.replaceAll("[^a-zA-Z0-9]", "_") store.executorList(true).foreach { executor => - val prefix = s"metrics_${appId}_${executor.id}_executor_" - sb.append(s"${prefix}rddBlocks_Count ${executor.rddBlocks}\n") - sb.append(s"${prefix}memoryUsed_Count ${executor.memoryUsed}\n") - sb.append(s"${prefix}diskUsed_Count ${executor.diskUsed}\n") - sb.append(s"${prefix}totalCores_Count ${executor.totalCores}\n") - sb.append(s"${prefix}maxTasks_Count ${executor.maxTasks}\n") - sb.append(s"${prefix}activeTasks_Count ${executor.activeTasks}\n") - sb.append(s"${prefix}failedTasks_Count ${executor.failedTasks}\n") - sb.append(s"${prefix}completedTasks_Count ${executor.completedTasks}\n") - sb.append(s"${prefix}totalTasks_Count ${executor.totalTasks}\n") - sb.append(s"${prefix}totalDuration_Value ${executor.totalDuration}\n") - sb.append(s"${prefix}totalGCTime_Value ${executor.totalGCTime}\n") - sb.append(s"${prefix}totalInputBytes_Count ${executor.totalInputBytes}\n") - sb.append(s"${prefix}totalShuffleRead_Count ${executor.totalShuffleRead}\n") - sb.append(s"${prefix}totalShuffleWrite_Count ${executor.totalShuffleWrite}\n") - sb.append(s"${prefix}maxMemory_Count ${executor.maxMemory}\n") + val prefix = "metrics_executor_" + val labels = Seq( +"application_id" -> store.applicationInfo.id, +"application_name" -> store.applicationInfo.name, Review comment: I am not sure if application name is needed, because you have application id already. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
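For context, a standalone sketch of what the label formatting in the diffs above produces, with the string-interpolator quoting written out in full (the sample values below are invented for illustration):

```scala
// Build a Prometheus-style label set: {key="value", ...}
val labels = Seq(
  "application_id" -> "app-20191010-0001",
  "application_name" -> "MyApp",
  "executor_id" -> "1"
).map { case (k, v) => s"""$k="$v"""" }.mkString("{", ", ", "}")

// One fixed metric name plus labels, rather than identity encoded in the name:
//   metrics_executor_rddBlocks_Count{application_id="app-20191010-0001", application_name="MyApp", executor_id="1"} 42
println(s"metrics_executor_rddBlocks_Count$labels 42")
```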
[GitHub] [spark] srowen commented on a change in pull request #26077: [SPARK-29413][CORE][FOLLOWUP] A better parmap method signature
srowen commented on a change in pull request #26077: [SPARK-29413][CORE][FOLLOWUP] A better parmap method signature URL: https://github.com/apache/spark/pull/26077#discussion_r333254145 ## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ## @@ -274,12 +274,19 @@ private[spark] object ThreadUtils { * @return new collection in which each element was given from the input collection `in` by * applying the lambda function `f`. */ - def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = { + def parmap[I, O, Col[X] <: TraversableOnce[X]] Review comment: Oh, I should be clear: `TraversableOnce` also doesn't exist in 2.13. I don't see the use case for returning the same collection, not in this limited utility method? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
srowen commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-540222491 (I don't have any opinion on this - don't know Prometheus) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels
dongjoon-hyun commented on issue #26060: [SPARK-29400][CORE] Improve PrometheusResource to use labels URL: https://github.com/apache/spark/pull/26060#issuecomment-540218055 Gentle ping, @gatorsmile , @srowen , @dbtsai , @viirya , @yuecong . **Brief Summary** 1. Apache Spark has been providing a REST API. The new Prometheus servlet is only exposing the same information. If there is a scalability issue inside Apache Spark, we should fix it for both the existing REST API and this. That is orthogonal to this PR. 2. Prometheus (performance or architectural) issues are completely irrelevant to this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] abellina opened a new pull request #26078: SPARK-29151: Support fractional resources for task resource scheduling
abellina opened a new pull request #26078: SPARK-29151: Support fractional resources for task resource scheduling URL: https://github.com/apache/spark/pull/26078 ### What changes were proposed in this pull request? This PR adds the ability for tasks to request fractional resources, so that more than one task can execute per resource unit. For example, if you have 1 GPU in the executor, and the task configuration is 0.5 GPU/task, the executor can schedule two tasks to run on that 1 GPU. ### Why are the changes needed? Currently there is no good way to share a resource such that multiple tasks can run on a single unit. This allows multiple tasks to share an executor resource. ### Does this PR introduce any user-facing change? Yes: There is a configuration change where `spark.task.resource.[resource type].amount` can now be fractional. ### How was this patch tested? Unit tests, and manually in standalone mode and on YARN. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
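For illustration, a hypothetical configuration sketch of the GPU example above (the task-side key is taken from the PR description; the executor-side key and the resource name "gpu" are assumptions for this sketch):

```scala
import org.apache.spark.SparkConf

// With one GPU per executor and 0.5 GPU per task, the scheduler could place
// two concurrent tasks on each GPU.
val conf = new SparkConf()
  .set("spark.executor.resource.gpu.amount", "1")
  .set("spark.task.resource.gpu.amount", "0.5")
```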
[GitHub] [spark] zsxwing commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13
zsxwing commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13 URL: https://github.com/apache/spark/pull/26072#discussion_r333249219 ## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ## @@ -275,13 +274,7 @@ private[spark] object ThreadUtils { * @return new collection in which each element was given from the input collection `in` by * applying the lambda function `f`. */ - def parmap[I, O, Col[X] <: TraversableLike[X, Col[X]]] - (in: Col[I], prefix: String, maxThreads: Int) - (f: I => O) - (implicit -cbf: CanBuildFrom[Col[I], Future[O], Col[Future[O]]], // For in.map -cbf2: CanBuildFrom[Col[Future[O]], O, Col[O]] // for Future.sequence - ): Col[O] = { + def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = { Review comment: This one works for 2.12: https://github.com/apache/spark/pull/26077 Could you try it with Scala 2.13? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zsxwing commented on issue #26077: [SPARK-29413][CORE][FOLLOWUP] A better parmap method signature
zsxwing commented on issue #26077: [SPARK-29413][CORE][FOLLOWUP] A better parmap method signature URL: https://github.com/apache/spark/pull/26077#issuecomment-540216275 @srowen could you help test whether this compiles with Scala 2.13? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zsxwing opened a new pull request #26077: [SPARK-29413][CORE] A better parmap method signature
zsxwing opened a new pull request #26077: [SPARK-29413][CORE] A better parmap method signature URL: https://github.com/apache/spark/pull/26077 ### What changes were proposed in this pull request? This PR updates the `parmap` method signature to return the same type as the input, making the method more flexible. ### Why are the changes needed? `parmap` will be more flexible, similar to `Future.sequence`. ### Does this PR introduce any user-facing change? No. This PR updates an internal method. ### How was this patch tested? The new test added in this PR. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
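For illustration, a minimal Scala 2.12-only sketch of a signature that returns the same collection type as its input. This is not the PR's exact code: the real parmap also takes a thread-pool prefix and max-threads and maps in parallel, both omitted here; `CanBuildFrom` is exactly what disappears in 2.13, which is the portability problem discussed in this thread.

```scala
import scala.collection.generic.CanBuildFrom

object ParmapSignatureSketch {
  def parmap[I, O, Col[X] <: TraversableOnce[X]]
      (in: Col[I])
      (f: I => O)
      (implicit cbf: CanBuildFrom[Col[I], O, Col[O]]): Col[O] = {
    val builder = cbf(in)               // builds the same collection type as `in`
    in.foreach(i => builder += f(i))    // sequential stand-in for the parallel mapping
    builder.result()
  }

  // e.g. parmap(List(1, 2, 3))(_ * 2) infers List[Int], not a generic Seq[Int]
}
```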
[GitHub] [spark] JoshRosen edited a comment on issue #26076: [SPARK-29419][SQL] Fix Encoder thread-safety bug in createDataset(Seq)
JoshRosen edited a comment on issue #26076: [SPARK-29419][SQL] Fix Encoder thread-safety bug in createDataset(Seq) URL: https://github.com/apache/spark/pull/26076#issuecomment-540211354 > But if we would like to fix all these problems, all public APIs accepting `Encoder` will need the copy. I think that most existing uses of Encoders are de-facto thread-safe because either (a) the use occurs inside of a Spark task and the task gets its own fresh copy of the Encoder when the `Task` is deserialized or (b) the use occurs on the driver but the code calls `resolveAndBind` (which internally performs a `copy`) prior to using the Encoder. Given this, I suspect that this might be the only non-thread-safe Encoder usage in Spark (excluding code which is only used in Spark's unit tests). I don't think that we need to introduce similar copying in other public APIs. > I did some research about this and found some noticeable performance regression in our internal benchmark. What do you think about improving the performance / reducing the cost of `.copy()` by refactoring the `ExpressionEncoder` class such that (a) all of the immutable `vals` become fields of the case class, (b) the current constructor becomes a `.apply()` on the companion object and the case class constructor becomes `private`, and (c) `resolveAndBind` calls the companion object constructor instead of `copy()`? Given this, I think `copy()` could be _really_ cheap, effectively giving us a fresh copy of the internal mutable state but copying all other immutable attributes without performing any re-resolution, analysis, attribute binding, etc. If we do that, we'd be able to defensively copy at _very_ low cost (e.g. one object allocation) and then could copy-by-default and free users from having to worry about thread-safety. I think that's a potentially huge win from a developer productivity point-of-view: the cost / toil of having to worry about thread-unsafe code is a tax placed on end users and creates a developer education / training burden, so I think it's worth attempting to eliminate this entire class of pitfall. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
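For illustration, a toy sketch of the refactoring pattern proposed above, on a made-up class rather than Spark's ExpressionEncoder: the expensive derivation happens once in the companion's apply(), while the case-class copy() only re-allocates the per-instance mutable state, so a defensive copy costs roughly one object allocation.

```scala
// Invented class for illustration of the apply()/private-constructor/cheap-copy()
// pattern; the field names are stand-ins, not ExpressionEncoder's real fields.
case class CheapCopyCodec private (schemaHash: Int) {
  // Fresh, non-shared mutable buffer per instance; copy() allocates a new one.
  private val buffer = new java.lang.StringBuilder

  def encode(value: String): String = {
    buffer.setLength(0)
    buffer.append(schemaHash).append(':').append(value).toString
  }
}

object CheapCopyCodec {
  def apply(fields: Seq[String]): CheapCopyCodec = {
    val schemaHash = fields.hashCode() // stand-in for resolution/analysis work
    new CheapCopyCodec(schemaHash)     // done once; copy() never repeats it
  }
}
```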
[GitHub] [spark] BryanCutler commented on a change in pull request #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now
BryanCutler commented on a change in pull request #26041: [SPARK-29403][INFRA][R] Uses Arrow R 0.14.1 in AppVeyor for now URL: https://github.com/apache/spark/pull/26041#discussion_r333241522 ## File path: appveyor.yml ## @@ -57,6 +57,8 @@ build_script: environment: NOT_CRAN: true + # See SPARK-29378. + ARROW_PRE_0_15_IPC_FORMAT: 1 Review comment: Does setting this here carry it through to the worker? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] BryanCutler commented on issue #24896: [WIP][SPARK-28006] User-defined grouped transform pandas_udf for window operations
BryanCutler commented on issue #24896: [WIP][SPARK-28006] User-defined grouped transform pandas_udf for window operations URL: https://github.com/apache/spark/pull/24896#issuecomment-540198984 @icexelloss I'm a little confused about a couple of things: 1) is this specific to operations over a window only? 2) can the same functionality be achieved with the `GROUPED_MAP` pandas_udf, only slower and more awkward because it requires the whole dataframe even though only 1 column is used? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13
srowen commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13 URL: https://github.com/apache/spark/pull/26072#discussion_r333234146 ## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ## @@ -275,13 +274,7 @@ private[spark] object ThreadUtils { * @return new collection in which each element was given from the input collection `in` by * applying the lambda function `f`. */ - def parmap[I, O, Col[X] <: TraversableLike[X, Col[X]]] - (in: Col[I], prefix: String, maxThreads: Int) - (f: I => O) - (implicit -cbf: CanBuildFrom[Col[I], Future[O], Col[Future[O]]], // For in.map -cbf2: CanBuildFrom[Col[Future[O]], O, Col[O]] // for Future.sequence - ): Col[O] = { + def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = { Review comment: Heh, no, that's the hard part here. If you run `./dev/change-scala-version.sh 2.13` after modifying it to accept that value, tweak a few more things, and use `-Pscala-2.13`, you can at least get to the point where there are still tons of compile errors, including this one. There are a few things we can't fix now (e.g. deps that don't support 2.13) and a few that are hard (e.g. things that will require multiple source trees). I'm fixing everything short of that that I can. If you can write anything else in 2.12 that doesn't use TraversableLike, I can try it in my hacked-up local build. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] squito commented on issue #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener
squito commented on issue #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener URL: https://github.com/apache/spark/pull/25943#issuecomment-540197120 Sorry, I am getting caught up on a lot of stuff here -- how is this related to https://github.com/apache/spark/pull/25577 ? They seem to be two different approaches to the same problem (though https://github.com/apache/spark/pull/25577 does a little bit more than just what is here). Just for storing live entities, this approach seems more promising to me -- using Java serialization is really bad for compatibility; any changes to classes will break deserialization. If we want the snapshot / checkpoint files to be able to entirely replace the event log files, we'll need the compatibility. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zsxwing commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13
zsxwing commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13 URL: https://github.com/apache/spark/pull/26072#discussion_r333232018 ## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ## @@ -275,13 +274,7 @@ private[spark] object ThreadUtils { * @return new collection in which each element was given from the input collection `in` by * applying the lambda function `f`. */ - def parmap[I, O, Col[X] <: TraversableLike[X, Col[X]]] - (in: Col[I], prefix: String, maxThreads: Int) - (f: I => O) - (implicit -cbf: CanBuildFrom[Col[I], Future[O], Col[Future[O]]], // For in.map -cbf2: CanBuildFrom[Col[Future[O]], O, Col[O]] // for Future.sequence - ): Col[O] = { + def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = { Review comment: @srowen I can give it a try. Does our build support 2.13 now? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on a change in pull request #26057: [SPARK-29377][PYTHON][ML] Parity between Scala ML tuning and Python ML tuning
huaxingao commented on a change in pull request #26057: [SPARK-29377][PYTHON][ML] Parity between Scala ML tuning and Python ML tuning URL: https://github.com/apache/spark/pull/26057#discussion_r333230868 ## File path: python/pyspark/ml/tuning.py ## @@ -199,7 +181,25 @@ def _to_java_impl(self): return java_estimator, java_epms, java_evaluator -class CrossValidator(Estimator, ValidatorParams, HasParallelism, HasCollectSubModels, +class _CrossValidatorParams(_ValidatorParams): Review comment: I don't think adding a leading underscore will prevent users from extending these classes. A single leading underscore before a class name is only a weak indicator of internal usage; it doesn't enforce privacy. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] BryanCutler commented on a change in pull request #26045: [SPARK-29367][DOC] Add compatibility note for Arrow 0.15.0 to SQL guide
BryanCutler commented on a change in pull request #26045: [SPARK-29367][DOC] Add compatibility note for Arrow 0.15.0 to SQL guide URL: https://github.com/apache/spark/pull/26045#discussion_r333229971 ## File path: docs/sql-pyspark-pandas-with-arrow.md ## @@ -219,3 +219,14 @@ Note that a standard UDF (non-Pandas) will load timestamp data as Python datetim different than a Pandas timestamp. It is recommended to use Pandas time series functionality when working with timestamps in `pandas_udf`s to get the best performance, see [here](https://pandas.pydata.org/pandas-docs/stable/timeseries.html) for details. + +### Compatibiliy Setting for PyArrow >= 0.15.0 and Spark 2.3.x, 2.4.x Review comment: @HyukjinKwon so you don't need this note for R because Arrow was not used in 2.4.x? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] BryanCutler commented on issue #26045: [SPARK-29367][DOC] Add compatibility note for Arrow 0.15.0 to SQL guide
BryanCutler commented on issue #26045: [SPARK-29367][DOC] Add compatibility note for Arrow 0.15.0 to SQL guide URL: https://github.com/apache/spark/pull/26045#issuecomment-540191957 Updated to add some clarification, links to the JIRA showing the error message if the env var is not set, and a link to the Arrow blog. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13
srowen commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13 URL: https://github.com/apache/spark/pull/26072#discussion_r333228861 ## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ## @@ -275,13 +274,7 @@ private[spark] object ThreadUtils { * @return new collection in which each element was given from the input collection `in` by * applying the lambda function `f`. */ - def parmap[I, O, Col[X] <: TraversableLike[X, Col[X]]] - (in: Col[I], prefix: String, maxThreads: Int) - (f: I => O) - (implicit -cbf: CanBuildFrom[Col[I], Future[O], Col[Future[O]]], // For in.map -cbf2: CanBuildFrom[Col[Future[O]], O, Col[O]] // for Future.sequence - ): Col[O] = { + def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = { Review comment: Sure, but, do I necessarily have to get a List from a List? Seq -> Seq seems fine for all current usages. That said, if there's any other way to write this that works in 2.12 and 2.13, definitely, open to that. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zsxwing commented on issue #26076: [SPARK-29419][SQL] Fix Encoder thread-safety bug in createDataset(Seq)
zsxwing commented on issue #26076: [SPARK-29419][SQL] Fix Encoder thread-safety bug in createDataset(Seq) URL: https://github.com/apache/spark/pull/26076#issuecomment-540191322 > the caller-supplied `Encoder` is used in multiple threads I would say this is a user error, but I agree that it's pretty easy to overlook; I made this mistake myself when I first used `Encoder`. But if we would like to fix all these problems, all public APIs accepting `Encoder` will need the copy. I did some research about this and found some noticeable performance regression in our internal benchmark. That's why I finally just submitted https://github.com/apache/spark/pull/25209 to make it easy for users to copy an `Encoder` instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] JoshRosen edited a comment on issue #26076: [SPARK-29419][SQL] Fix Encoder thread-safety bug in createDataset(Seq)
JoshRosen edited a comment on issue #26076: [SPARK-29419][SQL] Fix Encoder thread-safety bug in createDataset(Seq) URL: https://github.com/apache/spark/pull/26076#issuecomment-540186939 ~~I'll submit a separate patch for 2.4.x.~~ Actually, this is a clean merge with 2.4.x, so we can merge this PR to both branches. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] JoshRosen commented on issue #26076: [SPARK-29419][SQL] Fix Encoder thread-safety issue in createDataset(Seq)
JoshRosen commented on issue #26076: [SPARK-29419][SQL] Fix Encoder thread-safety issue in createDataset(Seq) URL: https://github.com/apache/spark/pull/26076#issuecomment-540186939 I'll submit a separate patch for 2.4.x. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] JoshRosen opened a new pull request #26076: [SPARK-29419][SQL] Fix Encoder thread-safety issue in createDataset(Seq)
JoshRosen opened a new pull request #26076: [SPARK-29419][SQL] Fix Encoder thread-safety issue in createDataset(Seq) URL: https://github.com/apache/spark/pull/26076 ### What changes were proposed in this pull request? This PR fixes a thread-safety bug in `SparkSession.createDataset(Seq)`: if the caller-supplied `Encoder` is used in multiple threads then createDataset's usage of the encoder may lead to incorrect / corrupt results because the Encoder's internal mutable state will be updated from multiple threads. Here is an example demonstrating the problem:

```scala
import org.apache.spark.sql._

val enc = implicitly[Encoder[(Int, Int)]]

val datasets = (1 to 100).par.map { _ =>
  val pairs = (1 to 100).map(x => (x, x))
  spark.createDataset(pairs)(enc)
}

datasets.reduce(_ union _).collect().foreach { pair =>
  require(pair._1 == pair._2, s"Pair elements are mismatched: $pair")
}
```

Before this PR's change, the above example fails because Spark produces corrupted records where different input records' fields have been co-mingled. This bug is similar to SPARK-22355 / #19577, a similar problem in `Dataset.collect()`. The fix implemented here is based on #24735's updated version of the `Dataset.collect()` bugfix: use `.copy()`. For consistency, I used the same [code comment](https://github.com/apache/spark/blob/d841b33ba3a9b0504597dbccd4b0d11fa810abf3/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L3414) / explanation as that PR. ### Does this PR introduce any user-facing change? No. ### How was this patch tested? Tested manually using the example listed above. --- Thanks to @smcnamara-stripe for identifying this bug. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
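For illustration, the same failure mode and defensive-copy fix on a toy encoder with internal mutable state (this is not Spark's Encoder API; the class and method names are invented):

```scala
class ToyEncoder {
  private val buf = new java.lang.StringBuilder // shared buffer => not thread-safe
  def encode(x: Int): String = { buf.setLength(0); buf.append(x); buf.toString }
  def copy(): ToyEncoder = new ToyEncoder       // cheap: just fresh mutable state
}

val shared = new ToyEncoder
val encoded = (1 to 100).par.map { i =>
  shared.copy().encode(i) // a per-use copy keeps threads from co-mingling state
}
require(encoded.zipWithIndex.forall { case (s, i) => s == (i + 1).toString })
```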
[GitHub] [spark] viirya commented on a change in pull request #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait process exiting before output iterator finish
viirya commented on a change in pull request #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait process exiting before output iterator finish URL: https://github.com/apache/spark/pull/21164#discussion_r333223463 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala ## @@ -137,13 +137,12 @@ case class ScriptTransformationExec( throw writerThread.exception.get } - if (!proc.isAlive) { -val exitCode = proc.exitValue() -if (exitCode != 0) { - logError(stderrBuffer.toString) // log the stderr circular buffer - throw new SparkException(s"Subprocess exited with status $exitCode. " + -s"Error: ${stderrBuffer.toString}", cause) -} + proc.waitFor() Review comment: I tested with the provided Python script. Actually, no exception is thrown in ScriptTransformationWriterThread. And checkFailureAndPropagate is not called due to an EOFException; it is called at: https://github.com/apache/spark/blob/8a17d26784c53fb50b6373b566aab71135c8956f/sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala#L163-L167 In this case, checkFailureAndPropagate can be called before proc has terminated due to possible lag, so the !proc.isAlive check fails. I think the fix is to add proc.waitFor() before the other checkFailureAndPropagate calls. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
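For illustration, a minimal standalone demonstration of why waiting first makes the exit-code check deterministic: exitValue() on a still-running process throws, and isAlive can remain true briefly after the child has done its work. This uses plain java.lang.Process, not the Spark code under review, and assumes `java` is on the PATH.

```scala
val proc = new ProcessBuilder("java", "-version").start()
proc.waitFor()                  // block until the subprocess has actually exited
val exitCode = proc.exitValue() // safe now: cannot throw IllegalThreadStateException
assert(exitCode == 0, s"Subprocess exited with status $exitCode")
```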
[GitHub] [spark] BryanCutler commented on a change in pull request #26045: [SPARK-29367][DOC] Add compatibility note for Arrow 0.15.0 to SQL guide
BryanCutler commented on a change in pull request #26045: [SPARK-29367][DOC] Add compatibility note for Arrow 0.15.0 to SQL guide URL: https://github.com/apache/spark/pull/26045#discussion_r333215694 ## File path: docs/sql-pyspark-pandas-with-arrow.md ## @@ -219,3 +219,14 @@ Note that a standard UDF (non-Pandas) will load timestamp data as Python datetim different than a Pandas timestamp. It is recommended to use Pandas time series functionality when working with timestamps in `pandas_udf`s to get the best performance, see [here](https://pandas.pydata.org/pandas-docs/stable/timeseries.html) for details. + +### Compatibiliy Setting for PyArrow >= 0.15.0 and Spark 2.3.x, 2.4.x + +Since Arrow 0.15.0, a change in the binary IPC format requires an environment variable to be set in Review comment: Sure, that would be good thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] BryanCutler commented on a change in pull request #26045: [SPARK-29367][DOC] Add compatibility note for Arrow 0.15.0 to SQL guide
BryanCutler commented on a change in pull request #26045: [SPARK-29367][DOC] Add compatibility note for Arrow 0.15.0 to SQL guide URL: https://github.com/apache/spark/pull/26045#discussion_r333214127 ## File path: docs/sql-pyspark-pandas-with-arrow.md ## @@ -219,3 +219,14 @@ Note that a standard UDF (non-Pandas) will load timestamp data as Python datetim different than a Pandas timestamp. It is recommended to use Pandas time series functionality when working with timestamps in `pandas_udf`s to get the best performance, see [here](https://pandas.pydata.org/pandas-docs/stable/timeseries.html) for details. + +### Compatibiliy Setting for PyArrow >= 0.15.0 and Spark 2.3.x, 2.4.x + +Since Arrow 0.15.0, a change in the binary IPC format requires an environment variable to be set in +Spark so that PySpark maintain compatibility with versions on PyArrow 0.15.0 and above. The following can be added to `conf/spark-env.sh` to use the legacy IPC format: + +``` +ARROW_PRE_0_15_IPC_FORMAT=1 Review comment: Yes I agree, I'll clarify this This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13
dongjoon-hyun commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13 URL: https://github.com/apache/spark/pull/26072#discussion_r333206520 ## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ## @@ -275,13 +274,7 @@ private[spark] object ThreadUtils { * @return new collection in which each element was given from the input collection `in` by * applying the lambda function `f`. */ - def parmap[I, O, Col[X] <: TraversableLike[X, Col[X]]] - (in: Col[I], prefix: String, maxThreads: Int) - (f: I => O) - (implicit -cbf: CanBuildFrom[Col[I], Future[O], Col[Future[O]]], // For in.map -cbf2: CanBuildFrom[Col[Future[O]], O, Col[O]] // for Future.sequence - ): Col[O] = { + def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = { Review comment: It seems that Apache Spark doesn't have those use cases yet. Was it for future-proofing? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] squito commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener
squito commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener URL: https://github.com/apache/spark/pull/25943#discussion_r333194793 ## File path: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ## @@ -181,6 +182,7 @@ class RDDStorageInfo private[spark]( val partitions: Option[Seq[RDDPartitionInfo]]) class RDDDataDistribution private[spark]( +val executorId: String, Review comment: The API versioning requirements are listed here: https://spark.apache.org/docs/latest/monitoring.html#api-versioning-policy > New fields may be added to existing endpoints So it's OK to add new fields. But if we're adding things that really don't make sense as part of the API, then maybe we should just store a different object instead. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] squito commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener
squito commented on a change in pull request #25943: [WIP][SPARK-29261][SQL][CORE] Support recover live entities from KVStore for (SQL)AppStatusListener URL: https://github.com/apache/spark/pull/25943#discussion_r333195040 ## File path: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ## @@ -192,14 +194,34 @@ class RDDDataDistribution private[spark]( @JsonDeserialize(contentAs = classOf[JLong]) val onHeapMemoryRemaining: Option[Long], @JsonDeserialize(contentAs = classOf[JLong]) -val offHeapMemoryRemaining: Option[Long]) +val offHeapMemoryRemaining: Option[Long]) { + + def toLiveRDDDistribution(executors: scala.collection.Map[String, LiveExecutor]) + : LiveRDDDistribution = { Review comment: these added methods should probably be `private[spark]` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] yifeih commented on a change in pull request #25823: [SPARK-28211][Core][Shuffle] Propose Shuffle Driver Components API
yifeih commented on a change in pull request #25823: [SPARK-28211][Core][Shuffle] Propose Shuffle Driver Components API URL: https://github.com/apache/spark/pull/25823#discussion_r333181721 ## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ## @@ -524,6 +529,12 @@ class SparkContext(config: SparkConf) extends Logging { executorEnvs ++= _conf.getExecutorEnv executorEnvs("SPARK_USER") = sparkUser +val redactionRegex = _conf.get(SECRET_REDACTION_PATTERN) Review comment: ah right, will fix that now This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] tgravescs commented on a change in pull request #26000: [SPARK-29330][CORE][YARN] Allow users to chose the name of Spark Shuffle service
tgravescs commented on a change in pull request #26000: [SPARK-29330][CORE][YARN] Allow users to chose the name of Spark Shuffle service URL: https://github.com/apache/spark/pull/26000#discussion_r333181357 ## File path: docs/running-on-yarn.md ## @@ -492,6 +492,13 @@ To use a custom metrics.properties for the application master and executors, upd If it is not set then the YARN application ID is used. + + spark.yarn.shuffle.service.name + spark_shuffle + +Name of the external shuffle service. Review comment: I think we need more description here. This isn't setting what the service runs as; you have to configure that via YARN. This is the external shuffle service name that executors use when launching containers. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
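For illustration, a hypothetical executor-side configuration sketch of that point ("spark_shuffle_alt" is an invented service name; it must match whatever name the NodeManager registered under yarn.nodemanager.aux-services, which is configured on the YARN side, not by Spark):

```scala
import org.apache.spark.SparkConf

// The Spark setting only tells executors which aux-service name to request;
// the NodeManager must already run a shuffle service under that same name.
val conf = new SparkConf()
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.yarn.shuffle.service.name", "spark_shuffle_alt")
```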
[GitHub] [spark] squito commented on a change in pull request #25823: [SPARK-28211][Core][Shuffle] Propose Shuffle Driver Components API
squito commented on a change in pull request #25823: [SPARK-28211][Core][Shuffle] Propose Shuffle Driver Components API URL: https://github.com/apache/spark/pull/25823#discussion_r333178354 ## File path: core/src/main/scala/org/apache/spark/SparkContext.scala ## @@ -524,6 +529,12 @@ class SparkContext(config: SparkConf) extends Logging { executorEnvs ++= _conf.getExecutorEnv executorEnvs("SPARK_USER") = sparkUser +val redactionRegex = _conf.get(SECRET_REDACTION_PATTERN) Review comment: oops, this isn't needed anymore This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zsxwing commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13
zsxwing commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13 URL: https://github.com/apache/spark/pull/26072#discussion_r333177093 ## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ## @@ -275,13 +274,7 @@ private[spark] object ThreadUtils { * @return new collection in which each element was given from the input collection `in` by * applying the lambda function `f`. */ - def parmap[I, O, Col[X] <: TraversableLike[X, Col[X]]] - (in: Col[I], prefix: String, maxThreads: Int) - (f: I => O) - (implicit -cbf: CanBuildFrom[Col[I], Future[O], Col[Future[O]]], // For in.map -cbf2: CanBuildFrom[Col[Future[O]], O, Col[O]] // for Future.sequence - ): Col[O] = { + def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = { Review comment: This signature is much less flexible. It always returns `Seq`. It forces us to use `asInstanceOf` like this: ``` val l = List(1, 2, 3) val l2 = parmap(l, "prefix", 2)(_ * 2).asInstanceOf[List[Int]] ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
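To make the trade-off concrete, here is a self-contained sketch (a simplified stand-in for the reworked method, not Spark's actual implementation; the `prefix` argument is kept only to mirror the signature) showing that callers now get back a static `Seq` whatever collection they pass in:
```
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParmapSketch {
  // Stand-in for the new signature: the concrete collection type of `in`
  // is forgotten, and the result is statically just a Seq[O].
  def parmap[I, O](in: Seq[I], prefix: String, maxThreads: Int)(f: I => O): Seq[O] = {
    val pool = Executors.newFixedThreadPool(maxThreads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try Await.result(Future.sequence(in.map(i => Future(f(i)))), Duration.Inf)
    finally pool.shutdown()
  }

  def main(args: Array[String]): Unit = {
    val l: List[Int] = List(1, 2, 3)
    // Recovering List[Int] requires a cast, which is only safe if the
    // implementation happens to build a List at runtime.
    val l2 = parmap(l, "demo", 2)(_ * 2).asInstanceOf[List[Int]]
    println(l2) // List(2, 4, 6)
  }
}
```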
[GitHub] [spark] holdenk commented on issue #26075: [WIP][K8S] Spark operator
holdenk commented on issue #26075: [WIP][K8S] Spark operator URL: https://github.com/apache/spark/pull/26075#issuecomment-540134447 Yeah, I don't know if we want this in Spark itself. The dev@ mailing list is probably the right place to raise this discussion. There are a lot of hard-coded things in here that we would need to change. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun removed a comment on issue #26075: [WIP] Spark operator
dongjoon-hyun removed a comment on issue #26075: [WIP] Spark operator URL: https://github.com/apache/spark/pull/26075#issuecomment-540130660 Hi, @jkremser . Thank you for making a PR. 1. Please follow the PR template. The template is for the community, but this PR intentionally ignores it. The screencast should be an additional attachment to it. 2. Is this the following project? Please give a clear explanation of whether this is your own contribution. - https://github.com/GoogleCloudPlatform/spark-on-k8s-operator 3. Could you send an email to `d...@spark.apache.org` about this and about why this should be inside the `Apache Spark` project? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on issue #26075: [WIP] Spark operator
dongjoon-hyun commented on issue #26075: [WIP] Spark operator URL: https://github.com/apache/spark/pull/26075#issuecomment-540130738 cc @holdenk This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on issue #26075: [WIP] Spark operator
dongjoon-hyun commented on issue #26075: [WIP] Spark operator URL: https://github.com/apache/spark/pull/26075#issuecomment-540130660 Hi, @jkremser . Thank you for making a PR. 1. Please follow the PR template. The template is for the community, but this PR intentionally ignores it. The screencast should be an additional attachment to it. 2. Is this the following project? Please give a clear explanation of whether this is your own contribution. - https://github.com/GoogleCloudPlatform/spark-on-k8s-operator 3. Could you send an email to `d...@spark.apache.org` about this and about why this should be inside the `Apache Spark` project? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on a change in pull request #26073: [SPARK-29416][CORE][ML][SQL][MESOS] Use .sameElements to compare arrays, instead of .deep (gone in 2.13)
srowen commented on a change in pull request #26073: [SPARK-29416][CORE][ML][SQL][MESOS] Use .sameElements to compare arrays, instead of .deep (gone in 2.13) URL: https://github.com/apache/spark/pull/26073#discussion_r333165360 ## File path: mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala ## @@ -189,6 +189,6 @@ object ChiSqSelectorSuite extends SparkFunSuite { } def checkEqual(a: ChiSqSelectorModel, b: ChiSqSelectorModel): Unit = { -assert(a.selectedFeatures.deep == b.selectedFeatures.deep) +assert(a.selectedFeatures.sameElements(b.selectedFeatures) Review comment: Ugh I messed up something translating between my two builds. Will fix This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on issue #26070: [SPARK-29411][CORE][ML][SQL][DSTREAM] Replace use of Unit object with () for Scala 2.13
dongjoon-hyun commented on issue #26070: [SPARK-29411][CORE][ML][SQL][DSTREAM] Replace use of Unit object with () for Scala 2.13 URL: https://github.com/apache/spark/pull/26070#issuecomment-540128065 The above comment on `GitHub Action` is an explanation of the current `GitHub Action` outage. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun edited a comment on issue #26070: [SPARK-29411][CORE][ML][SQL][DSTREAM] Replace use of Unit object with () for Scala 2.13
dongjoon-hyun edited a comment on issue #26070: [SPARK-29411][CORE][ML][SQL][DSTREAM] Replace use of Unit object with () for Scala 2.13 URL: https://github.com/apache/spark/pull/26070#issuecomment-540127530 @srowen . `GitHub Action` is just optional infra. I ran UTs locally (core/ml/stream/catalyst and some `sql`). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on issue #26070: [SPARK-29411][CORE][ML][SQL][DSTREAM] Replace use of Unit object with () for Scala 2.13
dongjoon-hyun commented on issue #26070: [SPARK-29411][CORE][ML][SQL][DSTREAM] Replace use of Unit object with () for Scala 2.13 URL: https://github.com/apache/spark/pull/26070#issuecomment-540127530 @srowen . `GitHub Action` is just optional infra. I ran UTs locally (core/ml/stream/catalyst and some `sql`). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] jkremser opened a new pull request #26075: [WIP] Spark operator
jkremser opened a new pull request #26075: [WIP] Spark operator URL: https://github.com/apache/spark/pull/26075 ![ascii](https://user-images.githubusercontent.com/535866/66508719-f3271500-ead1-11e9-94e1-e3100e5f1d39.gif) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26073: [SPARK-29416][CORE][ML][SQL][MESOS] Use .sameElements to compare arrays, instead of .deep (gone in 2.13)
dongjoon-hyun commented on a change in pull request #26073: [SPARK-29416][CORE][ML][SQL][MESOS] Use .sameElements to compare arrays, instead of .deep (gone in 2.13) URL: https://github.com/apache/spark/pull/26073#discussion_r333160247 ## File path: mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala ## @@ -189,6 +189,6 @@ object ChiSqSelectorSuite extends SparkFunSuite { } def checkEqual(a: ChiSqSelectorModel, b: ChiSqSelectorModel): Unit = { -assert(a.selectedFeatures.deep == b.selectedFeatures.deep) +assert(a.selectedFeatures.sameElements(b.selectedFeatures) Review comment: Hi, @srowen . This is a compile error~ ``` [error] /Users/dongjoon/PRS/PR-26073/mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala:193: ')' expected but '}' found. [error] } [error] ^ [error] one error found ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
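For reference, the compile error is just a missing closing parenthesis; the corrected assertion reads:
```
assert(a.selectedFeatures.sameElements(b.selectedFeatures))
```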
[GitHub] [spark] viirya commented on a change in pull request #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait process exiting before output iterator finish
viirya commented on a change in pull request #21164: [SPARK-24098][SQL] ScriptTransformationExec should wait process exiting before output iterator finish URL: https://github.com/apache/spark/pull/21164#discussion_r333158224 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala ## @@ -174,7 +173,6 @@ case class ScriptTransformationExec( // Ideally the proc should *not* be alive at this point but // there can be a lag between EOF being written out and the process // being terminated. So explicitly waiting for the process to be done. -proc.waitFor() Review comment: I do not think we should remove proc.waitFor() here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
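A minimal sketch of the race that the in-code comment describes, using a Unix `cat` child process as a stand-in for the transformation script: there can be a lag between stdin EOF and the child actually exiting, and `proc.waitFor()` is what bridges it.
```
object WaitForSketch {
  def main(args: Array[String]): Unit = {
    val proc = new ProcessBuilder("cat").start()
    proc.getOutputStream.close()   // send EOF; the child may keep running briefly
    val exitCode = proc.waitFor()  // block until the child has really terminated
    assert(exitCode == 0, s"child exited with $exitCode")
  }
}
```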
[GitHub] [spark] dongjoon-hyun closed pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13
dongjoon-hyun closed pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13 URL: https://github.com/apache/spark/pull/26072 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on issue #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13
dongjoon-hyun commented on issue #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13 URL: https://github.com/apache/spark/pull/26072#issuecomment-540114875 Merged to master. Thank you, @srowen and @MaxGekk . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13
dongjoon-hyun commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13 URL: https://github.com/apache/spark/pull/26072#discussion_r333141589 ## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ## @@ -275,13 +274,7 @@ private[spark] object ThreadUtils { * @return new collection in which each element was given from the input collection `in` by * applying the lambda function `f`. */ - def parmap[I, O, Col[X] <: TraversableLike[X, Col[X]]] Review comment: I agree with @srowen . This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] srowen commented on issue #26070: [SPARK-29411][CORE][ML][SQL][DSTREAM] Replace use of Unit object with () for Scala 2.13
srowen commented on issue #26070: [SPARK-29411][CORE][ML][SQL][DSTREAM] Replace use of Unit object with () for Scala 2.13 URL: https://github.com/apache/spark/pull/26070#issuecomment-540114299 Jenkins tests didn't run. While I'm quite sure this isn't even a change except at the source level, do you think it's OK to go on the compile result from GitHub Actions? Could be reasonable in a case like this. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13
dongjoon-hyun commented on a change in pull request #26072: [SPARK-29413][CORE] Rewrite ThreadUtils.parmap to avoid TraversableLike, gone in 2.13 URL: https://github.com/apache/spark/pull/26072#discussion_r333141589 ## File path: core/src/main/scala/org/apache/spark/util/ThreadUtils.scala ## @@ -275,13 +274,7 @@ private[spark] object ThreadUtils { * @return new collection in which each element was given from the input collection `in` by * applying the lambda function `f`. */ - def parmap[I, O, Col[X] <: TraversableLike[X, Col[X]]] Review comment: I agree with @srowen, but shall we ask @zsxwing to make sure? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org