[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-12 Thread dilipbiswal
Github user dilipbiswal commented on the issue:

https://github.com/apache/spark/pull/19068
  
@cloud-fan Yeah.. I have tried my script against this PR and it works fine. 
I am not familiar with the changes and don't know if they can have any side 
effects. One thing I haven't had time to find out is the following, from my script:
```
1) spark-sql
create database testdb;
2) exit spark-sql
3) spark-sql
use testdb;  => I get a database not found error.
```
How did the create database succeed, i.e. why didn't I get any error? If it did 
succeed, where did it create the database? Perhaps you know the answer :-)
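
For reference, a minimal sketch of how one might check where the database 
actually ended up; this is illustrative only and not part of the PR (it assumes 
a Spark build with Hive support and the `testdb` name from the script above):
```
// A minimal sketch (not from this PR): check where `testdb` was actually created.
// Run with spark-shell or spark-submit against a Hive-enabled build.
import org.apache.spark.sql.SparkSession

object WhereIsTestDb {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("where-is-testdb")
      .enableHiveSupport()
      .getOrCreate()
    // DESCRIBE DATABASE shows the Location property, which tells us whether the
    // database went to the real warehouse/metastore or to a session-local dummy one.
    spark.sql("DESCRIBE DATABASE EXTENDED testdb").show(truncate = false)
    spark.stop()
  }
}
```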



---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19132
  
**[Test build #81709 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81709/testReport)**
 for PR 19132 at commit 
[`560d442`](https://github.com/apache/spark/commit/560d442a8d25e37f9d831699663b1c8413ddd6a9).


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19068
  
> but the Hive configurations generated here just points to a dummy meta 
store

Why did this work well before?


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19068
  
OK to test


---




[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19068
  
@dilipbiswal does your script work after this PR?


---




[GitHub] spark pull request #19132: [SPARK-21922] Fix duration always updating when t...

2017-09-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19132#discussion_r138526050
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/AllStagesResource.scala ---
@@ -142,7 +142,7 @@ private[v1] object AllStagesResource {
   index = uiData.taskInfo.index,
   attempt = uiData.taskInfo.attemptNumber,
   launchTime = new Date(uiData.taskInfo.launchTime),
-  duration = uiData.taskDuration,
+  duration = uiData.taskDuration(),
--- End diff --

Yes, if it is not a big change I think it should be fixed here, because 
currently with this fix the UI and the REST API are inconsistent.


---




[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18975#discussion_r138525454
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveDirCommand.scala
 ---
@@ -0,0 +1,131 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import scala.language.existentials
+
+import org.apache.hadoop.fs.{FileSystem, Path}
+import org.apache.hadoop.hive.common.FileUtils
+import org.apache.hadoop.hive.ql.plan.TableDesc
+import org.apache.hadoop.hive.serde.serdeConstants
+import org.apache.hadoop.hive.serde2.`lazy`.LazySimpleSerDe
+import org.apache.hadoop.mapred._
+
+import org.apache.spark.SparkException
+import org.apache.spark.sql.{Row, SparkSession}
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, 
CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.execution.SparkPlan
+import org.apache.spark.sql.hive.client.HiveClientImpl
+
+/**
+ * Command for writing the results of `query` to file system.
+ *
+ * The syntax of using this command in SQL is:
+ * {{{
+ *   INSERT OVERWRITE [LOCAL] DIRECTORY
+ *   path
+ *   [ROW FORMAT row_format]
+ *   [STORED AS file_format]
+ *   SELECT ...
+ * }}}
+ *
+ * @param isLocal whether the path specified in `storage` is a local 
directory
+ * @param storage storage format used to describe how the query result is 
stored.
+ * @param query the logical plan representing data to write to
+ * @param overwrite whether overwrites existing directory
+ */
+case class InsertIntoHiveDirCommand(
+isLocal: Boolean,
+storage: CatalogStorageFormat,
+query: LogicalPlan,
+overwrite: Boolean) extends SaveAsHiveFile with HiveTmpPath {
--- End diff --

why do we separate `SaveAsHiveFile` and `HiveTmpPath` when we always use 
them together?
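
Just to make the question concrete, a tiny sketch of the two shapes being 
compared; the trait bodies below are placeholders, not the PR's actual code:
```
// Illustrative only: names mirror the PR, bodies are stand-ins.
trait SaveAsHiveFile { def saveAsHiveFile(data: String): Unit = println(s"saving $data") }
trait HiveTmpPath    { def hiveTmpPath(): String = "/tmp/hive-staging" }

// Current shape in the diff: the command mixes in both traits together.
case class InsertIntoHiveDirSketch() extends SaveAsHiveFile with HiveTmpPath

// The alternative the question implies: fold the staging-path helper into
// SaveAsHiveFile, since the two traits are always used together anyway.
trait SaveAsHiveFileMerged {
  def hiveTmpPath(): String = "/tmp/hive-staging"
  def saveAsHiveFile(data: String): Unit = println(s"saving $data to ${hiveTmpPath()}")
}
```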


---




[GitHub] spark issue #19214: [SPARK-21027][MINOR][FOLLOW-UP] add missing since tag

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19214
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81706/
Test PASSed.


---




[GitHub] spark issue #19214: [SPARK-21027][MINOR][FOLLOW-UP] add missing since tag

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19214
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r138525087
  
--- Diff: 
sql/core/src/test/resources/sql-tests/results/describe-table-column.sql.out ---
@@ -0,0 +1,184 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 15
+
+
+-- !query 0
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+DESC desc_col_temp_table key
+-- !query 1 schema
+struct
+-- !query 1 output
+col_name   key
+data_type  int
+comment        column_comment
+
+
+-- !query 2
+DESC EXTENDED desc_col_temp_table key
+-- !query 2 schema
+struct
+-- !query 2 output
+col_name   key
+data_type  int
+comment        column_comment
+min            NULL
+max            NULL
+num_nulls  NULL
+distinct_count NULL
+avg_col_len    NULL
+max_col_len    NULL
+
+
+-- !query 3
+DESC FORMATTED desc_col_temp_table key
+-- !query 3 schema
+struct
+-- !query 3 output
+col_name   key
+data_type  int
+comment        column_comment
+min            NULL
+max            NULL
+num_nulls  NULL
+distinct_count NULL
+avg_col_len    NULL
+max_col_len    NULL
+
+
+-- !query 4
+DESC FORMATTED desc_col_temp_table desc_col_temp_table.key
+-- !query 4 schema
+struct
+-- !query 4 output
+col_name   key
+data_type  int
+comment        column_comment
+min            NULL
+max            NULL
+num_nulls  NULL
+distinct_count NULL
+avg_col_len    NULL
+max_col_len    NULL
+
+
+-- !query 5
+DESC desc_col_temp_table key1
+-- !query 5 schema
+struct<>
+-- !query 5 output
+org.apache.spark.sql.AnalysisException
+Column key1 does not exist;
+
+
+-- !query 6
+CREATE TABLE desc_col_table (key int COMMENT 'column_comment') USING 
PARQUET
+-- !query 6 schema
+struct<>
+-- !query 6 output
+
+
+
+-- !query 7
+ANALYZE TABLE desc_col_table COMPUTE STATISTICS FOR COLUMNS key
+-- !query 7 schema
+struct<>
+-- !query 7 output
+
+
+
+-- !query 8
+DESC desc_col_table key
+-- !query 8 schema
+struct
+-- !query 8 output
+col_name   key
+data_type  int
+comment        column_comment
+
+
+-- !query 9
+DESC EXTENDED desc_col_table key
+-- !query 9 schema
+struct
+-- !query 9 output
+col_name   key
+data_type  int
+comment        column_comment
+min            NULL
+max            NULL
--- End diff --

why are min and max NULL?


---




[GitHub] spark issue #19214: [SPARK-21027][MINOR][FOLLOW-UP] add missing since tag

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19214
  
**[Test build #81706 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81706/testReport)**
 for PR 19214 at commit 
[`afd6dc2`](https://github.com/apache/spark/commit/afd6dc2eff32aaadde2d4d1147994b9d8f2b7285).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-12 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r138524974
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/describe-table-column.sql ---
@@ -0,0 +1,35 @@
+-- Test temp table
+CREATE TEMPORARY VIEW desc_col_temp_table (key int COMMENT 
'column_comment') USING PARQUET;
+
+DESC desc_col_temp_table key;
+
+DESC EXTENDED desc_col_temp_table key;
+
+DESC FORMATTED desc_col_temp_table key;
+
+-- Describe a column with qualified name
+DESC FORMATTED desc_col_temp_table desc_col_temp_table.key;
+
+-- Describe a non-existent column
+DESC desc_col_temp_table key1;
+
+-- Test persistent table
+CREATE TABLE desc_col_table (key int COMMENT 'column_comment') USING 
PARQUET;
--- End diff --

shall we drop these testing tables at the end?


---




[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19122
  
**[Test build #81707 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81707/testReport)**
 for PR 19122 at commit 
[`7122884`](https://github.com/apache/spark/commit/712288441482348c7f58427d42d4948b89f1df3a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19122
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81707/
Test PASSed.


---




[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19122
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19188
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19188
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81702/
Test PASSed.


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19188
  
**[Test build #81702 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81702/testReport)**
 for PR 19188 at commit 
[`be1a199`](https://github.com/apache/spark/commit/be1a1993f1e292793946e73b1e9b3f6d66f73e63).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19215: [MINOR][SQL] Only populate type metadata for required ty...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19215
  
**[Test build #81708 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81708/testReport)**
 for PR 19215 at commit 
[`d29c6ff`](https://github.com/apache/spark/commit/d29c6ff158ad2da52c95d1073726f6f5763b3187).


---




[GitHub] spark pull request #19215: [MINOR][SQL] Only populate type metadata for requ...

2017-09-12 Thread dilipbiswal
GitHub user dilipbiswal opened a pull request:

https://github.com/apache/spark/pull/19215

[MINOR][SQL] Only populate type metadata for required types such as 
CHAR/VARCHAR.

## What changes were proposed in this pull request?
When reading column descriptions from the Hive catalog, we currently populate 
the metadata for all types to record the raw Hive type string. For processing, 
we only need this additional metadata for CHAR/VARCHAR types or complex types 
containing CHAR/VARCHAR.
It's a minor cleanup. I haven't created a JIRA for it.
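
A rough, plain-Scala sketch of the decision described above (illustrative only, 
not the PR's actual catalog code; the raw Hive type strings are made up):
```
// Keep the raw Hive type string in column metadata only when it is needed later,
// i.e. for CHAR/VARCHAR or complex types that contain them.
object RawTypeMetadataSketch {
  def needsRawTypeString(rawHiveType: String): Boolean = {
    val t = rawHiveType.toLowerCase
    t.contains("char(") || t.contains("varchar(")   // "char(" also matches "varchar("
  }

  def main(args: Array[String]): Unit = {
    println(needsRawTypeString("varchar(10)"))      // true  -> record the metadata
    println(needsRawTypeString("array<char(3)>"))   // true  -> complex type containing CHAR
    println(needsRawTypeString("int"))              // false -> no extra metadata needed
  }
}
```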

## How was this patch tested?
Test added in HiveMetastoreCatalogSuite

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dilipbiswal/spark column_metadata

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19215.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19215


commit d29c6ff158ad2da52c95d1073726f6f5763b3187
Author: Dilip Biswal 
Date:   2017-08-05T00:08:09Z

Only populate type metadata for required types such as CHAR/VARCHAR.




---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19188
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19188
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81701/
Test PASSed.


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19188
  
**[Test build #81701 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81701/testReport)**
 for PR 19188 at commit 
[`12767bc`](https://github.com/apache/spark/commit/12767bcbf7c763451b23f3726caa972d699325d4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19202
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19202
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81700/
Test PASSed.


---




[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19202
  
**[Test build #81700 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81700/testReport)**
 for PR 19202 at commit 
[`09efc4d`](https://github.com/apache/spark/commit/09efc4d9e412729da05fa44dc3de0ceb5b05).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19122
  
**[Test build #81707 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81707/testReport)**
 for PR 19122 at commit 
[`7122884`](https://github.com/apache/spark/commit/712288441482348c7f58427d42d4948b89f1df3a).


---




[GitHub] spark pull request #19132: [SPARK-21922] Fix duration always updating when t...

2017-09-12 Thread caneGuy
Github user caneGuy commented on a diff in the pull request:

https://github.com/apache/spark/pull/19132#discussion_r138523362
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/AllStagesResource.scala ---
@@ -142,7 +142,7 @@ private[v1] object AllStagesResource {
   index = uiData.taskInfo.index,
   attempt = uiData.taskInfo.attemptNumber,
   launchTime = new Date(uiData.taskInfo.launchTime),
-  duration = uiData.taskDuration,
+  duration = uiData.taskDuration(),
--- End diff --

You are right, @jerryshao. IIUC, the `ui` in `AllStagesResource.scala` is 
passed from `ApiRootResource`, which also creates the `sparkUI` via 
`FSHistoryProvider`. So we can also get `lastUpdateTime` from this `ui` in 
`AllStagesResource` and pass it to the `taskDuration` interface. Is this a 
separate problem for the REST API, or should we fix it here?
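
To make the behavior being discussed concrete, here is a small self-contained 
sketch; the names are illustrative and do not match Spark's actual UI/REST 
internals. The idea is that a duration should stop at the task's finish time, or 
be capped at the history provider's last update time, instead of growing with 
the wall clock for tasks that never finished:
```
// Illustrative sketch only, not Spark's implementation.
object TaskDurationSketch {
  def taskDuration(launchTime: Long,
                   finishTime: Option[Long],
                   lastUpdateTime: Option[Long]): Option[Long] = {
    finishTime.orElse(lastUpdateTime).map(end => math.max(0L, end - launchTime))
  }

  def main(args: Array[String]): Unit = {
    println(taskDuration(1000L, Some(5000L), None))  // Some(4000): finished task
    println(taskDuration(1000L, None, Some(7000L)))  // Some(6000): capped at last update time
    println(taskDuration(1000L, None, None))         // None: live UI, task still running
  }
}
```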


---




[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19202
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19202
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81699/
Test PASSed.


---




[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19202
  
**[Test build #81699 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81699/testReport)**
 for PR 19202 at commit 
[`e24fdb8`](https://github.com/apache/spark/commit/e24fdb8fe525263529f457d5e723bb396057ea0a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19213: [SPARK-17642] [SQL] [FOLLOWUP] improve comments

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19213
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81698/
Test PASSed.


---




[GitHub] spark issue #19213: [SPARK-17642] [SQL] [FOLLOWUP] improve comments

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19213
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19213: [SPARK-17642] [SQL] [FOLLOWUP] improve comments

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19213
  
**[Test build #81698 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81698/testReport)**
 for PR 19213 at commit 
[`0afc9f7`](https://github.com/apache/spark/commit/0afc9f704100b7dda94de7eb50569248a9444b55).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19214: [SPARK-21027][MINOR][FOLLOW-UP] add missing since tag

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19214
  
**[Test build #81706 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81706/testReport)**
 for PR 19214 at commit 
[`afd6dc2`](https://github.com/apache/spark/commit/afd6dc2eff32aaadde2d4d1147994b9d8f2b7285).


---




[GitHub] spark issue #19214: [SPARK-21027][MINOR][FOLLOW-UP] add missing since tag

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19214
  
cc @srowen Thanks!


---




[GitHub] spark pull request #19214: [SPARK-21027][MINOR][FOLLOW-UP] add missing since...

2017-09-12 Thread WeichenXu123
GitHub user WeichenXu123 opened a pull request:

https://github.com/apache/spark/pull/19214

[SPARK-21027][MINOR][FOLLOW-UP] add missing since tag

## What changes were proposed in this pull request?

Add the missing since tag for `setParallelism` in #19110.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/WeichenXu123/spark minor01

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19214.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19214


commit afd6dc2eff32aaadde2d4d1147994b9d8f2b7285
Author: WeichenXu 
Date:   2017-09-13T03:56:08Z

init pr




---




[GitHub] spark pull request #19110: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19110#discussion_r138519719
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -297,6 +298,16 @@ final class OneVsRest @Since("1.4.0") (
   def setPredictionCol(value: String): this.type = set(predictionCol, 
value)
 
   /**
+   * The implementation of parallel one vs. rest runs the classification 
for
+   * each class in a separate threads.
+   *
+   * @group expertSetParam
+   */
+  def setParallelism(value: Int): this.type = {
--- End diff --

Thanks! I created a PR to fix this.
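
For anyone following along, a hedged usage sketch of the new `setParallelism` 
setter quoted in the diff above (assumes Spark ML with this change applied; the 
tiny dataset and column names are illustrative only):
```
import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object OneVsRestParallelismSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("ovr-sketch").getOrCreate()
    import spark.implicits._

    // Toy multi-class data, just to make the example runnable.
    val train = Seq(
      (0.0, Vectors.dense(0.0, 1.0)),
      (1.0, Vectors.dense(1.0, 0.0)),
      (2.0, Vectors.dense(1.0, 1.0))
    ).toDF("label", "features")

    val ovr = new OneVsRest()
      .setClassifier(new LogisticRegression().setMaxIter(5))
      .setParallelism(2)   // train up to 2 per-class models concurrently

    val model = ovr.fit(train)
    model.transform(train).show()
    spark.stop()
  }
}
```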


---




[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19122
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81705/
Test FAILed.


---




[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19122
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19122
  
**[Test build #81705 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81705/testReport)**
 for PR 19122 at commit 
[`d6cf103`](https://github.com/apache/spark/commit/d6cf103fcc32d8f5634f1ee35b995cbb1a510422).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19122
  
**[Test build #81705 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81705/testReport)**
 for PR 19122 at commit 
[`d6cf103`](https://github.com/apache/spark/commit/d6cf103fcc32d8f5634f1ee35b995cbb1a510422).


---




[GitHub] spark issue #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluation for...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19122
  
@BryanCutler code updated. thanks!


---




[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19122#discussion_r138518283
  
--- Diff: python/pyspark/ml/tuning.py ---
@@ -193,7 +194,8 @@ class CrossValidator(Estimator, ValidatorParams, 
MLReadable, MLWritable):
 >>> lr = LogisticRegression()
 >>> grid = ParamGridBuilder().addGrid(lr.maxIter, [0, 1]).build()
 >>> evaluator = BinaryClassificationEvaluator()
->>> cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, 
evaluator=evaluator)
+>>> cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, 
evaluator=evaluator,
+... parallelism=2)
--- End diff --

test added.


---




[GitHub] spark pull request #19122: [SPARK-21911][ML][PySpark] Parallel Model Evaluat...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19122#discussion_r138518235
  
--- Diff: python/pyspark/ml/tuning.py ---
@@ -208,23 +210,23 @@ class CrossValidator(Estimator, ValidatorParams, 
MLReadable, MLWritable):
 
 @keyword_only
 def __init__(self, estimator=None, estimatorParamMaps=None, 
evaluator=None, numFolds=3,
- seed=None):
+ seed=None, parallelism=1):
 """
 __init__(self, estimator=None, estimatorParamMaps=None, 
evaluator=None, numFolds=3,\
- seed=None)
+ seed=None, parallelism=1)
 """
 super(CrossValidator, self).__init__()
-self._setDefault(numFolds=3)
+self._setDefault(numFolds=3, parallelism=1)
--- End diff --

I added a check when creating the thread pool.


---




[GitHub] spark pull request #19110: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-09-12 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/19110#discussion_r138517690
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -297,6 +298,16 @@ final class OneVsRest @Since("1.4.0") (
   def setPredictionCol(value: String): this.type = set(predictionCol, 
value)
 
   /**
+   * The implementation of parallel one vs. rest runs the classification 
for
+   * each class in a separate threads.
+   *
+   * @group expertSetParam
+   */
+  def setParallelism(value: Int): this.type = {
--- End diff --

missing since annotation


---




[GitHub] spark issue #19186: [SPARK-21972][ML] Add param handlePersistence

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19186
  
**[Test build #81704 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81704/testReport)**
 for PR 19186 at commit 
[`e40d3a1`](https://github.com/apache/spark/commit/e40d3a12ef1aeed5d3bd129ed1e610f460c5521f).


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19188
  
**[Test build #81703 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81703/testReport)**
 for PR 19188 at commit 
[`cc11163`](https://github.com/apache/spark/commit/cc111630cd9311dc71ec50dd7915673e95dd520e).


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19188
  
Hmm, ok. Currently we only have two options, so I feel okay keeping it as is 
for now.


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19188
  
But it looks like the current `TPCDSQueryBenchmarkArguments` is not easy to 
test individually...


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19188
  
Yeah, it's good if you can add one.


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19188
  
Would it be better to add tests for `TPCDSQueryBenchmarkArguments`? We do have 
tests for `SparkSubmitArguments` in `SparkSubmitSuite`.


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19188
  
LGTM with a few minor comments.


---




[GitHub] spark pull request #17387: [SPARK-20060][Deploy][Kerberos]Support Standalone...

2017-09-12 Thread yaooqinn
Github user yaooqinn closed the pull request at:

https://github.com/apache/spark/pull/17387


---




[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...

2017-09-12 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19188#discussion_r138513909
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 ---
@@ -110,6 +113,19 @@ object TPCDSQueryBenchmark {
   "q81", "q82", "q83", "q84", "q85", "q86", "q87", "q88", "q89", "q90",
   "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")
 
-tpcdsAll(benchmarkArgs.dataLocation, queries = tpcdsQueries)
+// If `--query-filter` defined, filters the queries that this option 
selects
+val queriesToRun = if (benchmarkArgs.queryFilter.nonEmpty) {
+  val queries = tpcdsQueries.filter { case queryName =>
+benchmarkArgs.queryFilter.contains(queryName)
+  }
+  if (queries.isEmpty) {
+throw new RuntimeException("Bad query name filter: " + 
benchmarkArgs.queryFilter)
--- End diff --

ok


---




[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...

2017-09-12 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19188#discussion_r138513929
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 ---
@@ -110,6 +113,19 @@ object TPCDSQueryBenchmark {
   "q81", "q82", "q83", "q84", "q85", "q86", "q87", "q88", "q89", "q90",
   "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")
 
-tpcdsAll(benchmarkArgs.dataLocation, queries = tpcdsQueries)
+// If `--query-filter` defined, filters the queries that this option 
selects
+val queriesToRun = if (benchmarkArgs.queryFilter.nonEmpty) {
+  val queries = tpcdsQueries.filter { case queryName =>
+benchmarkArgs.queryFilter.contains(queryName)
--- End diff --

yea, I like the idea.


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/19188
  
Also, I manually checked if it worked.


---




[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...

2017-09-12 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19188#discussion_r138513890
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 ---
@@ -31,7 +32,7 @@ import org.apache.spark.util.Benchmark
  * To run this:
  *  spark-submit --class   
--- End diff --

ok


---




[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-09-12 Thread sitalkedia
Github user sitalkedia commented on the issue:

https://github.com/apache/spark/pull/18317
  
Thanks for the change. I left a few comments there.


---




[GitHub] spark issue #18887: [SPARK-20642][core] Store FsHistoryProvider listing data...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18887
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81697/
Test PASSed.


---




[GitHub] spark issue #18887: [SPARK-20642][core] Store FsHistoryProvider listing data...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18887
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #18887: [SPARK-20642][core] Store FsHistoryProvider listing data...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18887
  
**[Test build #81697 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81697/testReport)**
 for PR 18887 at commit 
[`9020184`](https://github.com/apache/spark/commit/9020184bba90fc1c7394ae8ab91877efe0699914).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...

2017-09-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19188#discussion_r138512316
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 ---
@@ -31,7 +32,7 @@ import org.apache.spark.util.Benchmark
  * To run this:
  *  spark-submit --class   
--- End diff --

Update this usage text too?


---




[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...

2017-09-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19188#discussion_r138512027
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 ---
@@ -110,6 +113,19 @@ object TPCDSQueryBenchmark {
   "q81", "q82", "q83", "q84", "q85", "q86", "q87", "q88", "q89", "q90",
   "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")
 
-tpcdsAll(benchmarkArgs.dataLocation, queries = tpcdsQueries)
+// If `--query-filter` defined, filters the queries that this option 
selects
+val queriesToRun = if (benchmarkArgs.queryFilter.nonEmpty) {
+  val queries = tpcdsQueries.filter { case queryName =>
+benchmarkArgs.queryFilter.contains(queryName)
+  }
+  if (queries.isEmpty) {
+throw new RuntimeException("Bad query name filter: " + 
benchmarkArgs.queryFilter)
--- End diff --

`"Empty queries to run. Bad query name filter: " + 
benchmarkArgs.queryFilter`.


---




[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...

2017-09-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19188#discussion_r138511780
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 ---
@@ -110,6 +113,19 @@ object TPCDSQueryBenchmark {
   "q81", "q82", "q83", "q84", "q85", "q86", "q87", "q88", "q89", "q90",
   "q91", "q92", "q93", "q94", "q95", "q96", "q97", "q98", "q99")
 
-tpcdsAll(benchmarkArgs.dataLocation, queries = tpcdsQueries)
+// If `--query-filter` defined, filters the queries that this option 
selects
+val queriesToRun = if (benchmarkArgs.queryFilter.nonEmpty) {
+  val queries = tpcdsQueries.filter { case queryName =>
+benchmarkArgs.queryFilter.contains(queryName)
--- End diff --

Make the matching case-insensitive?
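
A tiny sketch of what the case-insensitive matching could look like; the 
variable names are illustrative, not taken from the PR:
```
// Normalize both the filter and the query names before comparing, so a filter
// like "Q4,q65" still matches the lowercase entries in tpcdsQueries.
val tpcdsQueries = Seq("q4", "q65", "q99")
val queryFilter  = Set("Q4", "q65")                 // as parsed from --query-filter
val normalized   = queryFilter.map(_.toLowerCase)
val queriesToRun = tpcdsQueries.filter(q => normalized.contains(q.toLowerCase))
// queriesToRun == Seq("q4", "q65")
```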


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19188
  
**[Test build #81702 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81702/testReport)**
 for PR 19188 at commit 
[`be1a199`](https://github.com/apache/spark/commit/be1a1993f1e292793946e73b1e9b3f6d66f73e63).


---




[GitHub] spark pull request #19188: [SPARK-21973][SQL] Add an new option to filter qu...

2017-09-12 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19188#discussion_r138511511
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/TPCDSQueryBenchmark.scala
 ---
@@ -90,7 +91,9 @@ object TPCDSQueryBenchmark {
   benchmark.addCase(name) { i =>
 spark.sql(queryString).collect()
   }
+  logInfo(s"\n\n= TPCDS QUERY BENCHMARK OUTPUT FOR $name =\n")
--- End diff --

See https://github.com/apache/spark/pull/19188#discussion_r137999669


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-12 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r138510770
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,74 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
--- End diff --

A follow-up PR to improve the comments has been sent: 
https://github.com/apache/spark/pull/19213


---




[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19202
  
**[Test build #81700 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81700/testReport)**
 for PR 19202 at commit 
[`09efc4d`](https://github.com/apache/spark/commit/09efc4d9e412729da05fa44dc3de0ceb5b05).


---




[GitHub] spark issue #19188: [SPARK-21973][SQL] Add an new option to filter queries i...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19188
  
**[Test build #81701 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81701/testReport)**
 for PR 19188 at commit 
[`12767bc`](https://github.com/apache/spark/commit/12767bcbf7c763451b23f3726caa972d699325d4).


---




[GitHub] spark pull request #19132: [SPARK-21922] Fix duration always updating when t...

2017-09-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19132#discussion_r138510683
  
--- Diff: 
core/src/main/scala/org/apache/spark/status/api/v1/AllStagesResource.scala ---
@@ -142,7 +142,7 @@ private[v1] object AllStagesResource {
   index = uiData.taskInfo.index,
   attempt = uiData.taskInfo.attemptNumber,
   launchTime = new Date(uiData.taskInfo.launchTime),
-  duration = uiData.taskDuration,
+  duration = uiData.taskDuration(),
--- End diff --

What if we call the REST API on the history server to get stage info? It looks 
like we may still have this issue, since we don't have the last update time 
here. What do you think, @ajbozarth?


---




[GitHub] spark issue #17387: [SPARK-20060][Deploy][Kerberos]Support Standalone visiti...

2017-09-12 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/17387
  
@yaooqinn I think the patch here is quite old and cannot be merged anymore; 
can you please close it?

If you still want to address this issue, please create a new PR. Thanks!


---




[GitHub] spark issue #19202: [SPARK-21980][SQL]References in grouping functions shoul...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19202
  
**[Test build #81699 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81699/testReport)**
 for PR 19202 at commit 
[`e24fdb8`](https://github.com/apache/spark/commit/e24fdb8fe525263529f457d5e723bb396057ea0a).


---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread ajbozarth
Github user ajbozarth commented on the issue:

https://github.com/apache/spark/pull/19132
  
I've been following, still LGTM


---




[GitHub] spark issue #19132: [SPARK-21922] Fix duration always updating when task fai...

2017-09-12 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19132
  
Overall LGTM, @ajbozarth can you please review again?


---




[GitHub] spark issue #18659: [SPARK-21190][PYSPARK][WIP] Simple Python Vectorized UDF...

2017-09-12 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/18659
  
@BryanCutler I sent a PR to your repository: 
https://github.com/BryanCutler/spark/pull/26. Could you please take a look 
at it?


---




[GitHub] spark issue #19213: [SPARK-17642] [SQL] [FOLLOWUP] improve comments

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19213
  
**[Test build #81698 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81698/testReport)**
 for PR 19213 at commit 
[`0afc9f7`](https://github.com/apache/spark/commit/0afc9f704100b7dda94de7eb50569248a9444b55).


---




[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

2017-09-12 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/17862
  
@hhbyyh The test results look good!
OWLQN takes longer per iteration because of the line search in each iteration, 
which makes more passes over the dataset.
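
A back-of-the-envelope sketch of that point (the numbers are assumptions for 
illustration, not measurements):
```
// Each OWLQN iteration pays for one gradient pass plus extra passes for the
// line search, so individual iterations are slower even if fewer are needed.
val gradientPassesPerIter   = 1
val lineSearchPassesPerIter = 3   // assumed; varies per iteration in practice
val totalPassesPerIter = gradientPassesPerIter + lineSearchPassesPerIter
println(s"data passes per OWLQN iteration ~= $totalPassesPerIter")
```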


---




[GitHub] spark issue #19213: [SPARK-17642] [SQL] [FOLLOWUP] improve comments

2017-09-12 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/19213
  
cc @cloud-fan 


---




[GitHub] spark pull request #19213: [SPARK-17642] [SQL] [FOLLOWUP] improve comments

2017-09-12 Thread wzhfy
GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/19213

[SPARK-17642] [SQL] [FOLLOWUP] improve comments

## What changes were proposed in this pull request?

Improve the comments for some of the table-related RunnableCommands.

## How was this patch tested?

Only comments. Not related to tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark useless_comment

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19213


commit 0afc9f704100b7dda94de7eb50569248a9444b55
Author: Zhenhua Wang 
Date:   2017-09-13T01:25:52Z

improve comments




---




[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-09-12 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/18317
  
I opened https://github.com/sitalkedia/spark/pull/1 in your repo. Could you 
take a look at it?


---




[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2017-09-12 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r138505758
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -626,6 +624,74 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * A command to list the info for a column, including name, data type, 
column stats and comment.
+ * This function creates a [[DescribeColumnCommand]] logical plan.
--- End diff --

There are two other similar comments (`ShowPartitionsCommand`, 
`ShowColumnsCommand`) in this file. Shall I remove them all?


---




[GitHub] spark pull request #19141: [SPARK-21384] [YARN] Spark + YARN fails with Loca...

2017-09-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19141#discussion_r138505323
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala 
---
@@ -565,7 +565,6 @@ private[spark] class Client(
   distribute(jarsArchive.toURI.getPath,
 resType = LocalResourceType.ARCHIVE,
 destName = Some(LOCALIZED_LIB_DIR))
-  jarsArchive.delete()
--- End diff --

Thinking about this again, I think you're right. But I'm not sure whether the 
program will crash if we delete the dependencies at runtime.


---




[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-12 Thread goldmedal
Github user goldmedal commented on the issue:

https://github.com/apache/spark/pull/18875
  
@HyukjinKwon ok. I got it. Thanks =)


---




[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18875
  
I think including them in the R and Python PRs would be nicer.


---




[GitHub] spark pull request #18317: [SPARK-21113][CORE] Read ahead input stream to am...

2017-09-12 Thread sitalkedia
Github user sitalkedia commented on a diff in the pull request:

https://github.com/apache/spark/pull/18317#discussion_r138502867
  
--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java 
---
@@ -0,0 +1,313 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io;
+
+import com.google.common.base.Preconditions;
+import org.apache.spark.util.ThreadUtils;
+
+import javax.annotation.concurrent.GuardedBy;
+import java.io.EOFException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InterruptedIOException;
+import java.nio.ByteBuffer;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * {@link InputStream} implementation which asynchronously reads ahead from the underlying input
+ * stream when a specified amount of data has been read from the current buffer. It does so by
+ * maintaining two buffers: an active buffer and a read-ahead buffer. The active buffer contains
+ * data which should be returned when a read() call is issued. The read-ahead buffer is used to
+ * asynchronously read from the underlying input stream and, once the current active buffer is
+ * exhausted, we flip the two buffers so that we can start reading from the read-ahead buffer
+ * without being blocked by disk I/O.
+ */
+public class ReadAheadInputStream extends InputStream {
+
+  private ReentrantLock stateChangeLock = new ReentrantLock();
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer activeBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer readAheadBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private boolean endOfStream;
+
+  @GuardedBy("stateChangeLock")
+  // true if async read is in progress
+  private boolean readInProgress;
+
+  @GuardedBy("stateChangeLock")
+  // true if read is aborted due to an exception in reading from underlying input stream.
+  private boolean readAborted;
+
+  @GuardedBy("stateChangeLock")
+  private Exception readException;
+
+  // If the remaining data size in the current buffer is below this threshold,
+  // we issue an async read from the underlying input stream.
+  private final int readAheadThresholdInBytes;
+
+  private final InputStream underlyingInputStream;
+
+  private final ExecutorService executorService = ThreadUtils.newDaemonSingleThreadExecutor("read-ahead");
+
+  private final Condition asyncReadComplete = stateChangeLock.newCondition();
+
+  private static final ThreadLocal<byte[]> oneByte = ThreadLocal.withInitial(() -> new byte[1]);
+
+  /**
+   * Creates a ReadAheadInputStream with the specified buffer size and read-ahead threshold.
+   *
+   * @param inputStream               The underlying input stream.
+   * @param bufferSizeInBytes         The buffer size.
+   * @param readAheadThresholdInBytes If the active buffer has less data than the read-ahead
+   *                                  threshold, an async read is triggered.
+   */
+  public ReadAheadInputStream(InputStream inputStream, int bufferSizeInBytes, int readAheadThresholdInBytes) {
+    Preconditions.checkArgument(bufferSizeInBytes > 0,
+        "bufferSizeInBytes should be greater than 0, but the value is " + bufferSizeInBytes);
+    Preconditions.checkArgument(readAheadThresholdInBytes > 0 &&
+        readAheadThresholdInBytes < bufferSizeInBytes,
+        "readAheadThresholdInBytes should be greater than 0 and less than bufferSizeInBytes, " +
+        "but the value is " + readAheadThresholdInBytes);
+    activeBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+    readAheadBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+    this.readAheadThresholdInBytes = readAheadThresholdInBytes;
+    this.underlyingInputStream = inputStream;
+    activeBuffer.flip();
+    readAheadBuffer.flip();
+  }
+
+  private boolean isEndOfStream() {
+    return (!activeBuffer.hasRemaining() && !readAheadBuffer.hasRemaining() && endOfStream);

[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-12 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18875
  
@HyukjinKwon Should we fix the last two comments in a small follow-up PR, or in the Python and R PRs?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-12 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18875
  
Yeah, I didn't expect such a big PR from @goldmedal as the first contribution. :) Thanks @HyukjinKwon for the careful review.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18317: [SPARK-21113][CORE] Read ahead input stream to am...

2017-09-12 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/18317#discussion_r138502374
  
--- Diff: core/src/main/java/org/apache/spark/io/ReadAheadInputStream.java 
---
@@ -0,0 +1,313 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.io;
+
+import com.google.common.base.Preconditions;
+import org.apache.spark.util.ThreadUtils;
+
+import javax.annotation.concurrent.GuardedBy;
+import java.io.EOFException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InterruptedIOException;
+import java.nio.ByteBuffer;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.ReentrantLock;
+
+/**
+ * {@link InputStream} implementation which asynchronously reads ahead from the underlying input
+ * stream when a specified amount of data has been read from the current buffer. It does so by
+ * maintaining two buffers: an active buffer and a read-ahead buffer. The active buffer contains
+ * data which should be returned when a read() call is issued. The read-ahead buffer is used to
+ * asynchronously read from the underlying input stream and, once the current active buffer is
+ * exhausted, we flip the two buffers so that we can start reading from the read-ahead buffer
+ * without being blocked by disk I/O.
+ */
+public class ReadAheadInputStream extends InputStream {
+
+  private ReentrantLock stateChangeLock = new ReentrantLock();
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer activeBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private ByteBuffer readAheadBuffer;
+
+  @GuardedBy("stateChangeLock")
+  private boolean endOfStream;
+
+  @GuardedBy("stateChangeLock")
+  // true if async read is in progress
+  private boolean readInProgress;
+
+  @GuardedBy("stateChangeLock")
+  // true if read is aborted due to an exception in reading from underlying input stream.
+  private boolean readAborted;
+
+  @GuardedBy("stateChangeLock")
+  private Exception readException;
+
+  // If the remaining data size in the current buffer is below this threshold,
+  // we issue an async read from the underlying input stream.
+  private final int readAheadThresholdInBytes;
+
+  private final InputStream underlyingInputStream;
+
+  private final ExecutorService executorService = ThreadUtils.newDaemonSingleThreadExecutor("read-ahead");
+
+  private final Condition asyncReadComplete = stateChangeLock.newCondition();
+
+  private static final ThreadLocal<byte[]> oneByte = ThreadLocal.withInitial(() -> new byte[1]);
+
+  /**
+   * Creates a ReadAheadInputStream with the specified buffer size and read-ahead threshold.
+   *
+   * @param inputStream               The underlying input stream.
+   * @param bufferSizeInBytes         The buffer size.
+   * @param readAheadThresholdInBytes If the active buffer has less data than the read-ahead
+   *                                  threshold, an async read is triggered.
+   */
+  public ReadAheadInputStream(InputStream inputStream, int bufferSizeInBytes, int readAheadThresholdInBytes) {
+    Preconditions.checkArgument(bufferSizeInBytes > 0,
+        "bufferSizeInBytes should be greater than 0, but the value is " + bufferSizeInBytes);
+    Preconditions.checkArgument(readAheadThresholdInBytes > 0 &&
+        readAheadThresholdInBytes < bufferSizeInBytes,
+        "readAheadThresholdInBytes should be greater than 0 and less than bufferSizeInBytes, " +
+        "but the value is " + readAheadThresholdInBytes);
+    activeBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+    readAheadBuffer = ByteBuffer.allocate(bufferSizeInBytes);
+    this.readAheadThresholdInBytes = readAheadThresholdInBytes;
+    this.underlyingInputStream = inputStream;
+    activeBuffer.flip();
+    readAheadBuffer.flip();
+  }
+
+  private boolean isEndOfStream() {
+    return (!activeBuffer.hasRemaining() && !readAheadBuffer.hasRemaining() && endOfStream);
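
As a rough illustration of the two-buffer design described in the class comment above, a hedged usage sketch follows. It relies only on the constructor quoted in this diff (underlying stream, buffer size, read-ahead threshold); the file path and sizes are made up, and the API may differ in the merged version.

```
import java.io.FileInputStream
import org.apache.spark.io.ReadAheadInputStream

// Hypothetical usage: wrap a plain file stream so a background thread refills
// the read-ahead buffer while the caller drains the active buffer.
val underlying = new FileInputStream("/tmp/spill.data")  // made-up path
val in = new ReadAheadInputStream(underlying, 1024 * 1024, 256 * 1024)
try {
  val chunk = new Array[Byte](8192)
  var n = in.read(chunk)
  while (n != -1) {
    // process chunk(0 until n) here ...
    n = in.read(chunk)
  }
} finally {
  in.close()
}
```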

[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-12 Thread goldmedal
Github user goldmedal commented on the issue:

https://github.com/apache/spark/pull/18875
  
@HyukjinKwon OK, I'll work on R and Python. My JIRA id is 'goldmedal', too. Thanks for your review :)
@viirya Thanks for your mentoring and review :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18875
  
BTW, would you mind if I ask for your JIRA id, @goldmedal? I would like to assign it to you.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18875: [SPARK-21513][SQL] Allow UDF to_json support converting ...

2017-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18875
  
Merged to master.

This was a big piece of work for a very first contribution. Thanks for working on this and for bearing with me, @goldmedal and @viirya.

Would you like to work on R and Python too? I think that work should be quite small.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-09-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18875


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18317
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81694/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18317
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18317: [SPARK-21113][CORE] Read ahead input stream to amortize ...

2017-09-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18317
  
**[Test build #81694 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81694/testReport)** for PR 18317 at commit [`f30117e`](https://github.com/apache/spark/commit/f30117eaf5b8274cc19832c6a36acaf44adc7915).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18875: [SPARK-21513][SQL] Allow UDF to_json support conv...

2017-09-12 Thread goldmedal
Github user goldmedal commented on a diff in the pull request:

https://github.com/apache/spark/pull/18875#discussion_r138501335
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonGenerator.scala
 ---
@@ -26,20 +26,50 @@ import 
org.apache.spark.sql.catalyst.expressions.SpecializedGetters
 import org.apache.spark.sql.catalyst.util.{ArrayData, DateTimeUtils, 
MapData}
 import org.apache.spark.sql.types._
 
+/**
+ * `JacksonGenerator` can only be initialized with a `StructType` or a `MapType`.
+ * Once it is initialized with `StructType`, it can be used to write out a struct or an array of
+ * struct. Once it is initialized with `MapType`, it can be used to write out a map or an array
+ * of map. An exception will be thrown if trying to write out a struct when it is initialized with
+ * a `MapType`, and vice versa.
+ */
 private[sql] class JacksonGenerator(
-    schema: StructType,
+    dataType: DataType,
     writer: Writer,
     options: JSONOptions) {
   // A `ValueWriter` is responsible for writing a field of an `InternalRow` to appropriate
   // JSON data. Here we are using `SpecializedGetters` rather than `InternalRow` so that
   // we can directly access data in `ArrayData` without the help of `SpecificMutableRow`.
   private type ValueWriter = (SpecializedGetters, Int) => Unit
 
+  // `JacksonGenerator` can only be initialized with a `StructType` or a `MapType`.
+  require(dataType.isInstanceOf[StructType] | dataType.isInstanceOf[MapType],
--- End diff --

oh.  Yes, you're right. This is my mistake. :(
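
For context, the quoted `require` combines the two `isInstanceOf` checks with `|`; assuming the nit being acknowledged here is that `||` was intended (on Booleans, `|` evaluates both operands instead of short-circuiting), a minimal corrected sketch would look like the following. The helper name and message wording are illustrative, not from the PR.

```
import org.apache.spark.sql.types.{DataType, MapType, StructType}

// Sketch of the corrected precondition: `||` short-circuits, whereas `|`
// always evaluates both operands. Helper name and message are illustrative.
def checkSupportedType(dataType: DataType): Unit = {
  require(dataType.isInstanceOf[StructType] || dataType.isInstanceOf[MapType],
    s"JacksonGenerator only supports StructType or MapType, but got ${dataType.simpleString}")
}
```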


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19211
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19211: [SPARK-18838][core] Add separate listener queues to Live...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19211
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81693/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19180: [SPARK-21967][CORE] org.apache.spark.unsafe.types.UTF8St...

2017-09-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19180
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


