[GitHub] spark pull request #18756: [SPARK-21548][SQL] "Support insert into serial co...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/18756 [SPARK-21548][SQL] "Support insert into serial columns of table" ## What changes were proposed in this pull request? When we use the 'insert into ...' statement, we can only insert all the columns into a table. But in some cases our table has many columns and we are only interested in some of them, so we want to support the statement "insert into table tbl (column1, column2,...) values (value1, value2, value3,...)". https://issues.apache.org/jira/browse/SPARK-21548 ## How was this patch tested? unit tests, integration tests, manual tests Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running: $ git pull https://github.com/lvdongr/spark SPARK-21548 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18756.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18756 commit 01af8ce69afeade8bb034c6965de0f3738f12fd5 Author: lvdongr Date: 2017-03-08T04:09:40Z [SPARK-19863][DStream] Whether or not to use CachedKafkaConsumer needs to be configurable when you use DirectKafkaInputDStream to connect to Kafka in a Spark Streaming application.
commit b6daeec664d757999e257e56fed3844db51515e2 Author: lvdongr Date: 2017-03-11T06:35:57Z Merge remote-tracking branch 'apache/master' commit e0e47b1da93b90210e44abc6e90655d3028555ec Author: lvdongr Date: 2017-04-12T07:20:01Z Merge remote-tracking branch 'apache/master' commit f4ab88111c5b8e9700eacc1acfa3858aed45124e Author: lvdongr Date: 2017-07-27T01:54:56Z Merge remote-tracking branch 'apache/master' commit 463e570f9e05f785834e27bd535cfbb3b7cb7dfb Author: lvdongr Date: 2017-07-27T12:09:47Z Merge remote-tracking branch 'apache/master' commit 2a40d64bcad6613892a54bc3052a634f59c14c65 Author: lvdongr Date: 2017-07-28T06:56:15Z [SPARK-21548][SQL] Support insert into serial columns of table --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
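For concreteness, the statement form proposed in this PR would look as follows (the table and column names here are illustrative, not taken from the patch):

```sql
-- Hypothetical table with more columns than we want to populate.
CREATE TABLE tbl (column1 INT, column2 STRING, column3 DOUBLE);

-- Proposed syntax: list only the columns of interest; the columns
-- not named would presumably be left NULL.
INSERT INTO tbl (column1, column2) VALUES (1, 'a');
```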
[GitHub] spark issue #18525: [SPARK-21297] [WEB-UI]Add count in 'JDBC/ODBC Server' pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18525 **[Test build #3855 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3855/testReport)** for PR 18525 at commit [`4001028`](https://github.com/apache/spark/commit/4001028926f08dd8e2286e6c8cb2cd81315b6a93). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18738: Typo in comment
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18738 **[Test build #3856 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3856/testReport)** for PR 18738 at commit [`dd2eb6b`](https://github.com/apache/spark/commit/dd2eb6bec99b80b085b11f4ee12c4d3feb66461e).
[GitHub] spark pull request #18323: [SPARK-21117][SQL] Built-in SQL Function Support ...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18323#discussion_r130021930 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathExpressions.scala --- @@ -1219,44 +1219,91 @@ case class WidthBucket( override def dataType: DataType = LongType override def nullable: Boolean = true + private val isFoldable = minValue.foldable && maxValue.foldable && numBucket.foldable + + private lazy val _minValue: Any = minValue.eval(EmptyRow) + private lazy val minValueV = _minValue.asInstanceOf[Double] --- End diff -- If `minValue.eval(EmptyRow) == null`, then `minValue.eval(EmptyRow).asInstanceOf[Double]` will be `0.0`, so we keep both of them here.
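The pitfall this comment relies on can be reproduced in plain Scala, independent of Spark: casting a `null` held as `Any` to the primitive type `Double` unboxes it to `0.0` instead of failing, so the null case is silently lost unless it is checked separately. A minimal sketch:

```scala
// Demonstrates Scala's null-unboxing behavior (no Spark required).
object NullCastDemo {
  def main(args: Array[String]): Unit = {
    val evaluated: Any = null // stands in for minValue.eval(EmptyRow)

    // asInstanceOf[Double] on null unboxes to the primitive default 0.0
    // rather than throwing, which is why a separate null check is needed.
    val asDouble = evaluated.asInstanceOf[Double]

    println(asDouble)          // prints 0.0
    println(evaluated == null) // prints true
  }
}
```

This is why the diff keeps both `_minValue` (typed `Any`, so nullness is preserved) and the unboxed `minValueV`.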
[GitHub] spark issue #18525: [SPARK-21297] [WEB-UI]Add count in 'JDBC/ODBC Server' pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18525 **[Test build #3855 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3855/testReport)** for PR 18525 at commit [`4001028`](https://github.com/apache/spark/commit/4001028926f08dd8e2286e6c8cb2cd81315b6a93).
[GitHub] spark issue #18750: Skip maven-compiler-plugin main and test compilations in...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18750 Likewise @vanzin, any thoughts on this one? It touches the compilation and build.
[GitHub] spark issue #18745: [SPARK-21544][DEPLOY] Tests jar of some module should no...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/18745 @vanzin do you have any thoughts on this one?
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r130021324 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(_, child) if sameOutput(child.output, p.output) => child +// The projectList of the parent Project contains a non-deterministic function, +// e.g. rand(); the parent Project will be split into two Projects. +case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) => --- End diff -- I meant: in all the cases where the projectList contains a non-deterministic expression, do we always need to split the project on all kinds of LeafNode?
[GitHub] spark issue #18323: [SPARK-21117][SQL] Built-in SQL Function Support - WIDTH...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18323 **[Test build #80015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80015/testReport)** for PR 18323 at commit [`0940a49`](https://github.com/apache/spark/commit/0940a49ebf731221a5bcebcd0021e3ed35d9a6ad).
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r130020290 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(_, child) if sameOutput(child.output, p.output) => child +// The projectList of the parent Project contains a non-deterministic function, +// e.g. rand(); the parent Project will be split into two Projects. +case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) => --- End diff -- There is no need to split for every LeafNode; we split if and only if the projectList of the LeafNode's parent Project is non-deterministic.
[GitHub] spark issue #18755: [SPARK-21553][Spark Shell] Added the description of the ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18755 **[Test build #3854 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3854/testReport)** for PR 18755 at commit [`d764f5e`](https://github.com/apache/spark/commit/d764f5e8c589cff87668bb95bf3e6e046668fa54).
[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r130018225 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala --- @@ -792,6 +793,104 @@ class ArrowConvertersSuite extends SharedSQLContext with BeforeAndAfterAll { collectAndValidate(df, json, "binaryData.json") } + test("date type conversion") { +val json = + s""" + |{ + | "schema" : { + |"fields" : [ { + | "name" : "date", + | "type" : { + |"name" : "date", + |"unit" : "DAY" + | }, + | "nullable" : true, + | "children" : [ ], + | "typeLayout" : { + |"vectors" : [ { + | "type" : "VALIDITY", + | "typeBitWidth" : 1 + |}, { + | "type" : "DATA", + | "typeBitWidth" : 32 + |} ] + | } + |} ] + | }, + | "batches" : [ { + |"count" : 4, + |"columns" : [ { + | "name" : "date", + | "count" : 4, + | "VALIDITY" : [ 1, 1, 1, 1 ], + | "DATA" : [ -1, 0, 16533, 382607 ] + |} ] + | } ] + |} + """.stripMargin + +val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS z", Locale.US) +val d1 = DateTimeUtils.toJavaDate(-1) // "1969-12-31" +val d2 = DateTimeUtils.toJavaDate(0) // "1970-01-01" +val d3 = new Date(sdf.parse("2015-04-08 13:10:15.000 UTC").getTime) +val d4 = new Date(sdf.parse("3017-07-18 14:55:00.000 UTC").getTime) --- End diff -- `d3` and `d4` might be flaky in some timezones. Should we use `Date.valueOf()`?: ```scala val d3 = Date.valueOf("2015-04-08") val d4 = Date.valueOf("3017-07-18") ```
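The suggested fix can be checked with plain JVM code, no Spark required: `Date.valueOf` takes the string verbatim as a local year-month-day, while a `java.sql.Date` built from a parsed UTC instant is rendered in the JVM's default timezone, so the printed day can shift. A minimal sketch, pinning a far-eastern timezone so the shift shows up deterministically:

```scala
import java.sql.Date
import java.text.SimpleDateFormat
import java.util.{Locale, TimeZone}

object DateFlakinessDemo {
  def main(args: Array[String]): Unit = {
    // Force a timezone far east of UTC to expose the problem deterministically.
    TimeZone.setDefault(TimeZone.getTimeZone("Pacific/Kiritimati")) // UTC+14

    // Flaky variant: parse a UTC timestamp, wrap the instant in java.sql.Date.
    // The instant 2015-04-08T13:10:15Z falls on 2015-04-09 in local time here.
    val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS z", Locale.US)
    val flaky = new Date(sdf.parse("2015-04-08 13:10:15.000 UTC").getTime)

    // Stable variant: the string is interpreted as a plain year-month-day.
    val stable = Date.valueOf("2015-04-08")

    println(flaky)  // 2015-04-09 -- shifted by a day
    println(stable) // 2015-04-08 -- unaffected by the default timezone
  }
}
```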
[GitHub] spark pull request #18751: [SPARK-21548][SQL]Support insert into serial colu...
Github user lvdongr closed the pull request at: https://github.com/apache/spark/pull/18751
[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r130015435 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -309,6 +313,23 @@ private[ml] object DefaultParamsWriter { val metadataJson: String = compact(render(metadata)) metadataJson } + + /** + * Save estimator's `initialModel` to corresponding path. + */ + def saveInitialModel[T <: HasInitialModel[_ <: MLWritable with Params]]( + instance: T, path: String): Unit = { +if (instance.isDefined(instance.initialModel)) { + val initialModelPath = new Path(path, "initialModel").toString + val initialModel = instance.getOrDefault(instance.initialModel) + // When saving, only keep the direct initialModel by eliminating possible initialModels of the + // direct initialModel, to avoid unnecessary deep recursion of initialModel. + if (initialModel.hasParam("initialModel")) { +initialModel.clear(initialModel.getParam("initialModel")) + } + initialModel.save(initialModelPath) --- End diff -- Fair enough.
[GitHub] spark issue #18754: [WIP][SPARK-21552][SQL] Add DecimalType support to Arrow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18754 Merged build finished. Test FAILed.
[GitHub] spark issue #18754: [WIP][SPARK-21552][SQL] Add DecimalType support to Arrow...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18754 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80014/ Test FAILed.
[GitHub] spark issue #18754: [WIP][SPARK-21552][SQL] Add DecimalType support to Arrow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18754 **[Test build #80014 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80014/testReport)** for PR 18754 at commit [`9e60762`](https://github.com/apache/spark/commit/9e60762d830c320967742d80cb17c55631f6b11a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18540 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80012/ Test PASSed.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18540 Merged build finished. Test PASSed.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18540 **[Test build #80012 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80012/testReport)** for PR 18540 at commit [`9abdb5e`](https://github.com/apache/spark/commit/9abdb5eee7aab766fe73ca00749efa2a16328882). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18749 Will address the comments within a few days. (I am reading the docs just to get used to things around my updated status.)
[GitHub] spark issue #18695: [SPARK-12717][PYTHON] Adding thread-safe broadcast pickl...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18695 The change LGTM. Will it be hard to add a reliable test for this?
[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18655 @BryanCutler @wesm @cpcloud I filed a JIRA issue for decimal type support [SPARK-21552](https://issues.apache.org/jira/browse/SPARK-21552) and sent a pr for it as WIP #18754. Let's move on there for discussing decimal type support.
[GitHub] spark issue #18755: [SPARK-21553][Spark Shell] Added the description of the ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18755 Can one of the admins verify this patch?
[GitHub] spark issue #18754: [WIP][SPARK-21552][SQL] Add DecimalType support to Arrow...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18754 **[Test build #80014 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80014/testReport)** for PR 18754 at commit [`9e60762`](https://github.com/apache/spark/commit/9e60762d830c320967742d80cb17c55631f6b11a).
[GitHub] spark pull request #18755: [SPARK-21553][Spark Shell] Added the description ...
GitHub user davidxdh opened a pull request: https://github.com/apache/spark/pull/18755 [SPARK-21553][Spark Shell] Added the description of the default value of master parameter in the spark-shell When I type spark-shell --help, I find that the description of the default value for the master parameter is missing. The user does not know what the default value is when the master parameter is not specified, so we need to add the master parameter's default value to the help information. [https://issues.apache.org/jira/browse/SPARK-21553](https://issues.apache.org/jira/browse/SPARK-21553) You can merge this pull request into a Git repository by running: $ git pull https://github.com/davidxdh/spark dev_0728 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18755.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18755 commit 95602dc2e0ccde3f2d94789307048474f2d0ae7f Author: Donghui Xu Date: 2017-07-28T01:31:48Z Merge pull request #1 from apache/master Merge from apache/spark commit d764f5e8c589cff87668bb95bf3e6e046668fa54 Author: davidxdh Date: 2017-07-28T03:46:00Z [SPARK-21553][Spark Shell] Added the description of the default value of master parameter in the spark-shell
[GitHub] spark pull request #18754: [WIP][SPARK-21552][SQL] Add DecimalType support t...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/18754 [WIP][SPARK-21552][SQL] Add DecimalType support to ArrowWriter. ## What changes were proposed in this pull request? Decimal type is not yet supported in `ArrowWriter`. This adds decimal type support. ## How was this patch tested? Added a test to `ArrowConvertersSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-21552 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18754.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18754 commit 9e60762d830c320967742d80cb17c55631f6b11a Author: Takuya UESHIN Date: 2017-07-26T04:34:31Z Add DecimalType support to ArrowWriter.
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17435 @szalai1 Could you fix the tests if you're still working on this, please?
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17435 Merged build finished. Test FAILed.
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17435 **[Test build #80013 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80013/testReport)** for PR 17435 at commit [`8872e19`](https://github.com/apache/spark/commit/8872e190b16b328205e0df569d5f5bc3af6c5610). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17435 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80013/ Test FAILed.
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18185 @gatorsmile ok
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80011/ Test PASSed.
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18185 Merged build finished. Test PASSed.
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18185 **[Test build #80011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80011/testReport)** for PR 18185 at commit [`23ca897`](https://github.com/apache/spark/commit/23ca897825a51baa1b879c3b7968749199e8724f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedSubqueryColumnAliases(`
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17435 **[Test build #80013 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80013/testReport)** for PR 17435 at commit [`8872e19`](https://github.com/apache/spark/commit/8872e190b16b328205e0df569d5f5bc3af6c5610).
[GitHub] spark issue #17435: [SPARK-20098][PYSPARK] dataType's typeName fix
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/17435 ok to test
[GitHub] spark issue #18753: [SPARK-21548] [SQL] Support insert into serial columns o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18753 Can one of the admins verify this patch?
[GitHub] spark pull request #18753: [SPARK-21548] [SQL] Support insert into serial co...
GitHub user lvdongr opened a pull request: https://github.com/apache/spark/pull/18753 [SPARK-21548] [SQL] Support insert into serial columns of table

## What changes were proposed in this pull request?

When we use the 'insert into ...' statement, we can only insert values for all the columns of a table. But in some cases a table has many columns and we are only interested in some of them, so we want to support the statement "insert into table tbl (column1, column2, ...) values (value1, value2, value3, ...)".

## How was this patch tested?

manual tests

Please review http://spark.apache.org/contributing.html before opening a pull request. You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvdongr/spark SPARK-21548

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18753.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18753

commit 01af8ce69afeade8bb034c6965de0f3738f12fd5 Author: lvdongr Date: 2017-03-08T04:09:40Z [SPARK-19863][DStream] Whether or not use CachedKafkaConsumer need to be configured, when you use DirectKafkaInputDStream to connect the kafka in a Spark Streaming application has been successfully created.
commit b6daeec664d757999e257e56fed3844db51515e2 Author: lvdongr Date: 2017-03-11T06:35:57Z Merge remote-tracking branch 'apache/master'
commit e0e47b1da93b90210e44abc6e90655d3028555ec Author: lvdongr Date: 2017-04-12T07:20:01Z Merge remote-tracking branch 'apache/master'
commit f4ab88111c5b8e9700eacc1acfa3858aed45124e Author: lvdongr Date: 2017-07-27T01:54:56Z Merge remote-tracking branch 'apache/master'
commit 463e570f9e05f785834e27bd535cfbb3b7cb7dfb Author: lvdongr Date: 2017-07-27T12:09:47Z Merge remote-tracking branch 'apache/master'
commit da882ea569d451b3f2af550b0976a6a059900f6a Author: lvdongr Date: 2017-07-28T02:56:23Z [SPARK-21548][SQL] Support insert into serial columns of table
commit a65be1605865a1159532ba148434d3bb207da64c Author: lvdongr Date: 2017-07-28T03:03:23Z refresh last commit
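For readers unfamiliar with the proposed syntax: standard SQL already allows naming a subset of columns in an INSERT, with omitted columns filled by their defaults (typically NULL). A minimal sketch of the intended semantics, illustrated here with Python's sqlite3 rather than Spark SQL (which, at the time of this PR, rejects the column list):

```python
import sqlite3

# Hypothetical three-column table standing in for a wide Spark table;
# only the column-list INSERT semantics are illustrated here, not
# Spark's implementation.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (column1 TEXT, column2 TEXT, column3 TEXT)")

# Insert into a subset of columns; column3 is omitted and defaults to NULL.
conn.execute("INSERT INTO tbl (column1, column2) VALUES (?, ?)", ("a", "b"))

row = conn.execute("SELECT column1, column2, column3 FROM tbl").fetchone()
print(row)  # ('a', 'b', None)
```

The PR proposes teaching Spark's parser and analyzer the same behavior for `INSERT INTO tbl (col, ...) VALUES (...)`.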
[GitHub] spark issue #18468: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18468 @cloud-fan how will we go forward? @rxin seems to have no comment for now.
[GitHub] spark issue #18725: [SPARK-21520][SQL]Hivetable scan for all the columns the...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18725

Can the current fix work for a case like the following?

Project [a]
  Filter [rand() > 1]
    TableScan [a, b, c]

`PhysicalOperation` still fails for a non-deterministic `Filter`, so you still read all columns from the table.
[GitHub] spark issue #18525: [SPARK-21297] [WEB-UI]Add count in 'JDBC/ODBC Server' pa...
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/18525 @ajbozarth Okay, thanks. @srowen Could you help review the code? Thanks.
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r13082

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     // Eliminate no-op Projects
     case p @ Project(_, child) if sameOutput(child.output, p.output) => child
+    // If the parent Project contains non-deterministic expressions,
+    // e.g. rand(), it will be split into two Projects.
+    case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) =>
--- End diff --

The question is, do we always need to split the project for all LeafNodes?
[GitHub] spark pull request #18554: [SPARK-21306][ML] OneVsRest should support setWei...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18554
[GitHub] spark issue #18554: [SPARK-21306][ML] OneVsRest should support setWeightCol
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/18554 Merged into master, thanks all.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18540 **[Test build #80012 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80012/testReport)** for PR 18540 at commit [`9abdb5e`](https://github.com/apache/spark/commit/9abdb5eee7aab766fe73ca00749efa2a16328882).
[GitHub] spark pull request #17203: [SPARK-19863][DStream] Whether or not use CachedK...
Github user lvdongr closed the pull request at: https://github.com/apache/spark/pull/17203
[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...
Github user yaooqinn commented on a diff in the pull request: https://github.com/apache/spark/pull/18668#discussion_r129998159

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveUtilsSuite.scala ---
@@ -33,4 +33,13 @@ class HiveUtilsSuite extends QueryTest with SQLTestUtils with TestHiveSingleton
     assert(conf(ConfVars.METASTORE_END_FUNCTION_LISTENERS.varname) === "")
   }
 }
+
+  test("newTemporaryConfiguration respect spark.hadoop.foo=bar in SparkConf") {
+    sys.props.put("spark.hadoop.foo", "bar")
--- End diff --

@cloud-fan At the very beginning, spark-submit does the same thing: it adds properties from --conf and spark-defaults.conf to sys.props.
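For context on what this test is about: Spark copies any configuration entry prefixed with `spark.hadoop.` into the Hadoop `Configuration`, with the prefix stripped. A rough Python model of that translation (the helper name and the sample conf are ours, not Spark's):

```python
def hadoop_props(spark_conf):
    """Extract spark.hadoop.* entries and strip the prefix, mimicking how
    such properties reach the Hadoop Configuration."""
    prefix = "spark.hadoop."
    return {k[len(prefix):]: v
            for k, v in spark_conf.items()
            if k.startswith(prefix)}

# spark.hadoop.foo=bar should surface to Hadoop as foo=bar;
# non-prefixed Spark settings are not propagated.
conf = {"spark.hadoop.foo": "bar", "spark.master": "local"}
print(hadoop_props(conf))  # {'foo': 'bar'}
```

The test under review asserts the Hive-side `newTemporaryConfiguration` honors the same convention.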
[GitHub] spark pull request #18610: [SPARK-21386] ML LinearRegression supports warm s...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/18610#discussion_r129997548

--- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala ---
@@ -309,6 +313,23 @@ private[ml] object DefaultParamsWriter {
     val metadataJson: String = compact(render(metadata))
     metadataJson
   }
+
+  /**
+   * Save estimator's `initialModel` to corresponding path.
+   */
+  def saveInitialModel[T <: HasInitialModel[_ <: MLWritable with Params]](
+      instance: T, path: String): Unit = {
+    if (instance.isDefined(instance.initialModel)) {
+      val initialModelPath = new Path(path, "initialModel").toString
+      val initialModel = instance.getOrDefault(instance.initialModel)
+      // When saving, only keep the direct initialModel by eliminating possible initialModels of the
+      // direct initialModel, to avoid unnecessary deep recursion of initialModel.
+      if (initialModel.hasParam("initialModel")) {
+        initialModel.clear(initialModel.getParam("initialModel"))
+      }
+      initialModel.save(initialModelPath)
--- End diff --

Actually we did it in the latter way; `initialModel` is only a param for the Estimator, not for the Model. Thanks.
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r129997298

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     // Eliminate no-op Projects
     case p @ Project(_, child) if sameOutput(child.output, p.output) => child
+    // If the parent Project contains non-deterministic expressions,
+    // e.g. rand(), it will be split into two Projects.
+    case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) =>
--- End diff --

It is a bit difficult to infer why this rule exists without the context of this PR. Please add a comment for it.
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user heary-cao commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r129997034

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] {
     // Eliminate no-op Projects
     case p @ Project(_, child) if sameOutput(child.output, p.output) => child
+    // If the parent Project contains non-deterministic expressions,
+    // e.g. rand(), it will be split into two Projects.
+    case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) =>
--- End diff --

Other non-LeafNode cases will be handled by ColumnPruning and CollapseProject.
[GitHub] spark issue #16648: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/16648 ping @bdrillard for the 2nd part of this PR
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18185 **[Test build #80011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80011/testReport)** for PR 18185 at commit [`23ca897`](https://github.com/apache/spark/commit/23ca897825a51baa1b879c3b7968749199e8724f).
[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r129989469

--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/HasParallelism.scala ---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.param.shared
+
+import scala.concurrent.ExecutionContext
+
+import org.apache.spark.ml.param.{IntParam, Params, ParamValidators}
+import org.apache.spark.util.ThreadUtils
+
+/**
+ * Common parameter for estimators trained in a multithreaded environment.
+ */
+private[ml] trait HasParallelism extends Params {
+
+  /**
+   * param for the number of threads to use when running parallel meta-algorithms
+   * @group expertParam
+   */
+  val parallelism = new IntParam(this, "parallelism",
+    "the number of threads to use when running parallel algorithms", ParamValidators.gtEq(1))
+
+  setDefault(parallelism -> 1)
+
+  /** @group expertGetParam */
+  def getParallelism: Int = $(parallelism)
+
+  /** @group expertSetParam */
+  def setParallelism(value: Int): this.type = {
--- End diff --

You can remove this now that it is in OneVsRest.
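The `parallelism` param above caps the number of worker threads used to fit the per-class models. A minimal Python sketch of the same idea with a bounded thread pool (the `fit_one` task is a stand-in, not Spark API):

```python
from concurrent.futures import ThreadPoolExecutor

def fit_one(label):
    # Stand-in for training one binary classifier in OneVsRest.
    return f"model-for-{label}"

parallelism = 2  # analogous to setParallelism(2); validated as >= 1
with ThreadPoolExecutor(max_workers=parallelism) as pool:
    # map preserves input order, so results are deterministic even
    # though up to `parallelism` fits run concurrently.
    models = list(pool.map(fit_one, [0, 1, 2]))
print(models)  # ['model-for-0', 'model-for-1', 'model-for-2']
```

The default of 1 keeps the previous sequential behavior, which is why the PR is a strict opt-in improvement.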
[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r129989379

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -294,6 +296,18 @@ final class OneVsRest @Since("1.4.0") (
   @Since("1.5.0")
   def setPredictionCol(value: String): this.type = set(predictionCol, value)
+
+  /** @group expertGetParam */
+  override def getParallelism: Int = $(parallelism)
+
+  /**
+   * @group expertSetParam
+   * The implementation of parallel one vs. rest runs the classification for
--- End diff --

Also, please put the group annotation at the bottom to match existing code style.
[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r129989332

--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -294,6 +296,18 @@ final class OneVsRest @Since("1.4.0") (
   @Since("1.5.0")
   def setPredictionCol(value: String): this.type = set(predictionCol, value)
+
+  /** @group expertGetParam */
+  override def getParallelism: Int = $(parallelism)
+
+  /**
+   * @group expertSetParam
+   * The implementation of parallel one vs. rest runs the classification for
--- End diff --

+1
[GitHub] spark issue #18745: [SPARK-21544][DEPLOY] Tests jar of some module should no...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/18745 cc @srowen Test done, any more problems? Thanks very much.
[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/18281 @holdenk Some of those improvements on handling parallelism sound useful, but I'd prefer we merge this and then add more improvements. This PR should be a strict improvement (moving from no parallelism to some potential for parallelism). Do people have more comments before this is merged?
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18185 Merged build finished. Test FAILed.
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18185 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80010/ Test FAILed.
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18185 **[Test build #80010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80010/testReport)** for PR 18185 at commit [`2b50e50`](https://github.com/apache/spark/commit/2b50e5088d4eca5d38837e421f7e9960a2e2128d).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class UnresolvedSubqueryColumnAliases(`
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18185 **[Test build #80010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80010/testReport)** for PR 18185 at commit [`2b50e50`](https://github.com/apache/spark/commit/2b50e5088d4eca5d38837e421f7e9960a2e2128d).
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18185 Thanks! Fixed.
[GitHub] spark pull request #18740: [SPARK-21538][SQL] Attribute resolution inconsist...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18740
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18740 Thanks! Merging to master/2.2
[GitHub] spark pull request #18725: [SPARK-21520][SQL]Hivetable scan for all the colu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18725#discussion_r129984515 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -495,6 +495,23 @@ object ColumnPruning extends Rule[LogicalPlan] { // Eliminate no-op Projects case p @ Project(_, child) if sameOutput(child.output, p.output) => child +// The column of father project contains not deterministic function +// e.g Rand function. father project will be split to two project. +case h @ Project(fields, _: LeafNode) if !fields.forall(_.deterministic) => --- End diff -- Then once your project is not on top of a LeafNode, this rule doesn't work? Your fix is just for the specified case.
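The review above hinges on determinism: a Project whose fields contain a non-deterministic expression such as `rand()` cannot be freely collapsed or re-evaluated, because every evaluation yields a new value. A minimal Python sketch (the `project` helper is hypothetical, not Spark's Catalyst code) illustrates why:

```python
import random

def project(rows, exprs):
    """Apply a list of column expressions to each input row (a toy Project)."""
    return [tuple(e(r) for e in exprs) for r in rows]

random.seed(42)
rows = [(1,), (2,), (3,)]
rand_col = lambda r: random.random()  # non-deterministic: new value per call

once = project(rows, [rand_col])
again = project(rows, [rand_col])
# Re-evaluating the non-deterministic column yields different values, so an
# optimizer must not duplicate or re-run such expressions when rewriting
# Project nodes -- only transformations that preserve the evaluation count
# (and order) are safe.
assert once != again
```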
[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r129982672 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -101,6 +101,45 @@ class OneVsRestSuite extends SparkFunSuite with MLlibTestSparkContext with Defau assert(expectedMetrics.confusionMatrix ~== ovaMetrics.confusionMatrix absTol 400) } + test("one-vs-rest: tuning parallelism does not change output") { --- End diff -- Is there a good way to do that? I'm having trouble thinking of ways to do it which would not produce flaky tests.
[GitHub] spark issue #18652: [SPARK-21497][SQL] Pull non-deterministic equi join keys...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18652 @gatorsmile Actually, it is not rare for us to add a feature step by step in Spark SQL, so that alone is not a reason to prevent adding this support. I think this change already helps this kind of workload a lot. As said in the previous discussion, we can't avoid a few issues regarding non-deterministic non-equi join conditions: we can simply allow them, but then we face inconsistency due to different join implementations; or we can pull them out into a downstream `Project`, but that possibly changes the number of calls, and `EnsureRequirements` can change the call order. Notice that those issues only affect non-equi join conditions; equi join conditions are free from them.
[GitHub] spark issue #17180: [SPARK-19839][Core]release longArray in BytesToBytesMap
Github user zhzhan commented on the issue: https://github.com/apache/spark/pull/17180 retest it please.
[GitHub] spark issue #18752: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18752 cc @JoshRosen
[GitHub] spark issue #18752: [SPARK-21551][Python] Increase timeout for PythonRDD.ser...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18752 Can one of the admins verify this patch?
[GitHub] spark pull request #18752: [SPARK-21551][Python] Increase timeout for Python...
GitHub user peay opened a pull request: https://github.com/apache/spark/pull/18752 [SPARK-21551][Python] Increase timeout for PythonRDD.serveIterator ## What changes were proposed in this pull request? This modification increases the timeout for `serveIterator` (which is not dynamically configurable). This fixes timeout issues in pyspark when using `collect` and similar functions, in cases where Python may take more than a couple of seconds to connect. ## How was this patch tested? Ran the tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/peay/spark spark-21551 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18752.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18752 commit 9d3c6640f56e3e4fd195d3ad8cead09df67a72c7 Author: peay Date: 2017-07-27T20:49:28Z [SPARK-21551][Python] Increase timeout for PythonRDD.serveIterator
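The timeout in question guards the JVM-side server socket that waits for the Python process to connect before streaming collected rows to it. A minimal Python sketch of that pattern (the `serve_once` helper is hypothetical, not the actual `PythonRDD.serveIterator` code); the `settimeout` value plays the role of the constant this PR increases:

```python
import socket
import threading

def serve_once(payload: bytes, timeout_s: float) -> bytes:
    """Bind an ephemeral port, wait up to timeout_s for one client connection,
    then send the payload. A client slower than the timeout makes accept()
    raise socket.timeout -- the failure mode the PR mitigates."""
    server = socket.socket()
    server.settimeout(timeout_s)        # the accept timeout being increased
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    received = []
    def client():
        with socket.create_connection(("127.0.0.1", port)) as c:
            received.append(c.recv(len(payload)))

    t = threading.Thread(target=client)
    t.start()
    conn, _ = server.accept()           # blocks until the client connects
    conn.sendall(payload)
    conn.close()
    t.join()
    server.close()
    return received[0]

assert serve_once(b"rows", 5.0) == b"rows"
```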
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18740 Merged build finished. Test PASSed.
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80009/ Test PASSed.
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18740 **[Test build #80009 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80009/testReport)** for PR 18740 at commit [`0b0eea9`](https://github.com/apache/spark/commit/0b0eea941cb850967e943719822cfab89479a025). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18664 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80008/ Test FAILed.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18664 Merged build finished. Test FAILed.
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18664 **[Test build #80008 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80008/testReport)** for PR 18664 at commit [`3b83d7a`](https://github.com/apache/spark/commit/3b83d7acf17433b1f5581f0b8b87c54a91309839). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class ArrowWriter(val root: VectorSchemaRoot, fields: Array[ArrowFieldWriter]) `
[GitHub] spark pull request #18555: [SPARK-21353][CORE]add checkValue in spark.intern...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18555#discussion_r129946652 --- Diff: core/src/test/scala/org/apache/spark/SparkConfSuite.scala --- @@ -322,6 +324,291 @@ class SparkConfSuite extends SparkFunSuite with LocalSparkContext with ResetSyst conf.validateSettings() } + test("verify spark.blockManager.port configuration") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("My app") + +conf.validateSettings() +assert(!conf.contains(BLOCK_MANAGER_PORT.key)) + +Seq( + "0", // normal values + "1024", // min values + "65535" // max values +).foreach { value => + conf.set(BLOCK_MANAGER_PORT.key, value) + var sc0 = new SparkContext(conf) + assert(sc0.isStopped === false) + assert(sc0.conf.get(BLOCK_MANAGER_PORT) === value.toInt) + sc0.stop() + conf.remove(BLOCK_MANAGER_PORT) +} + +// Verify abnormal values +Seq( + "-1", + "1000", + "65536" +).foreach { value => + conf.set(BLOCK_MANAGER_PORT.key, value) + val excMsg = intercept[IllegalArgumentException] { +new SparkContext(conf) + }.getMessage + // Caused by: java.lang.IllegalArgumentException: + // blockManager port should be between 1024 and 65535 (inclusive), + // or 0 for a random free port. + assert(excMsg.contains("blockManager port should be between 1024 " + +"and 65535 (inclusive), or 0 for a random free port.")) + + conf.remove(BLOCK_MANAGER_PORT) +} + } + + test("verify spark.executor.memory configuration exception") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("executor memory") + .set(EXECUTOR_MEMORY.key, "-1") +val excMsg = intercept[NumberFormatException] { + sc = new SparkContext(conf) +}.getMessage +// Caused by: java.lang.NumberFormatException: +// Size must be specified as bytes (b), kibibytes (k), +// mebibytes (m), gibibytes (g), tebibytes (t), +// or pebibytes(p). E.g. 50b, 100k, or 250m. 
+assert(excMsg.contains("Size must be specified as bytes (b), kibibytes (k), " + + "mebibytes (m), gibibytes (g), tebibytes (t), or pebibytes(p). E.g. 50b, 100k, or 250m.")) + } + + test("verify spark.task.cpus configuration exception") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("cpus") + .set(CPUS_PER_TASK.key, "-1") +val excMsg = intercept[IllegalArgumentException] { + sc = new SparkContext(conf) +}.getMessage +// Caused by: java.lang.IllegalArgumentException: +// Number of cores to allocate for task event queue must be positive. +assert(excMsg.contains("Number of cores to allocate for task event queue must be positive.")) + } + + test("verify spark.task.maxFailures configuration exception") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("task maxFailures") + .set(MAX_TASK_FAILURES.key, "-1") +val sc0 = new SparkContext(conf) +val excMsg = intercept[IllegalArgumentException] { + new TaskSchedulerImpl(sc0) +}.getMessage +// Caused by: java.lang.IllegalArgumentException: +// The retry times of task should be greater than or equal to 1. +assert(excMsg.contains("The retry times of task should be greater than or equal to 1.")) +sc0.stop() + } + + test("verify listenerbus.eventqueue.capacity configuration exception") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("capacity") + .set(LISTENER_BUS_EVENT_QUEUE_CAPACITY.key, "-1") +val excMsg = intercept[IllegalArgumentException] { + sc = new SparkContext(conf) +}.getMessage +// Caused by: java.lang.IllegalArgumentException: +// The capacity of listener bus event queue must be positive. 
+assert(excMsg.contains("The capacity of listener bus event queue must be positive.")) + } + + test("verify metrics.maxListenerClassesTimed configuration exception") { +val conf = new SparkConf(false) + .setMaster("local").setAppName("listenerbus") + .set(LISTENER_BUS_METRICS_MAX_LISTENER_CLASSES_TIMED.key, "-1") +val excMsg = intercept[IllegalArgumentException] { + sc = new SparkContext(conf) +}.getMessage +// Caused by: java.lang.IllegalArgumentException: +// The maxListenerClassesTimed of listener bus event queue must be positive. +assert(excMsg.contains("The maxListenerClassesTimed of listener bus " + + "event queue must be posit
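The quoted tests assert on the error messages raised by configuration validation. A minimal Python sketch of the same kind of range check, using the blockManager port message quoted above (the `check_block_manager_port` helper is hypothetical, not Spark's `checkValue` implementation):

```python
def check_block_manager_port(port: int) -> int:
    """Validate a blockManager port: 0 requests a random free port; any other
    value must fall in the non-privileged range 1024-65535 (inclusive)."""
    if port != 0 and not (1024 <= port <= 65535):
        raise ValueError(
            "blockManager port should be between 1024 and 65535 (inclusive), "
            "or 0 for a random free port.")
    return port

# Boundary values pass; out-of-range values raise, matching the test above.
assert check_block_manager_port(0) == 0
assert check_block_manager_port(65535) == 65535
try:
    check_block_manager_port(1000)
    raise AssertionError("expected ValueError for port 1000")
except ValueError:
    pass
```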
[GitHub] spark issue #18185: [SPARK-20962][SQL] Support subquery column aliases in FR...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18185 LGTM except a few minor comments. Thanks for working on it!
[GitHub] spark issue #18750: Skip maven-compiler-plugin main and test compilations in...
Github user gslowikowski commented on the issue: https://github.com/apache/spark/pull/18750 Updated.
[GitHub] spark pull request #18185: [SPARK-20962][SQL] Support subquery column aliase...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18185#discussion_r129944781 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -859,6 +859,22 @@ class Analyzer( // rule: ResolveDeserializer. case plan if containsDeserializer(plan.expressions) => plan + case u @ UnresolvedSubqueryColumnAlias(columnNames, child) if child.resolved => +// Resolves output attributes if a query has alias names in its subquery: +// e.g., SELECT * FROM (SELECT 1 AS a, 1 AS b) t(col1, col2) +val outputAttrs = child.output +// Checks if the number of the aliases equals to the number of output columns
+// in the subquery. +if (columnNames.size != outputAttrs.size) { + u.failAnalysis(s"Number of column aliases does not match number of columns. " + --- End diff -- Nit: remove the string Interpolator `s`.
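As the quoted Analyzer rule shows, subquery column aliases are matched to the subquery's output attributes by position, after first verifying that the counts agree. A minimal Python sketch of that resolution step (the `apply_column_aliases` helper is hypothetical, not the Catalyst code):

```python
def apply_column_aliases(output, aliases):
    """Rename subquery output columns by position, as in
    SELECT * FROM (SELECT 1 AS a, 1 AS b) t(col1, col2).
    `output` is a list of (name, value) pairs standing in for attributes."""
    if len(aliases) != len(output):
        # Mirrors the failAnalysis check in the quoted rule.
        raise ValueError(
            "Number of column aliases does not match number of columns. "
            f"Aliases: {len(aliases)}, columns: {len(output)}.")
    return [(alias, value) for alias, (_, value) in zip(aliases, output)]

# Positional renaming: a -> col1, b -> col2.
assert apply_column_aliases([("a", 1), ("b", 1)], ["col1", "col2"]) == [
    ("col1", 1), ("col2", 1)]
```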
[GitHub] spark pull request #18185: [SPARK-20962][SQL] Support subquery column aliase...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18185#discussion_r129944055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -423,6 +423,26 @@ case class UnresolvedAlias( } /** + * Aliased column names for subquery. We could add alias names for output columns in the subquery: --- End diff -- resolved by positions.
[GitHub] spark pull request #18185: [SPARK-20962][SQL] Support subquery column aliase...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18185#discussion_r129943797 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -423,6 +423,26 @@ case class UnresolvedAlias( } /** + * Aliased column names for subquery. We could add alias names for output columns in the subquery: + * {{{ + * // Assign alias names for output columns + * SELECT col1, col2 FROM testData AS t(col1, col2); + * }}} + * + * @param outputColumnNames the column names for this subquery. + * @param child the logical plan of this subquery. --- End diff -- Nit: `the logical plan of this subquery` -> `the [[LogicalPlan]] on which this subquery column aliases apply`
[GitHub] spark pull request #18185: [SPARK-20962][SQL] Support subquery column aliase...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18185#discussion_r129943216 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala --- @@ -423,6 +423,26 @@ case class UnresolvedAlias( } /** + * Aliased column names for subquery. We could add alias names for output columns in the subquery: + * {{{ + * // Assign alias names for output columns + * SELECT col1, col2 FROM testData AS t(col1, col2); + * }}} + * + * @param outputColumnNames the column names for this subquery. + * @param child the logical plan of this subquery. + */ +case class UnresolvedSubqueryColumnAlias( --- End diff -- Nit: `UnresolvedSubqueryColumnAlias ` -> `UnresolvedSubqueryColumnAliases`
[GitHub] spark pull request #18185: [SPARK-20962][SQL] Support subquery column aliase...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18185#discussion_r129943038 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala --- @@ -750,20 +750,28 @@ class AstBuilder(conf: SQLConf) extends SqlBaseBaseVisitor[AnyRef] with Logging /** * Create an alias (SubqueryAlias) for a sub-query. This is practically the same as * visitAliasedRelation and visitNamedExpression, ANTLR4 however requires us to use 3 different - * hooks. + * hooks. We could add alias names for output columns, for example: + * {{{ + * SELECT col1, col2 FROM testData AS t(col1, col2) + * }}} */ override def visitAliasedQuery(ctx: AliasedQueryContext): LogicalPlan = withOrigin(ctx) { -val alias = if (ctx.strictIdentifier == null) { +val alias = if (ctx.tableAlias.strictIdentifier == null) { // For un-aliased subqueries, use a default alias name that is not likely to conflict with // normal subquery names, so that parent operators can only access the columns in subquery by // unqualified names. Users can still use this special qualifier to access columns if they // know it, but that's not recommended. "__auto_generated_subquery_name" } else { - ctx.strictIdentifier.getText + ctx.tableAlias.strictIdentifier.getText +} +val subquery = SubqueryAlias(alias, plan(ctx.queryNoWith).optionalMap(ctx.sample)(withSample)) +if (ctx.tableAlias.identifierList != null) { + val columnNames = visitIdentifierList(ctx.tableAlias.identifierList) --- End diff -- Nit: `columnNames ` -> `columnAliases`
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18740 LGTM pending Jenkins
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18740 Merged build finished. Test PASSed.
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18740 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80007/ Test PASSed.
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18740 **[Test build #80007 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80007/testReport)** for PR 18740 at commit [`309cb8f`](https://github.com/apache/spark/commit/309cb8f09f5af81f230911574274c0ca9eb65f34). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes exam...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18749#discussion_r129927520 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionInfo.java --- @@ -26,6 +26,10 @@ private String name; private String extended; private String db; +private String arguments; --- End diff -- There aren't many `ExpressionInfo` objects in memory right? adding more info to this bean doesn't have any meaningful performance implications, I presume. I suppose it's just breaking down the existing info further. I also presume this is considered an internal API so it's OK to change the constructor. You could even retain the one constructor that is removed, just in case.
[GitHub] spark pull request #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes exam...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18749#discussion_r129929153 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/ExpressionDescription.java --- @@ -29,15 +29,40 @@ * show the usage of the function in human language. * * `usage()` will be used for the function usage in brief way. - * `extended()` will be used for the function usage in verbose way, suppose - * an example will be provided. * - * And we can refer the function name by `_FUNC_`, in `usage` and `extended`, as it's + * These below are concatenated and used for the function usage in verbose way, suppose arguments, + * examples, note and since will be provided. + * + * `arguments()` describes arguments for the expression. This should follow the format as below: + * + * Arguments: + * * arg0 - ... + * + * * arg1 - ... + * + * + * `examples()` describes examples for the expression. This should follow the format as below: + * + * Examples: + * > SELECT ...; + * ... + * > SELECT ...; + * ... + * + * `note()` contains some notes for the expression optionally. + * + * `since()` contains version information for the expression. Version is specified by, + * for example, "2.2.0". + * + * We can refer the function name by `_FUNC_`, in `usage`, `arguments` and `examples`, as it's * registered in `FunctionRegistry`. */ @DeveloperApi --- End diff -- Agree, that's my only question, whether this change matters, because it's a developer API. You provide default implementations, though `extended()` gets removed. Hm. I am wondering if it's possible to keep `extended()` but, well, ignore it? it would at least be compatible even if it meant someone's implementation out there would have to update to provide information to `ExpressionInfo` correctly. That's not really a functional problem though.
[GitHub] spark pull request #18659: [SPARK-21404][PYSPARK][WIP] Simple Python Vectori...
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18659#discussion_r129928163 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala --- @@ -132,6 +135,61 @@ private[sql] object ArrowConverters { } } + private[sql] def fromPayloadIterator(iter: Iterator[ArrowPayload]): Iterator[InternalRow] = { +new Iterator[InternalRow] { + private val _allocator = new RootAllocator(Long.MaxValue) + private var _reader: ArrowFileReader = _ + private var _root: VectorSchemaRoot = _ + private var _index = 0 + + loadNextBatch() + + override def hasNext: Boolean = _root != null && _index < _root.getRowCount + + override def next(): InternalRow = { +val fields = _root.getFieldVectors.asScala + +val genericRowData = fields.map { field => + field.getAccessor.getObject(_index) +}.toArray[Any] --- End diff -- Thanks @kiszk , I'm giving it a try!
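The quoted `fromPayloadIterator` wraps batch-oriented Arrow data in a row-oriented `Iterator[InternalRow]`, reading a batch, emitting its rows one by one, and loading the next batch only when the current one is exhausted. A minimal Python sketch of that batch-to-row flattening (the `rows_from_batches` generator is hypothetical, not the actual `ArrowConverters` code):

```python
def rows_from_batches(batches):
    """Lazily flatten an iterator of columnar record batches into rows.
    Each batch is a list of equal-length column arrays; the next batch is
    consumed only after every row of the current one has been yielded."""
    for batch in batches:
        n = len(batch[0]) if batch else 0
        for i in range(n):
            # A row is the i-th value of every column in the batch.
            yield tuple(col[i] for col in batch)

batches = [[[1, 2], ["a", "b"]], [[3], ["c"]]]
assert list(rows_from_batches(batches)) == [(1, "a"), (2, "b"), (3, "c")]
```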
[GitHub] spark issue #18740: [SPARK-21538][SQL] Attribute resolution inconsistency in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18740

**[Test build #80009 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80009/testReport)** for PR 18740 at commit [`0b0eea9`](https://github.com/apache/spark/commit/0b0eea941cb850967e943719822cfab89479a025).
[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18664

**[Test build #80008 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80008/testReport)** for PR 18664 at commit [`3b83d7a`](https://github.com/apache/spark/commit/3b83d7acf17433b1f5581f0b8b87c54a91309839).
[GitHub] spark issue #18749: [SPARK-21485][FOLLOWUP][SQL][DOCS] Describes examples an...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18749

cc @rxin, @srowen and @cloud-fan, I believe this one is ready for a review. Could you take a look when you have some time?
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18540

Merged build finished. Test FAILed.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18540

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80006/ Test FAILed.
[GitHub] spark issue #18540: [SPARK-19451][SQL] rangeBetween method should accept Lon...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18540

**[Test build #80006 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80006/testReport)** for PR 18540 at commit [`a1f91cd`](https://github.com/apache/spark/commit/a1f91cd7b0f10176b551cb168bf23d2eef68c15c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #18740: [SPARK-21538][SQL] Attribute resolution inconsist...
Github user aokolnychyi commented on a diff in the pull request: https://github.com/apache/spark/pull/18740#discussion_r129911780

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala ---
@@ -1304,6 +1304,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
     assert(rlike3.count() == 0)
   }
 }
+
+  test("SPARK-21538: Attribute resolution inconsistency in Dataset API") {
+    val df = spark.range(1).withColumnRenamed("id", "x")
+    checkAnswer(df.sort(col("id")), df.sort("id"))
+    checkAnswer(df.sort($"id"), df.sort("id"))
+    checkAnswer(df.sort('id), df.sort("id"))
+    checkAnswer(df.orderBy('id), df.sort("id"))
+    checkAnswer(df.orderBy("id"), df.sort("id"))
--- End diff --

Indeed, looks much better. I appreciate the explanation and will take this into account in the future. I will update the test in a minute, thanks.
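The test in the diff above leans on `checkAnswer`, Spark's test helper that compares the rows produced by two queries. Outside Spark, a minimal stand-in for that helper can be sketched in plain Scala — this is hypothetical and order-insensitive, whereas the real `QueryTest.checkAnswer` operates on executed `DataFrame`s and `Row` objects:

```scala
// Hypothetical, minimal stand-in for Spark's checkAnswer test helper:
// two results are considered equal when they contain the same rows,
// irrespective of order (plain values stand in for Row objects here).
def checkAnswer[T](actual: Seq[T], expected: Seq[T])(implicit ord: Ordering[T]): Unit = {
  val a = actual.sorted
  val e = expected.sorted
  assert(a == e, s"Result $a did not match expected $e")
}
```

Each `checkAnswer` call in the diff then asserts that the five ways of naming a sort column — `col("id")`, `$"id"`, `'id`, and the string forms via `sort`/`orderBy` — all resolve to the same result, which is exactly the consistency SPARK-21538 is about.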