[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69410786

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala ---
@@ -36,10 +37,21 @@ class DecisionTreeRegressorSuite
   private var categoricalDataPointsRDD: RDD[LabeledPoint] = _
+  private var toyData: RDD[LabeledPoint] = _
+
   override def beforeAll() {
     super.beforeAll()
     categoricalDataPointsRDD =
       sc.parallelize(OldDecisionTreeSuite.generateCategoricalDataPoints().map(_.asML))
+    toyData = sc.parallelize(Seq(
--- End diff --

Move ```toyData``` to ```TreeTests```. You can refer to [Feature importance with toy data](https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala#L108).
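For context, a minimal sketch of what hoisting the fixture into the shared test helper could look like. `TreeTests` does exist in Spark's test code, but the method name `getToyData` and the exact dataset below are assumptions for illustration, not the final patch:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.rdd.RDD

object TreeTests {
  // Hypothetical shared fixture: a small deterministic dataset that tree
  // suites can reuse instead of each suite redefining its own toyData.
  def getToyData(sc: SparkContext): RDD[LabeledPoint] = {
    sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(1.0, 0.0)),
      LabeledPoint(2.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(3.0, Vectors.dense(1.0, 1.0))))
  }
}
```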
[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69410553

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
     assert(variance === expectedVariance,
       s"Expected variance $expectedVariance but got $variance.")
   }
+
+    val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+    dt.setMaxDepth(1)
+      .setMaxBins(6)
--- End diff --

I'd like to remove the explicit setting, since the default value (32) meets your needs. We want to keep the Jenkins logs clean and reduce the number of warnings where possible.
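A sketch of the trimmed setup the reviewer is asking for: drop `setMaxBins(6)` and rely on the default of 32. The fixtures (`toyData`, `TreeTests.setMetadata`, the estimator `dt`) are taken from the diff context above; the `setImpurity` call is an assumption about the surrounding test:

```scala
import org.apache.spark.ml.regression.DecisionTreeRegressor

// Rely on the default maxBins (32); the toy data has far fewer distinct
// values, so the explicit setMaxBins(6) adds nothing but an extra knob.
val dt = new DecisionTreeRegressor()
  .setImpurity("variance") // assumed from the suite's usual setup
  .setMaxDepth(1)
val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
val model = dt.fit(toyDF)
```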
[GitHub] spark pull request #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable Fr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14037
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14037

merging to master, thanks!
[GitHub] spark issue #12203: [SPARK-14423][YARN] Avoid same name files added to distr...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/12203

Maybe what you mentioned (skip adding to the distributed cache and log a warning) is enough; throwing an exception will fail the application, and this is not actually a fatal problem. I'm OK with changing the current behavior for this. What do you think, @vanzin?
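A minimal sketch of the skip-and-warn behavior being proposed. The helper and object names are assumptions for illustration, not the actual Client.scala code:

```scala
import scala.collection.mutable
import org.apache.hadoop.yarn.api.records.LocalResource
import org.slf4j.Logger

object CacheHelper {
  // Hypothetical helper: on a name collision, warn and keep the first entry
  // instead of failing the whole application.
  def addResourceIfAbsent(
      localResources: mutable.Map[String, LocalResource],
      name: String,
      resource: LocalResource,
      log: Logger): Unit = {
    if (localResources.contains(name)) {
      log.warn(s"Skipping $name: a file with the same name is already in the distributed cache.")
    } else {
      localResources(name) = resource
    }
  }
}
```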
[GitHub] spark issue #14040: [SPARK-16329] [SQL] [Backport-2.0] Star Expansion over T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14040

Merged build finished. Test PASSed.
[GitHub] spark issue #14040: [SPARK-16329] [SQL] [Backport-2.0] Star Expansion over T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14040

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61704/
[GitHub] spark issue #14040: [SPARK-16329] [SQL] [Backport-2.0] Star Expansion over T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14040

**[Test build #61704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61704/consoleFull)** for PR 14040 at commit [`298ced4`](https://github.com/apache/spark/commit/298ced4d3e8603ec3d044dc5af0e16d91850c9ee).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #12203: [SPARK-14423][YARN] Avoid same name files added to distr...
Github user RicoGit commented on the issue: https://github.com/apache/spark/pull/12203

Thanks, I understand these are different problems. What would you advise? I don't think `require(localizedPath != null)` is a good solution: it just fails with the exception message "requirement failed". It would be better to skip adding to the distributed cache and log a warning. Do you think it is enough to open an issue for this?
[GitHub] spark issue #14042: [SPARK-16329] [SQL] [Backport-1.6] Star Expansion over T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14042

**[Test build #61711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61711/consoleFull)** for PR 14042 at commit [`edeeb14`](https://github.com/apache/spark/commit/edeeb1421931963affbd5402301563579b00611a).
[GitHub] spark pull request #14042: [SPARK-16329] [SQL] [Backport-1.6] Star Expansion...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14042

[SPARK-16329] [SQL] [Backport-1.6] Star Expansion over Table Containing No Column #14040

## What changes were proposed in this pull request?
Star expansion over a table containing zero columns has not worked since 1.6, although it works in Spark 1.5.1. This PR backports the fix to 1.6. For example,
```scala
val rddNoCols = sqlContext.sparkContext.parallelize(1 to 10).map(_ => Row.empty)
val dfNoCols = sqlContext.createDataFrame(rddNoCols, StructType(Seq.empty))
dfNoCols.registerTempTable("temp_table_no_cols")
sqlContext.sql("select * from temp_table_no_cols").show
```
Without the fix, users will get the following exception:
```
java.lang.IllegalArgumentException: requirement failed
  at scala.Predef$.require(Predef.scala:221)
  at org.apache.spark.sql.catalyst.analysis.UnresolvedStar.expand(unresolved.scala:199)
```

## How was this patch tested?
Tests are added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark starExpansionEmpty

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14042.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14042

commit edeeb1421931963affbd5402301563579b00611a
Author: gatorsmile
Date: 2016-07-04T05:09:24Z

    backport to 1.6
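For intuition: the failure comes from `UnresolvedStar.expand` requiring a successful attribute lookup even when the child plan exposes no columns. A simplified sketch of the kind of guard that avoids it follows; this is illustrative pseudo-logic, not the actual patch:

```scala
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Sketch: expand an unqualified `*` to exactly the child's output columns, so a
// zero-column table yields an empty projection instead of tripping a require().
def expandStar(input: LogicalPlan): Seq[NamedExpression] = {
  input.output // Seq.empty for a table with no columns; nothing to require
}
```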
[GitHub] spark issue #12203: [SPARK-14423][YARN] Avoid same name files added to distr...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/12203

Can you make sure the problem you met is exactly the same as the one this PR solves? The exception stack you pasted on StackOverflow is different from what I pasted here before. From your exception stack, my guess is that the same jar (same path with same file name) was added twice, which is a little different from the problem this PR addresses.
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14037

Based on my understanding, we previously had an Analyzer rule `PreInsertionCasts`, which generated `InsertIntoHiveTable`: https://github.com/apache/spark/pull/13754/files#diff-ee66e11b56c21364760a5ed2b783f863L483

In one of your PRs (https://github.com/apache/spark/pull/13754), that rule was removed. After that, `InsertIntoHiveTable` became useless.
[GitHub] spark issue #14008: [SPARK-16281][SQL] Implement parse_url SQL function
Github user janplus commented on the issue: https://github.com/apache/spark/pull/14008

@cloud-fan Thank you for the review. I made some code-style fixes as you suggested.
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14004

**[Test build #61710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61710/consoleFull)** for PR 14004 at commit [`922e6e7`](https://github.com/apache/spark/commit/922e6e7aa93ae2b4cce31db0726722db3a534afe).
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14033

**[Test build #61709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61709/consoleFull)** for PR 14033 at commit [`3e3a794`](https://github.com/apache/spark/commit/3e3a794a2a7ff90b2f69d05bd0d36e6e5b3549d9).
[GitHub] spark issue #14041: [SPARK-16359][STREAMING][KAFKA] unidoc skip kafka 0.10
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14041

**[Test build #61708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61708/consoleFull)** for PR 14041 at commit [`5312215`](https://github.com/apache/spark/commit/5312215027f385aefba95fd7b3652603ed432fc3).
[GitHub] spark pull request #14041: [SPARK-16359][STREAMING][KAFKA] unidoc skip kafka...
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/14041

[SPARK-16359][STREAMING][KAFKA] unidoc skip kafka 0.10

## What changes were proposed in this pull request?
During the sbt unidoc task, skip the streamingKafka010 subproject and filter the Kafka 0.10 classes from the classpath, so that at least the existing Kafka 0.8 docs can be included in unidoc without error.

## How was this patch tested?

    sbt spark/scalaunidoc:doc | grep -i error

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/koeninger/spark-1 SPARK-16359

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14041.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14041

commit 5312215027f385aefba95fd7b3652603ed432fc3
Author: cody koeninger
Date: 2016-07-04T04:45:06Z

    [SPARK-16359][STREAMING][KAFKA] unidoc skip kafka 0.10
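For reference, sbt-unidoc exposes a project filter that can express the "skip this subproject" half of the change; a minimal sketch follows. The project reference `streamingKafka010` is assumed from the PR description, Spark's real wiring lives in project/SparkBuild.scala, and the classpath-filtering half is not shown, so treat this as shape only:

```scala
// Sketch of a build-definition snippet using sbt-unidoc's documented filter
// (import paths may vary by plugin version; this follows the 0.3.x README).
import sbtunidoc.Plugin._
import UnidocKeys._

unidocProjectFilter in (ScalaUnidoc, unidoc) :=
  inAnyProject -- inProjects(streamingKafka010) // leave kafka-0-10 out of unidoc
```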
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69408201

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)
+    case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+    case Literal(key: UTF8String, _) => getPattern(key)
+    case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.size > 3 || children.size < 2) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
+    } else {
+      super[ImplicitCastInputTypes].checkInputDataTypes()
+    }
+  }
+
+  private def getPattern(key: Any): Pattern = {
+    if (key != null) {
+      val sb = new StringBuilder()
+      sb.append(REGEXPREFIX).append(key.toString).append(REGEXSUBFIX)
+      Pattern.compile(sb.toString())
+    } else {
+      null
+    }
+  }
+
+  private def getUrl(url: Any): URL = {
+    try {
+      new URL(url.toString)
+    } catch {
+      case NonFatal(_) => null
+    }
+  }
+
+  private def extractValueFromQuery(query: Any, pattern: Pattern): Any = {
+    val m = pattern.matcher(query.toString)
+    if (m.find()) {
+      UTF8String.fromString(m.group(2))
+    } else {
+      null
+    }
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: Any): Any = {
+    if (partToExtract.equals(HOST)) {
+      UTF8String.fromString(url.getHost)
+    } else if (partToExtract.equals(PATH)) {
+      UTF8String.fromString(url.getPath)
+    } else if (partToExtract.equals(QUERY)) {
+      UTF8String.fromString(url.getQuery)
+    } else if (partToExtract.equals(REF)) {
+      UTF8String.fromString(url.getRef)
+    } else if (partToExtract.equals(PROTOCOL)) {
+      UTF8String.fromString(url.getProtocol)
+    } else if (partToExtract.equals(FILE)) {
+      UTF8String.fromString(url.getFile)
+    } else if (partToExtract.equals(AUTHORITY)) {
+      UTF8String.fromString(url.getAuthority)
+    } else if (partToExtract.equals(USERINFO)) {
+      UTF8String.fromString(url.getUserInfo)
+    } else {
+      null
+    }
+  }
+
+  private def parseUrlWithoutKey(url: Any, partToExtract: Any): Any = {
+    if (url != null && partToExtract != null) {
+      if (cachedUrl ne null) {
+        extractFromUrl(cachedUrl, partToExtract)
+      } else {
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69408152

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)
+    case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+    case Literal(key: UTF8String, _) => getPattern(key)
+    case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.size > 3 || children.size < 2) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
+    } else {
+      super[ImplicitCastInputTypes].checkInputDataTypes()
+    }
+  }
+
+  private def getPattern(key: Any): Pattern = {
+    if (key != null) {
+      val sb = new StringBuilder()
+      sb.append(REGEXPREFIX).append(key.toString).append(REGEXSUBFIX)
+      Pattern.compile(sb.toString())
+    } else {
+      null
+    }
+  }
+
+  private def getUrl(url: Any): URL = {
+    try {
+      new URL(url.toString)
+    } catch {
+      case NonFatal(_) => null
--- End diff --

OK
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69408137

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)
+    case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+    case Literal(key: UTF8String, _) => getPattern(key)
+    case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.size > 3 || children.size < 2) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
+    } else {
+      super[ImplicitCastInputTypes].checkInputDataTypes()
+    }
+  }
+
+  private def getPattern(key: Any): Pattern = {
--- End diff --

OK
[GitHub] spark issue #13804: [Minor][Core] Fix display wrong free memory size in the ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/13804

OK, let me do it.
[GitHub] spark issue #12203: [SPARK-14423][YARN] Avoid same name files added to distr...
Github user RicoGit commented on the issue: https://github.com/apache/spark/pull/12203

Thanks for the reply. I have a [problem running a Spark job with Oozie](http://stackoverflow.com/questions/38144022/oozie-spark-action-requirement-failed), and this patch solves it. I applied this patch to Spark 1.6, built it (spark-yarn_2.10-1.6.0-cdh5.7.0.jar), and put it into the Oozie sharedLibs.
[GitHub] spark issue #12000: [SPARK-14204] [SQL] register driverClass rather than use...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12000

Can one of the admins verify this patch?
[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...
GitHub user maropu reopened a pull request: https://github.com/apache/spark/pull/14038

[SPARK-16317][SQL] Add a new interface to filter files in FileFormat

## What changes were proposed in this pull request?
This PR adds an interface for filtering files in `FileFormat`, so that invalid files are not passed into `FileFormat#buildReader`.

## How was this patch tested?
Added tests that filter files in a driver and in parallel.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-16317

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14038.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14038

commit 67703098f96da37fbe23e0f2d76017698671d5e2
Author: Takeshi YAMAMURO
Date: 2016-07-04T02:13:34Z

    Add a new interface to filter files in FileFormat
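The PR description does not show the interface itself; a plausible minimal shape for such a hook, purely as an assumption for illustration (the trait name, method name, and signature are invented, not taken from the patch):

```scala
import org.apache.hadoop.fs.FileStatus

// Hypothetical sketch of a per-format file filter: a FileFormat implementation
// could override this to drop invalid files before buildReader ever sees them.
trait FileFilterSupport {
  // Return only the files this format can actually read; default keeps everything.
  def filterFiles(files: Seq[FileStatus]): Seq[FileStatus] = files
}
```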
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14036

**[Test build #61706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61706/consoleFull)** for PR 14036 at commit [`ff97457`](https://github.com/apache/spark/commit/ff9745776fcf97ff063dec0811762a3e0c4b1840).
[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/14038
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/14038

@liancheng Could you review this after v2.0 is released?
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14033

**[Test build #61707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61707/consoleFull)** for PR 14033 at commit [`6b82f6c`](https://github.com/apache/spark/commit/6b82f6cdfa28a93f7473a5ddf0ac60a06c1837a7).
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14038

Merged build finished. Test PASSed.
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14037

LGTM. Do you know why we had this before?
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/14036#discussion_r69407742

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
@@ -285,6 +284,75 @@ case class Divide(left: Expression, right: Expression)
 }
 
 @ExpressionDescription(
+  usage = "a _FUNC_ b - Divides a by b.",
+  extended = "> SELECT 3 _FUNC_ 2;\n 1")
+case class IntegerDivide(left: Expression, right: Expression)
--- End diff --

Done
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14038

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61701/
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407701

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

Oh, I see now what you mean! You're right, I missed that. I'll add the logic and a test case. Thank you again.
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14038

**[Test build #61701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61701/consoleFull)** for PR 14038 at commit [`6770309`](https://github.com/apache/spark/commit/67703098f96da37fbe23e0f2d76017698671d5e2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407665

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

We should throw `AnalysisException` instead of `ClassCastException`; the type checking is not working here.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407641

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

```
scala> sql("select stack(1.0,2,3)");
java.lang.ClassCastException: org.apache.spark.sql.types.Decimal cannot be cast to java.lang.Integer
```
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407647

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -17,6 +17,8 @@
 package org.apache.spark.sql.catalyst.expressions
 
+import scala.collection.mutable.ArrayBuffer
--- End diff --

Oops. My bad.
[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039

Merged build finished. Test FAILed.
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14033

**[Test build #61705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61705/consoleFull)** for PR 14033 at commit [`e21bdd9`](https://github.com/apache/spark/commit/e21bdd9c2901ef69b3e0e1e1d3d3f2126aea).
[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61702/
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407596

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

Should I modify the description, `the first data type rules`, to make it clearer?
[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039

**[Test build #61702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61702/consoleFull)** for PR 14039 at commit [`4e56d5b`](https://github.com/apache/spark/commit/4e56d5bb596954349093de3702420e51194ffa42).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14004#discussion_r69407577

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala ---
@@ -246,4 +246,31 @@ class ComplexTypeSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkMetadata(CreateStructUnsafe(Seq(a, b)))
     checkMetadata(CreateNamedStructUnsafe(Seq("a", a, "b", b)))
   }
+
+  test("Sentences") {
+    // Hive compatible test-cases.
+    checkEvaluation(
+      Sentences("Hi there! The price was $1,234.56 But, not now."),
+      Seq(
+        Seq("Hi", "there").map(UTF8String.fromString),
+        Seq("The", "price", "was").map(UTF8String.fromString),
+        Seq("But", "not", "now").map(UTF8String.fromString)),
+      EmptyRow)
+
+    checkEvaluation(
+      Sentences("Hi there! The price was $1,234.56 But, not now.", "en"),
+      Seq(
+        Seq("Hi", "there").map(UTF8String.fromString),
+        Seq("The", "price", "was").map(UTF8String.fromString),
+        Seq("But", "not", "now").map(UTF8String.fromString)),
+      EmptyRow)
+
+    checkEvaluation(
+      Sentences("Hi there! The price was $1,234.56 But, not now.", "en", "US"),
+    Seq(
--- End diff --

wrong indent here
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407567

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

Oh, there is a misleading comment. The first argument, `1`, is the number of rows; its type is checked by the type-checker. The type of the first data argument, `1.0`, rules the following ones.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407515

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -17,6 +17,8 @@
 package org.apache.spark.sql.catalyst.expressions
 
+import scala.collection.mutable.ArrayBuffer
--- End diff --

unnecessary import?
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407491

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

E.g., what if the first argument is not int type? I'm also surprised that `stack(1, 1.0, 2)` works; we will cast `1.0` to int type, according to the definition of `inputTypes`.
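A sketch of the stricter check the review is asking for: rejecting a bad row-count argument at analysis time (surfacing as `AnalysisException`) rather than hitting a `ClassCastException` at runtime. In the real `Stack` this logic would live inside `checkInputDataTypes`; the helper below is a standalone illustration, and the exact error wording is an assumption:

```scala
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.types.IntegerType

object StackChecks {
  // Validate the row-count argument's type and foldability up front, so the
  // failure is a TypeCheckFailure instead of a runtime cast error.
  def checkRowCountArg(children: Seq[Expression], prettyName: String): TypeCheckResult = {
    if (children.length <= 1) {
      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    } else if (children.head.dataType != IntegerType || !children.head.foldable) {
      TypeCheckResult.TypeCheckFailure(
        "The number of rows must be a positive constant integer.")
    } else {
      TypeCheckResult.TypeCheckSuccess
    }
  }
}
```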
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407409

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.length <= 1) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
+    } else if (!children.head.foldable || children.head.eval().asInstanceOf[Int] < 1) {
+      TypeCheckResult.TypeCheckFailure("The number of rows must be positive constant.")
+    } else if (children.tail.map(_.dataType).distinct.count(_ != NullType) > 1) {
+      TypeCheckResult.TypeCheckFailure(
+        s"The expressions should all have the same type," +
+          s" but got $prettyName(${children.map(_.dataType)}).")
+    } else {
+      TypeCheckResult.TypeCheckSuccess
+    }
+  }
+
+  private lazy val numRows = children.head.eval().asInstanceOf[Int]
+  private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows
+
+  override def elementSchema: StructType = {
+    var schema = new StructType()
+    for (i <- 0 until numFields) {
+      schema = schema.add(s"col$i", children(1).dataType)
+    }
+    schema
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
+    val values = children.tail.map(_.eval(input))
+    for (row <- 0 until numRows) yield {
+      val fields = ArrayBuffer.empty[Any]
--- End diff --

Right, good catch!
[GitHub] spark issue #14006: [SPARK-13015][MLlib][DOC] Replace example code in mllib-...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14006

**[Test build #61703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61703/consoleFull)** for PR 14006 at commit [`47c7b16`](https://github.com/apache/spark/commit/47c7b165086324a473dc659fbb216ef6601194bf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14006: [SPARK-13015][MLlib][DOC] Replace example code in mllib-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14006

Merged build finished. Test PASSed.
[GitHub] spark issue #14006: [SPARK-13015][MLlib][DOC] Replace example code in mllib-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14006

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61703/
[GitHub] spark issue #13532: [SPARK-15204][SQL] improve nullability inference for Agg...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13532

merging to master, thanks!
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407065

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
```scala
}

/**
 * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
 * {{{
 *   SELECT stack(2, 1, 2, 3) ->
 *   1      2
 *   3      NULL
 * }}}
 */
@ExpressionDescription(
  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
case class Stack(children: Seq[Expression])
  extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {

  override def inputTypes: Seq[DataType] =
    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)

  override def checkInputDataTypes(): TypeCheckResult = {
```
--- End diff --

Thank you for the review again, @cloud-fan. For this, I added type casting tests here: https://github.com/apache/spark/pull/14033/files#diff-a2587541e08bf6e23df33738488d070aR30 Did I miss something there?
[GitHub] spark pull request #13532: [SPARK-15204][SQL] improve nullability inference ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13532
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14037

cc @rxin @cloud-fan @liancheng @yhuai
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69406936

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
```scala
}

/**
 * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
 * {{{
 *   SELECT stack(2, 1, 2, 3) ->
 *   1      2
 *   3      NULL
 * }}}
 */
@ExpressionDescription(
  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
case class Stack(children: Seq[Expression])
  extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {

  override def inputTypes: Seq[DataType] =
    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)

  override def checkInputDataTypes(): TypeCheckResult = {
    if (children.length <= 1) {
      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    } else if (!children.head.foldable || children.head.eval().asInstanceOf[Int] < 1) {
      TypeCheckResult.TypeCheckFailure("The number of rows must be positive constant.")
    } else if (children.tail.map(_.dataType).distinct.count(_ != NullType) > 1) {
      TypeCheckResult.TypeCheckFailure(
        s"The expressions should all have the same type," +
          s" but got $prettyName(${children.map(_.dataType)}).")
    } else {
      TypeCheckResult.TypeCheckSuccess
    }
  }

  private lazy val numRows = children.head.eval().asInstanceOf[Int]
  private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows

  override def elementSchema: StructType = {
    var schema = new StructType()
    for (i <- 0 until numFields) {
      schema = schema.add(s"col$i", children(1).dataType)
    }
    schema
  }

  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
    val values = children.tail.map(_.eval(input))
    for (row <- 0 until numRows) yield {
      val fields = ArrayBuffer.empty[Any]
```
--- End diff --

why use `ArrayBuffer` here? The number of columns is already known, right?
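For reference, a minimal sketch of the preallocated alternative being suggested. This is hypothetical: the loop body after `fields` is elided in the quoted diff, so the fill logic below is an assumption built from the `numRows`/`numFields` definitions above.

```scala
// Hypothetical sketch: the row width is fixed at numFields, so an Array can
// be allocated once per row instead of growing an ArrayBuffer.
override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
  val values = children.tail.map(_.eval(input))
  for (row <- 0 until numRows) yield {
    val fields = new Array[Any](numFields)
    for (col <- 0 until numFields) {
      val index = row * numFields + col
      // Pad the last row with nulls when k is not a multiple of n.
      fields(col) = if (index < values.length) values(index) else null
    }
    InternalRow(fields: _*)
  }
}
```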
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14037

Merged build finished. Test PASSed.
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14037

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61700/
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69406896

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
```scala
}

/**
 * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
 * {{{
 *   SELECT stack(2, 1, 2, 3) ->
 *   1      2
 *   3      NULL
 * }}}
 */
@ExpressionDescription(
  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
case class Stack(children: Seq[Expression])
  extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {

  override def inputTypes: Seq[DataType] =
    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)

  override def checkInputDataTypes(): TypeCheckResult = {
```
--- End diff --

As we override `checkInputDataTypes` here, the `ImplicitCastInputTypes` is useless now. We need to take care of all the type check logic in `checkInputDataTypes` ourselves.
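A sketch of what handling the checks entirely inside `checkInputDataTypes` could look like. This is hypothetical: only the explicit `IntegerType` guard on the first argument is new relative to the quoted diff, and the error messages are placeholder wording.

```scala
override def checkInputDataTypes(): TypeCheckResult = {
  if (children.length <= 1) {
    TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
  } else if (children.head.dataType != IntegerType || !children.head.foldable ||
      children.head.eval() == null || children.head.eval().asInstanceOf[Int] < 1) {
    // Guard the first argument's type explicitly, since no implicit cast
    // will be inserted once ImplicitCastInputTypes is no longer relied on.
    TypeCheckResult.TypeCheckFailure(
      "The number of rows must be a positive constant integer.")
  } else if (children.tail.map(_.dataType).distinct.count(_ != NullType) > 1) {
    TypeCheckResult.TypeCheckFailure(
      s"The expressions should all have the same type," +
        s" but got $prettyName(${children.map(_.dataType)}).")
  } else {
    TypeCheckResult.TypeCheckSuccess
  }
}
```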
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14037

**[Test build #61700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61700/consoleFull)** for PR 14037 at commit [`5530269`](https://github.com/apache/spark/commit/5530269e7081c12c049707b2205ec5d401cb5ae7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13804: [Minor][Core] Fix display wrong free memory size in the ...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13804

hi @jerryshao, let's also back-port this into 1.6.x ([MemoryStore.scala#L395](https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L395)) maybe?
[GitHub] spark issue #14025: [WIP][DOC] update out-of-date code snippets using SQLCon...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14025

@WeichenXu123 Is this ready for review? If yes, please remove the WIP tag in the PR description.
[GitHub] spark issue #14040: [SPARK-16329] [SQL] [Backport-2.0] Star Expansion over T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14040

**[Test build #61704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61704/consoleFull)** for PR 14040 at commit [`298ced4`](https://github.com/apache/spark/commit/298ced4d3e8603ec3d044dc5af0e16d91850c9ee).
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69406199

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
```scala
  override def prettyName: String = "rpad"
}

object ParseUrl {
  private val HOST = UTF8String.fromString("HOST")
  private val PATH = UTF8String.fromString("PATH")
  private val QUERY = UTF8String.fromString("QUERY")
  private val REF = UTF8String.fromString("REF")
  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
  private val FILE = UTF8String.fromString("FILE")
  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
  private val USERINFO = UTF8String.fromString("USERINFO")
  private val REGEXPREFIX = "(&|^)"
  private val REGEXSUBFIX = "=([^&]*)"
}

/**
 * Extracts a part from a URL
 */
@ExpressionDescription(
  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
    Key specifies which query to extract.
    Examples:
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
      'spark.apache.org'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
      'query=1'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
      '1'""")
case class ParseUrl(children: Seq[Expression])
  extends Expression with ImplicitCastInputTypes with CodegenFallback {

  override def nullable: Boolean = true
  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
  override def dataType: DataType = StringType
  override def prettyName: String = "parse_url"

  // If the url is a constant, cache the URL object so that we don't need to convert url
  // from UTF8String to String to URL for every row.
  @transient private lazy val cachedUrl = stringExprs(0) match {
    case Literal(url: UTF8String, _) => getUrl(url)
    case _ => null
  }

  // If the key is a constant, cache the Pattern object so that we don't need to convert key
  // from UTF8String to String to StringBuilder to String to Pattern for every row.
  @transient private lazy val cachedPattern = stringExprs(2) match {
    case Literal(key: UTF8String, _) => getPattern(key)
    case _ => null
  }

  private lazy val stringExprs = children.toArray
  import ParseUrl._

  override def checkInputDataTypes(): TypeCheckResult = {
    if (children.size > 3 || children.size < 2) {
      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
    } else {
      super[ImplicitCastInputTypes].checkInputDataTypes()
    }
  }

  private def getPattern(key: Any): Pattern = {
```
--- End diff --

we should explicitly say the argument is `UTF8String`
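A minimal sketch of the suggested signature change, assuming the `REGEXPREFIX`/`REGEXSUBFIX` constants from the quoted `ParseUrl` companion object are in scope:

```scala
import java.util.regex.Pattern
import org.apache.spark.unsafe.types.UTF8String

// Taking UTF8String instead of Any documents the only type this method is
// actually called with and lets the compiler reject anything else.
private def getPattern(key: UTF8String): Pattern = {
  if (key != null) {
    Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
  } else {
    null
  }
}
```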
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69406176

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
```scala
  override def prettyName: String = "rpad"
}

object ParseUrl {
  private val HOST = UTF8String.fromString("HOST")
  private val PATH = UTF8String.fromString("PATH")
  private val QUERY = UTF8String.fromString("QUERY")
  private val REF = UTF8String.fromString("REF")
  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
  private val FILE = UTF8String.fromString("FILE")
  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
  private val USERINFO = UTF8String.fromString("USERINFO")
  private val REGEXPREFIX = "(&|^)"
  private val REGEXSUBFIX = "=([^&]*)"
}

/**
 * Extracts a part from a URL
 */
@ExpressionDescription(
  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
    Key specifies which query to extract.
    Examples:
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
      'spark.apache.org'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
      'query=1'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
      '1'""")
case class ParseUrl(children: Seq[Expression])
  extends Expression with ImplicitCastInputTypes with CodegenFallback {

  override def nullable: Boolean = true
  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
  override def dataType: DataType = StringType
  override def prettyName: String = "parse_url"

  // If the url is a constant, cache the URL object so that we don't need to convert url
  // from UTF8String to String to URL for every row.
  @transient private lazy val cachedUrl = stringExprs(0) match {
    case Literal(url: UTF8String, _) => getUrl(url)
    case _ => null
  }

  // If the key is a constant, cache the Pattern object so that we don't need to convert key
  // from UTF8String to String to StringBuilder to String to Pattern for every row.
  @transient private lazy val cachedPattern = stringExprs(2) match {
    case Literal(key: UTF8String, _) => getPattern(key)
    case _ => null
  }

  private lazy val stringExprs = children.toArray
  import ParseUrl._

  override def checkInputDataTypes(): TypeCheckResult = {
    if (children.size > 3 || children.size < 2) {
      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
    } else {
      super[ImplicitCastInputTypes].checkInputDataTypes()
    }
  }

  private def getPattern(key: Any): Pattern = {
    if (key != null) {
      val sb = new StringBuilder()
      sb.append(REGEXPREFIX).append(key.toString).append(REGEXSUBFIX)
      Pattern.compile(sb.toString())
    } else {
      null
    }
  }

  private def getUrl(url: Any): URL = {
    try {
      new URL(url.toString)
    } catch {
      case NonFatal(_) => null
    }
  }

  private def extractValueFromQuery(query: Any, pattern: Pattern): Any = {
    val m = pattern.matcher(query.toString)
    if (m.find()) {
      UTF8String.fromString(m.group(2))
    } else {
      null
    }
  }

  private def extractFromUrl(url: URL, partToExtract: Any): Any = {
    if (partToExtract.equals(HOST)) {
      UTF8String.fromString(url.getHost)
    } else if (partToExtract.equals(PATH)) {
      UTF8String.fromString(url.getPath)
    } else if (partToExtract.equals(QUERY)) {
      UTF8String.fromString(url.getQuery)
    } else if (partToExtract.equals(REF)) {
      UTF8String.fromString(url.getRef)
    } else if (partToExtract.equals(PROTOCOL)) {
      UTF8String.fromString(url.getProtocol)
    } else if (partToExtract.equals(FILE)) {
      UTF8String.fromString(url.getFile)
    } else if (partToExtract.equals(AUTHORITY)) {
      UTF8String.fromString(url.getAuthority)
    } else if (partToExtract.equals(USERINFO)) {
      UTF8String.fromString(url.getUserInfo)
    } else {
      null
    }
  }

  private def parseUrlWithoutKey(url: Any, partToExtract: Any): Any = {
    if (url != null && partToExtract != null) {
      if (cachedUrl ne null) {
        extractFromUrl(cachedUrl, partToExtract)
      } else {
```
[GitHub] spark pull request #14040: [SPARK-16329] [SQL] [Backport-2.0] Star Expansion...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14040

[SPARK-16329] [SQL] [Backport-2.0] Star Expansion over Table Containing No Column

What changes were proposed in this pull request?

Star expansion over a table containing zero columns has not worked since 1.6, although it works in Spark 1.5.1. This PR fixes the issue in the master branch. For example,

```scala
val rddNoCols = sqlContext.sparkContext.parallelize(1 to 10).map(_ => Row.empty)
val dfNoCols = sqlContext.createDataFrame(rddNoCols, StructType(Seq.empty))
dfNoCols.registerTempTable("temp_table_no_cols")
sqlContext.sql("select * from temp_table_no_cols").show
```

Without the fix, users will get the following exception:

```
java.lang.IllegalArgumentException: requirement failed
    at scala.Predef$.require(Predef.scala:221)
    at org.apache.spark.sql.catalyst.analysis.UnresolvedStar.expand(unresolved.scala:199)
```

How was this patch tested?

Tests are added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark starExpansionEmptyTable

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14040.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14040

commit 298ced4d3e8603ec3d044dc5af0e16d91850c9ee
Author: gatorsmile
Date: 2016-07-04T03:38:27Z

    backport to 2.0
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69406035

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
```scala
  override def prettyName: String = "rpad"
}

object ParseUrl {
  private val HOST = UTF8String.fromString("HOST")
  private val PATH = UTF8String.fromString("PATH")
  private val QUERY = UTF8String.fromString("QUERY")
  private val REF = UTF8String.fromString("REF")
  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
  private val FILE = UTF8String.fromString("FILE")
  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
  private val USERINFO = UTF8String.fromString("USERINFO")
  private val REGEXPREFIX = "(&|^)"
  private val REGEXSUBFIX = "=([^&]*)"
}

/**
 * Extracts a part from a URL
 */
@ExpressionDescription(
  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
    Key specifies which query to extract.
    Examples:
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
      'spark.apache.org'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
      'query=1'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
      '1'""")
case class ParseUrl(children: Seq[Expression])
  extends Expression with ImplicitCastInputTypes with CodegenFallback {

  override def nullable: Boolean = true
  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
  override def dataType: DataType = StringType
  override def prettyName: String = "parse_url"

  // If the url is a constant, cache the URL object so that we don't need to convert url
  // from UTF8String to String to URL for every row.
  @transient private lazy val cachedUrl = stringExprs(0) match {
    case Literal(url: UTF8String, _) => getUrl(url)
    case _ => null
  }

  // If the key is a constant, cache the Pattern object so that we don't need to convert key
  // from UTF8String to String to StringBuilder to String to Pattern for every row.
  @transient private lazy val cachedPattern = stringExprs(2) match {
    case Literal(key: UTF8String, _) => getPattern(key)
    case _ => null
  }

  private lazy val stringExprs = children.toArray
  import ParseUrl._

  override def checkInputDataTypes(): TypeCheckResult = {
    if (children.size > 3 || children.size < 2) {
      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
    } else {
      super[ImplicitCastInputTypes].checkInputDataTypes()
    }
  }

  private def getPattern(key: Any): Pattern = {
    if (key != null) {
      val sb = new StringBuilder()
      sb.append(REGEXPREFIX).append(key.toString).append(REGEXSUBFIX)
      Pattern.compile(sb.toString())
    } else {
      null
    }
  }

  private def getUrl(url: Any): URL = {
    try {
      new URL(url.toString)
    } catch {
      case NonFatal(_) => null
    }
  }
```
--- End diff --

Seems `new URL` will only throw `MalformedURLException`?
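A sketch of the narrower catch implied by that observation. This is hypothetical, but `java.net.URL`'s string constructor declares only `MalformedURLException`:

```scala
import java.net.{MalformedURLException, URL}
import org.apache.spark.unsafe.types.UTF8String

// Catch exactly the exception the constructor can throw, rather than the
// much broader NonFatal, so unrelated failures are not silently swallowed.
private def getUrl(url: UTF8String): URL = {
  try {
    new URL(url.toString)
  } catch {
    case _: MalformedURLException => null
  }
}
```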
[GitHub] spark issue #14006: [SPARK-13015][MLlib][DOC] Replace example code in mllib-...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14006

**[Test build #61703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61703/consoleFull)** for PR 14006 at commit [`47c7b16`](https://github.com/apache/spark/commit/47c7b165086324a473dc659fbb216ef6601194bf).
[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039

**[Test build #61702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61702/consoleFull)** for PR 14039 at commit [`4e56d5b`](https://github.com/apache/spark/commit/4e56d5bb596954349093de3702420e51194ffa42).
[GitHub] spark pull request #14039: [SPARK-15896][SQL] Clean up shuffle files just af...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/14039

[SPARK-15896][SQL] Clean up shuffle files just after jobs finished

## What changes were proposed in this pull request?

Since a `ShuffleRDD` in a SQL query cannot be reused later, this PR removes the shuffle files as soon as a query finishes, to free the disk space as early as possible.

## How was this patch tested?

Manually checked that all files were deleted just after jobs finished.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-15896

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14039.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14039

commit 4e56d5bb596954349093de3702420e51194ffa42
Author: Takeshi YAMAMURO
Date: 2016-06-28T22:35:17Z

    Clean up shuffle files just after jobs finished
[GitHub] spark issue #14025: [WIP][DOC] update out-of-date code snippets using SQLCon...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14025

cc @liancheng
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14038

**[Test build #61701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61701/consoleFull)** for PR 14038 at commit [`6770309`](https://github.com/apache/spark/commit/67703098f96da37fbe23e0f2d76017698671d5e2).
[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/14038

[SPARK-16317][SQL] Add a new interface to filter files in FileFormat

## What changes were proposed in this pull request?

This PR adds an interface for filtering files in `FileFormat`, so that invalid files are not passed into `FileFormat#buildReader`.

## How was this patch tested?

Added tests that filter files in a driver and in parallel.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-16317

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14038.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14038

commit 67703098f96da37fbe23e0f2d76017698671d5e2
Author: Takeshi YAMAMURO
Date: 2016-07-04T02:13:34Z

    Add a new interface to filter files in FileFormat
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14037

**[Test build #61700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61700/consoleFull)** for PR 14037 at commit [`5530269`](https://github.com/apache/spark/commit/5530269e7081c12c049707b2205ec5d401cb5ae7).
[GitHub] spark pull request #14037: [SPARK-16358] [SQL] Remove LogicalPlan Node Inser...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14037

[SPARK-16358] [SQL] Remove LogicalPlan Node InsertIntoHiveTable

What changes were proposed in this pull request?

The LogicalPlan node `InsertIntoHiveTable` is unused. Thus, we can remove it from the code base.

How was this patch tested?

The existing test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark InsertIntoHiveTable

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14037.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14037

commit 5530269e7081c12c049707b2205ec5d401cb5ae7
Author: gatorsmile
Date: 2016-07-04T02:43:48Z

    remove InsertIntoHiveTable LogicalPlan nodes
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13517

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61699/
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13517

Merged build finished. Test PASSed.
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13517

**[Test build #61699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61699/consoleFull)** for PR 13517 at commit [`1307f8c`](https://github.com/apache/spark/commit/1307f8cbdd4b26885a81ad6e5770c2bb82a0159e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13517

Merged build finished. Test PASSed.
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13517

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61698/
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13517

**[Test build #61698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61698/consoleFull)** for PR 13517 at commit [`30dfea0`](https://github.com/apache/spark/commit/30dfea05bb0ce864a7ccb5fe6a2d091c7fe3c988).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #12203: [SPARK-14423][YARN] Avoid same name files added to distr...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/12203

@RicoGit This is a behavior change for jars uploaded to the distributed cache, so I'm not sure it is suitable to back-port to branch 1.6. Also, this problem is not so severe in 1.6, since we do the assembly for packaging.
[GitHub] spark issue #14008: [SPARK-16281][SQL] Implement parse_url SQL function
Github user janplus commented on the issue: https://github.com/apache/spark/pull/14008

@dongjoon-hyun @cloud-fan It is nice to have you review my PR. Thank you! I have added a new commit with the following changes:
1. Revert the driver side's literal key invalidation.
2. Resolve conflicts with master.
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69401574

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ---
@@ -725,4 +725,51 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
```scala
    checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
    checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
  }

  test("ParseUrl") {
    def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = {
      checkEvaluation(
        ParseUrl(Seq(Literal.create(urlStr, StringType),
          Literal.create(partToExtract, StringType))), expected)
    }
    def checkParseUrlWithKey(
        expected: String, urlStr: String,
        partToExtract: String, key: String): Unit = {
      checkEvaluation(
        ParseUrl(Seq(Literal.create(urlStr, StringType), Literal.create(partToExtract, StringType),
          Literal.create(key, StringType))), expected)
    }

    checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1", "HOST")
    checkParseUrl("/path", "http://spark.apache.org/path?query=1", "PATH")
    checkParseUrl("query=1", "http://spark.apache.org/path?query=1", "QUERY")
    checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref", "REF")
    checkParseUrl("http", "http://spark.apache.org/path?query=1", "PROTOCOL")
    checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1", "FILE")
    checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1", "AUTHORITY")
    checkParseUrl("userinfo", "http://userinfo@spark.apache.org/path?query=1", "USERINFO")
    checkParseUrlWithKey("1", "http://spark.apache.org/path?query=1", "QUERY", "query")

    // Null checking
    checkParseUrl(null, null, "HOST")
    checkParseUrl(null, "http://spark.apache.org/path?query=1", null)
    checkParseUrl(null, null, null)
    checkParseUrl(null, "test", "HOST")
    checkParseUrl(null, "http://spark.apache.org/path?query=1", "NO")
    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "HOST", "query")
    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "quer")
    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", null)
    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "")

    // exceptional cases
    intercept[java.util.regex.PatternSyntaxException] {
```
--- End diff --

OK, @cloud-fan
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69400558

--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
@@ -158,7 +159,7 @@ class RandomForestClassifierSuite
```scala
  test("Fitting without numClasses in metadata") {
    val df: DataFrame = TreeTests.featureImportanceData(sc).toDF()
```
--- End diff --

I also agree with this, but actually it seems both are fine, judging from this discussion: https://github.com/apache/spark/pull/12452
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69400523

--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
@@ -116,7 +117,7 @@ class MultilayerPerceptronClassifierSuite
```scala
    // the input seed is somewhat magic, to make this test pass
    val rdd = sc.parallelize(generateMultinomialLogisticInput(
      coefficients, xMean, xVariance, true, nPoints, 1), 2)
    val dataFrame = rdd.toDF("label", "features")
```
--- End diff --

Again, I also agree with this, but I am hesitant to change it because the partition count is explicitly set.
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69400465

--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -55,7 +56,7 @@ class LogisticRegressionSuite
```scala
      generateMultinomialLogisticInput(coefficients, xMean, xVariance,
        addIntercept = true, nPoints, 42)

    sc.parallelize(testData, 4).toDF()
```
--- End diff --

I guess, to be strict, `sc.parallelize(testData, 4).toDF()` and `testData.toDF.repartition(4)` would not be exactly the same. It seems the author of this test code intended to explicitly set the initial number of partitions to 4, so I left it as it is; although I think you are right, I am not 100% sure, and it is not part of this issue.
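To make that distinction concrete, a hypothetical snippet (assuming a test suite with `sc`, a `testData` sequence of case-class rows, and `spark.implicits._` in scope):

```scala
// Created with 4 partitions up front: no shuffle is involved.
val df1 = sc.parallelize(testData, 4).toDF()

// Created with the default parallelism, then shuffled into 4 partitions:
// same final partition count, but different data placement plus an extra
// exchange in the plan.
val df2 = testData.toDF().repartition(4)

assert(df1.rdd.getNumPartitions == 4 && df2.rdd.getNumPartitions == 4)
```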
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13517

**[Test build #61699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61699/consoleFull)** for PR 13517 at commit [`1307f8c`](https://github.com/apache/spark/commit/1307f8cbdd4b26885a81ad6e5770c2bb82a0159e).
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13517

**[Test build #61698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61698/consoleFull)** for PR 13517 at commit [`30dfea0`](https://github.com/apache/spark/commit/30dfea05bb0ce864a7ccb5fe6a2d091c7fe3c988).
[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types for `t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13517#discussion_r69399556

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
@@ -1117,4 +1117,26 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
```scala
      }
    }
  }

  test("SPARK-14839: Support for other types as option in OPTIONS clause") {
```
--- End diff --

Sure!
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14033

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61696/
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14033

Merged build finished. Test PASSed.
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14033

**[Test build #61696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61696/consoleFull)** for PR 14033 at commit [`f02e1dd`](https://github.com/apache/spark/commit/f02e1dd0928992e530ea8d8a0663050fecdcd4ce).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Stack(children: Seq[Expression])`
[GitHub] spark issue #13532: [SPARK-15204][SQL] improve nullability inference for Agg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13532

Merged build finished. Test PASSed.
[GitHub] spark issue #13532: [SPARK-15204][SQL] improve nullability inference for Agg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13532

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61697/
[GitHub] spark issue #13532: [SPARK-15204][SQL] improve nullability inference for Agg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13532 **[Test build #61697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61697/consoleFull)** for PR 13532 at commit [`23263e4`](https://github.com/apache/spark/commit/23263e4940f5b6e67ee7b06b9e0fad72bbe7606f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13532: [SPARK-15204][SQL] improve nullability inference for Agg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13532 **[Test build #61697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61697/consoleFull)** for PR 13532 at commit [`23263e4`](https://github.com/apache/spark/commit/23263e4940f5b6e67ee7b06b9e0fad72bbe7606f).
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14033 **[Test build #61696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61696/consoleFull)** for PR 14033 at commit [`f02e1dd`](https://github.com/apache/spark/commit/f02e1dd0928992e530ea8d8a0663050fecdcd4ce).
[GitHub] spark pull request #13532: [SPARK-15204][SQL] improve nullability inference ...
Github user koertkuipers commented on a diff in the pull request: https://github.com/apache/spark/pull/13532#discussion_r69397207

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala ---
@@ -305,4 +305,13 @@ class DatasetAggregatorSuite extends QueryTest with SharedSQLContext {
     val ds = Seq(1, 2, 3).toDS()
     checkDataset(ds.select(MapTypeBufferAgg.toColumn), 1)
   }
+
+  test("spark-15204 improve nullability inference for Aggregator") {
+    val ds1 = Seq(1, 3, 2, 5).toDS()
+    assert(ds1.select(typed.sum((i: Int) => i)).schema.head.nullable === false)
+    val ds2 = Seq(AggData(1, "a"), AggData(2, "a")).toDS()
+    assert(ds2.groupByKey(_.b).agg(SeqAgg.toColumn).schema(1).nullable === true)
--- End diff --

The last assert, the one with NameAgg, tests String as the output type of the Aggregator. Is that good enough?
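For readers following the nullability question: the inference under test derives a result column's nullability from the Aggregator's output encoder, so a primitive output type should produce a non-nullable field while an object output type (String, Seq) stays nullable. A minimal sketch of such an aggregator, assuming the Spark 2.0 `Aggregator` API (`IntSumAgg` is a hypothetical name, not from the PR):

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator with a primitive Int output: a flat primitive can
// never be null, so the inferred schema field should have nullable = false.
object IntSumAgg extends Aggregator[Int, Int, Int] {
  def zero: Int = 0
  def reduce(b: Int, a: Int): Int = b + a
  def merge(b1: Int, b2: Int): Int = b1 + b2
  def finish(reduction: Int): Int = reduction
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}

// Usage mirroring the test above: the resulting column should be non-nullable.
// val ds = Seq(1, 3, 2, 5).toDS()
// assert(ds.select(IntSumAgg.toColumn).schema.head.nullable === false)
```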
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14036 Merged build finished. Test PASSed.
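For context on the PR under test: in Spark SQL, `/` performs fractional division even on integer operands, so obtaining an integral quotient requires wrapping the result in a cast. The PR title indicates a dedicated `IntegerDivide` expression to avoid that extra cast; a sketch of the status quo it targets, assuming a `SparkSession` named `spark` (query literals are illustrative, not from the PR):

```scala
// Today an integer quotient needs an explicit cast around a fractional divide.
spark.sql("SELECT 7 / 2").show()              // returns 3.5 (double)
spark.sql("SELECT CAST(7 / 2 AS INT)").show() // returns 3, via an extra Cast node
```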