[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69410786

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala ---
@@ -36,10 +37,21 @@ class DecisionTreeRegressorSuite
   private var categoricalDataPointsRDD: RDD[LabeledPoint] = _
+  private var toyData: RDD[LabeledPoint] = _
+
   override def beforeAll() {
     super.beforeAll()
     categoricalDataPointsRDD =
       sc.parallelize(OldDecisionTreeSuite.generateCategoricalDataPoints().map(_.asML))
+    toyData = sc.parallelize(Seq(
--- End diff --

Move ```toyData``` to ```TreeTests```. You can refer to [Feature importance with toy data](https://github.com/apache/spark/blob/master/mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala#L108).
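For context, a minimal sketch of what hoisting the fixture into the shared test helper could look like. `TreeTests` does exist in Spark's test code, but the method name `getToyData` and the exact dataset below are assumptions for illustration, not the final patch:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.ml.feature.LabeledPoint
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.rdd.RDD

object TreeTests {
  // Hypothetical shared fixture: a small deterministic dataset that tree
  // suites can reuse instead of each suite redefining its own toyData.
  def getToyData(sc: SparkContext): RDD[LabeledPoint] = {
    sc.parallelize(Seq(
      LabeledPoint(1.0, Vectors.dense(1.0, 0.0)),
      LabeledPoint(2.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(3.0, Vectors.dense(1.0, 1.0))))
  }
}
```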
[GitHub] spark pull request #13981: [SPARK-16307] [ML] Add test to verify the predict...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/13981#discussion_r69410553

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/DecisionTreeRegressorSuite.scala ---
@@ -96,6 +108,15 @@ class DecisionTreeRegressorSuite
     assert(variance === expectedVariance,
       s"Expected variance $expectedVariance but got $variance.")
   }
+
+    val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
+    dt.setMaxDepth(1)
+      .setMaxBins(6)
--- End diff --

I'd like to remove the explicit setting, since the default value (32) meets your needs. We want to keep the Jenkins logs clean and reduce the number of warnings where possible.
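A sketch of the trimmed setup the reviewer is asking for: drop `setMaxBins(6)` and rely on the default of 32. The fixtures (`toyData`, `TreeTests.setMetadata`, the estimator `dt`) are taken from the diff context above; the `setImpurity` call is an assumption about the surrounding test:

```scala
import org.apache.spark.ml.regression.DecisionTreeRegressor

// Rely on the default maxBins (32); the toy data has far fewer distinct
// values, so the explicit setMaxBins(6) adds nothing but an extra knob.
val dt = new DecisionTreeRegressor()
  .setImpurity("variance") // assumed from the suite's usual setup
  .setMaxDepth(1)
val toyDF = TreeTests.setMetadata(toyData, Map.empty[Int, Int], 0)
val model = dt.fit(toyDF)
```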
[GitHub] spark pull request #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable Fr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14037
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14037

merging to master, thanks!
[GitHub] spark issue #12203: [SPARK-14423][YARN] Avoid same name files added to distr...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/12203

Maybe what you mentioned (skip adding to the distributed cache and log a warning) is enough; throwing an exception will fail the application, and this is not actually a fatal problem. I'm OK with changing the current behavior for this. What do you think, @vanzin?
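A minimal sketch of the skip-and-warn behavior being proposed. The helper and object names are assumptions for illustration, not the actual Client.scala code:

```scala
import scala.collection.mutable
import org.apache.hadoop.yarn.api.records.LocalResource
import org.slf4j.Logger

object CacheHelper {
  // Hypothetical helper: on a name collision, warn and keep the first entry
  // instead of failing the whole application.
  def addResourceIfAbsent(
      localResources: mutable.Map[String, LocalResource],
      name: String,
      resource: LocalResource,
      log: Logger): Unit = {
    if (localResources.contains(name)) {
      log.warn(s"Skipping $name: a file with the same name is already in the distributed cache.")
    } else {
      localResources(name) = resource
    }
  }
}
```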
[GitHub] spark issue #14040: [SPARK-16329] [SQL] [Backport-2.0] Star Expansion over T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14040

Merged build finished. Test PASSed.
[GitHub] spark issue #14040: [SPARK-16329] [SQL] [Backport-2.0] Star Expansion over T...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14040

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61704/
[GitHub] spark issue #14040: [SPARK-16329] [SQL] [Backport-2.0] Star Expansion over T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14040

**[Test build #61704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61704/consoleFull)** for PR 14040 at commit [`298ced4`](https://github.com/apache/spark/commit/298ced4d3e8603ec3d044dc5af0e16d91850c9ee).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #12203: [SPARK-14423][YARN] Avoid same name files added to distr...
Github user RicoGit commented on the issue: https://github.com/apache/spark/pull/12203

Thanks, I understand these are different problems. What would you advise? I don't think `require(localizedPath != null)` is a good solution: it just fails with the exception message "requirement failed". It would be better to skip adding to the distributed cache and log a warning. Do you think it is enough to open an issue for this?
[GitHub] spark issue #14042: [SPARK-16329] [SQL] [Backport-1.6] Star Expansion over T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14042

**[Test build #61711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61711/consoleFull)** for PR 14042 at commit [`edeeb14`](https://github.com/apache/spark/commit/edeeb1421931963affbd5402301563579b00611a).
[GitHub] spark pull request #14042: [SPARK-16329] [SQL] [Backport-1.6] Star Expansion...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14042

[SPARK-16329] [SQL] [Backport-1.6] Star Expansion over Table Containing No Column #14040

## What changes were proposed in this pull request?
Star expansion over a table containing zero columns has not worked since 1.6, although it works in Spark 1.5.1. This PR backports the fix to 1.6. For example,
```scala
val rddNoCols = sqlContext.sparkContext.parallelize(1 to 10).map(_ => Row.empty)
val dfNoCols = sqlContext.createDataFrame(rddNoCols, StructType(Seq.empty))
dfNoCols.registerTempTable("temp_table_no_cols")
sqlContext.sql("select * from temp_table_no_cols").show
```
Without the fix, users will get the following exception:
```
java.lang.IllegalArgumentException: requirement failed
  at scala.Predef$.require(Predef.scala:221)
  at org.apache.spark.sql.catalyst.analysis.UnresolvedStar.expand(unresolved.scala:199)
```

## How was this patch tested?
Tests are added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark starExpansionEmpty

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14042.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14042

commit edeeb1421931963affbd5402301563579b00611a
Author: gatorsmile
Date: 2016-07-04T05:09:24Z

    backport to 1.6
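For intuition: the failure comes from `UnresolvedStar.expand` requiring a successful attribute lookup even when the child plan exposes no columns. A simplified sketch of the kind of guard that avoids it follows; this is illustrative pseudo-logic, not the actual patch:

```scala
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Sketch: expand an unqualified `*` to exactly the child's output columns, so a
// zero-column table yields an empty projection instead of tripping a require().
def expandStar(input: LogicalPlan): Seq[NamedExpression] = {
  input.output // Seq.empty for a table with no columns; nothing to require
}
```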
[GitHub] spark issue #12203: [SPARK-14423][YARN] Avoid same name files added to distr...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/12203

Can you make sure the problem you met is exactly the same as the one this PR solves? The exception stack you pasted on StackOverflow is different from what I pasted here before. From your exception stack, my guess is that the same jar (same path with same file name) was added twice, which is a little different from the problem this PR addresses.
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14037

Based on my understanding, we previously had an Analyzer rule `PreInsertionCasts`, which generated `InsertIntoHiveTable`: https://github.com/apache/spark/pull/13754/files#diff-ee66e11b56c21364760a5ed2b783f863L483

In one of your PRs (https://github.com/apache/spark/pull/13754), that rule was removed. After that, `InsertIntoHiveTable` became useless.
[GitHub] spark issue #14008: [SPARK-16281][SQL] Implement parse_url SQL function
Github user janplus commented on the issue: https://github.com/apache/spark/pull/14008

@cloud-fan Thank you for the review. I made some code-style fixes as you suggested.
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14004

**[Test build #61710 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61710/consoleFull)** for PR 14004 at commit [`922e6e7`](https://github.com/apache/spark/commit/922e6e7aa93ae2b4cce31db0726722db3a534afe).
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14033

**[Test build #61709 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61709/consoleFull)** for PR 14033 at commit [`3e3a794`](https://github.com/apache/spark/commit/3e3a794a2a7ff90b2f69d05bd0d36e6e5b3549d9).
[GitHub] spark issue #14041: [SPARK-16359][STREAMING][KAFKA] unidoc skip kafka 0.10
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14041

**[Test build #61708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61708/consoleFull)** for PR 14041 at commit [`5312215`](https://github.com/apache/spark/commit/5312215027f385aefba95fd7b3652603ed432fc3).
[GitHub] spark pull request #14041: [SPARK-16359][STREAMING][KAFKA] unidoc skip kafka...
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/14041

[SPARK-16359][STREAMING][KAFKA] unidoc skip kafka 0.10

## What changes were proposed in this pull request?
During the sbt unidoc task, skip the streamingKafka010 subproject and filter the Kafka 0.10 classes from the classpath, so that at least the existing Kafka 0.8 docs can be included in unidoc without error.

## How was this patch tested?

    sbt spark/scalaunidoc:doc | grep -i error

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/koeninger/spark-1 SPARK-16359

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14041.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14041

commit 5312215027f385aefba95fd7b3652603ed432fc3
Author: cody koeninger
Date: 2016-07-04T04:45:06Z

    [SPARK-16359][STREAMING][KAFKA] unidoc skip kafka 0.10
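For reference, sbt-unidoc exposes a project filter that can express the "skip this subproject" half of the change; a minimal sketch follows. The project reference `streamingKafka010` is assumed from the PR description, Spark's real wiring lives in project/SparkBuild.scala, and the classpath-filtering half is not shown, so treat this as shape only:

```scala
// Sketch of a build-definition snippet using sbt-unidoc's documented filter
// (import paths may vary by plugin version; this follows the 0.3.x README).
import sbtunidoc.Plugin._
import UnidocKeys._

unidocProjectFilter in (ScalaUnidoc, unidoc) :=
  inAnyProject -- inProjects(streamingKafka010) // leave kafka-0-10 out of unidoc
```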
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69408201

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)
+    case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+    case Literal(key: UTF8String, _) => getPattern(key)
+    case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.size > 3 || children.size < 2) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
+    } else {
+      super[ImplicitCastInputTypes].checkInputDataTypes()
+    }
+  }
+
+  private def getPattern(key: Any): Pattern = {
+    if (key != null) {
+      val sb = new StringBuilder()
+      sb.append(REGEXPREFIX).append(key.toString).append(REGEXSUBFIX)
+      Pattern.compile(sb.toString())
+    } else {
+      null
+    }
+  }
+
+  private def getUrl(url: Any): URL = {
+    try {
+      new URL(url.toString)
+    } catch {
+      case NonFatal(_) => null
+    }
+  }
+
+  private def extractValueFromQuery(query: Any, pattern: Pattern): Any = {
+    val m = pattern.matcher(query.toString)
+    if (m.find()) {
+      UTF8String.fromString(m.group(2))
+    } else {
+      null
+    }
+  }
+
+  private def extractFromUrl(url: URL, partToExtract: Any): Any = {
+    if (partToExtract.equals(HOST)) {
+      UTF8String.fromString(url.getHost)
+    } else if (partToExtract.equals(PATH)) {
+      UTF8String.fromString(url.getPath)
+    } else if (partToExtract.equals(QUERY)) {
+      UTF8String.fromString(url.getQuery)
+    } else if (partToExtract.equals(REF)) {
+      UTF8String.fromString(url.getRef)
+    } else if (partToExtract.equals(PROTOCOL)) {
+      UTF8String.fromString(url.getProtocol)
+    } else if (partToExtract.equals(FILE)) {
+      UTF8String.fromString(url.getFile)
+    } else if (partToExtract.equals(AUTHORITY)) {
+      UTF8String.fromString(url.getAuthority)
+    } else if (partToExtract.equals(USERINFO)) {
+      UTF8String.fromString(url.getUserInfo)
+    } else {
+      null
+    }
+  }
+
+  private def parseUrlWithoutKey(url: Any, partToExtract: Any): Any = {
+    if (url != null && partToExtract != null) {
+      if (cachedUrl ne null) {
+        extractFromUrl(cachedUrl, partToExtract)
+      } else {
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69408152

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)
+    case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+    case Literal(key: UTF8String, _) => getPattern(key)
+    case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.size > 3 || children.size < 2) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
+    } else {
+      super[ImplicitCastInputTypes].checkInputDataTypes()
+    }
+  }
+
+  private def getPattern(key: Any): Pattern = {
+    if (key != null) {
+      val sb = new StringBuilder()
+      sb.append(REGEXPREFIX).append(key.toString).append(REGEXSUBFIX)
+      Pattern.compile(sb.toString())
+    } else {
+      null
+    }
+  }
+
+  private def getUrl(url: Any): URL = {
+    try {
+      new URL(url.toString)
+    } catch {
+      case NonFatal(_) => null
--- End diff --

OK
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69408137

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
   override def prettyName: String = "rpad"
 }
 
+object ParseUrl {
+  private val HOST = UTF8String.fromString("HOST")
+  private val PATH = UTF8String.fromString("PATH")
+  private val QUERY = UTF8String.fromString("QUERY")
+  private val REF = UTF8String.fromString("REF")
+  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
+  private val FILE = UTF8String.fromString("FILE")
+  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
+  private val USERINFO = UTF8String.fromString("USERINFO")
+  private val REGEXPREFIX = "(&|^)"
+  private val REGEXSUBFIX = "=([^&]*)"
+}
+
+/**
+ * Extracts a part from a URL
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
+  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
+    Key specifies which query to extract.
+    Examples:
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
+      'spark.apache.org'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
+      'query=1'
+      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
+      '1'""")
+case class ParseUrl(children: Seq[Expression])
+  extends Expression with ImplicitCastInputTypes with CodegenFallback {
+
+  override def nullable: Boolean = true
+  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
+  override def dataType: DataType = StringType
+  override def prettyName: String = "parse_url"
+
+  // If the url is a constant, cache the URL object so that we don't need to convert url
+  // from UTF8String to String to URL for every row.
+  @transient private lazy val cachedUrl = stringExprs(0) match {
+    case Literal(url: UTF8String, _) => getUrl(url)
+    case _ => null
+  }
+
+  // If the key is a constant, cache the Pattern object so that we don't need to convert key
+  // from UTF8String to String to StringBuilder to String to Pattern for every row.
+  @transient private lazy val cachedPattern = stringExprs(2) match {
+    case Literal(key: UTF8String, _) => getPattern(key)
+    case _ => null
+  }
+
+  private lazy val stringExprs = children.toArray
+  import ParseUrl._
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.size > 3 || children.size < 2) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
+    } else {
+      super[ImplicitCastInputTypes].checkInputDataTypes()
+    }
+  }
+
+  private def getPattern(key: Any): Pattern = {
--- End diff --

OK
[GitHub] spark issue #13804: [Minor][Core] Fix display wrong free memory size in the ...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/13804

OK, let me do it.
[GitHub] spark issue #12203: [SPARK-14423][YARN] Avoid same name files added to distr...
Github user RicoGit commented on the issue: https://github.com/apache/spark/pull/12203

Thanks for the reply. I have a [problem running a Spark job with Oozie](http://stackoverflow.com/questions/38144022/oozie-spark-action-requirement-failed), and this patch solves it. I applied this patch to Spark 1.6, built it (spark-yarn_2.10-1.6.0-cdh5.7.0.jar), and put it into the Oozie sharedLibs.
[GitHub] spark issue #12000: [SPARK-14204] [SQL] register driverClass rather than use...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12000

Can one of the admins verify this patch?
[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...
GitHub user maropu reopened a pull request: https://github.com/apache/spark/pull/14038

[SPARK-16317][SQL] Add a new interface to filter files in FileFormat

## What changes were proposed in this pull request?
This PR adds an interface for filtering files in `FileFormat`, so that invalid files are not passed into `FileFormat#buildReader`.

## How was this patch tested?
Added tests that filter files in a driver and in parallel.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-16317

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14038.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #14038

commit 67703098f96da37fbe23e0f2d76017698671d5e2
Author: Takeshi YAMAMURO
Date: 2016-07-04T02:13:34Z

    Add a new interface to filter files in FileFormat
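The PR description does not show the interface itself; a plausible minimal shape for such a hook, purely as an assumption for illustration (the trait name, method name, and signature are invented, not taken from the patch):

```scala
import org.apache.hadoop.fs.FileStatus

// Hypothetical sketch of a per-format file filter: a FileFormat implementation
// could override this to drop invalid files before buildReader ever sees them.
trait FileFilterSupport {
  // Return only the files this format can actually read; default keeps everything.
  def filterFiles(files: Seq[FileStatus]): Seq[FileStatus] = files
}
```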
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14036

**[Test build #61706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61706/consoleFull)** for PR 14036 at commit [`ff97457`](https://github.com/apache/spark/commit/ff9745776fcf97ff063dec0811762a3e0c4b1840).
[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/14038
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/14038

@liancheng Could you review this after v2.0 is released?
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14033

**[Test build #61707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61707/consoleFull)** for PR 14033 at commit [`6b82f6c`](https://github.com/apache/spark/commit/6b82f6cdfa28a93f7473a5ddf0ac60a06c1837a7).
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14038

Merged build finished. Test PASSed.
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14037

LGTM. Do you know why we had this before?
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
Github user techaddict commented on a diff in the pull request: https://github.com/apache/spark/pull/14036#discussion_r69407742

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala ---
@@ -285,6 +284,75 @@ case class Divide(left: Expression, right: Expression)
 }
 
 @ExpressionDescription(
+  usage = "a _FUNC_ b - Divides a by b.",
+  extended = "> SELECT 3 _FUNC_ 2;\n 1")
+case class IntegerDivide(left: Expression, right: Expression)
--- End diff --

Done
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14038

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61701/
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407701

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

Oh, I see now what you mean! You're right, I missed that. I'll add the logic and a test case. Thank you again.
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14038

**[Test build #61701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61701/consoleFull)** for PR 14038 at commit [`6770309`](https://github.com/apache/spark/commit/67703098f96da37fbe23e0f2d76017698671d5e2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407665

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

We should throw `AnalysisException` instead of `ClassCastException`; the type checking is not working here.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407641

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

```
scala> sql("select stack(1.0,2,3)");
java.lang.ClassCastException: org.apache.spark.sql.types.Decimal cannot be cast to java.lang.Integer
```
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407647

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -17,6 +17,8 @@
 package org.apache.spark.sql.catalyst.expressions
 
+import scala.collection.mutable.ArrayBuffer
--- End diff --

Oops. My bad.
[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039

Merged build finished. Test FAILed.
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14033

**[Test build #61705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61705/consoleFull)** for PR 14033 at commit [`e21bdd9`](https://github.com/apache/spark/commit/e21bdd9c2901ef69b3e0e1e1d3d3f2126aea).
[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14039

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61702/
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407596

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

Should I modify the description, `the first data type rules`, to make it clearer?
[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039

**[Test build #61702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61702/consoleFull)** for PR 14039 at commit [`4e56d5b`](https://github.com/apache/spark/commit/4e56d5bb596954349093de3702420e51194ffa42).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14004#discussion_r69407577

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ComplexTypeSuite.scala ---
@@ -246,4 +246,31 @@ class ComplexTypeSuite extends SparkFunSuite with ExpressionEvalHelper {
     checkMetadata(CreateStructUnsafe(Seq(a, b)))
     checkMetadata(CreateNamedStructUnsafe(Seq("a", a, "b", b)))
   }
+
+  test("Sentences") {
+    // Hive compatible test-cases.
+    checkEvaluation(
+      Sentences("Hi there! The price was $1,234.56 But, not now."),
+      Seq(
+        Seq("Hi", "there").map(UTF8String.fromString),
+        Seq("The", "price", "was").map(UTF8String.fromString),
+        Seq("But", "not", "now").map(UTF8String.fromString)),
+      EmptyRow)
+
+    checkEvaluation(
+      Sentences("Hi there! The price was $1,234.56 But, not now.", "en"),
+      Seq(
+        Seq("Hi", "there").map(UTF8String.fromString),
+        Seq("The", "price", "was").map(UTF8String.fromString),
+        Seq("But", "not", "now").map(UTF8String.fromString)),
+      EmptyRow)
+
+    checkEvaluation(
+      Sentences("Hi there! The price was $1,234.56 But, not now.", "en", "US"),
+    Seq(
--- End diff --

wrong indent here
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407567

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

Oh, there is a misleading comment. The first argument, `1`, is the number of rows; its type is checked by the type-checker. The type of the first data argument, `1.0`, rules the following ones.
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407515

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -17,6 +17,8 @@
 package org.apache.spark.sql.catalyst.expressions
 
+import scala.collection.mutable.ArrayBuffer
--- End diff --

unnecessary import?
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407491

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
--- End diff --

E.g., what if the first argument is not int type? I'm also surprised that `stack(1, 1.0, 2)` works; we will cast `1.0` to int type, according to the definition of `inputTypes`.
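A sketch of the stricter check the review is asking for: rejecting a bad row-count argument at analysis time (surfacing as `AnalysisException`) rather than hitting a `ClassCastException` at runtime. In the real `Stack` this logic would live inside `checkInputDataTypes`; the helper below is a standalone illustration, and the exact error wording is an assumption:

```scala
import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.types.IntegerType

object StackChecks {
  // Validate the row-count argument's type and foldability up front, so the
  // failure is a TypeCheckFailure instead of a runtime cast error.
  def checkRowCountArg(children: Seq[Expression], prettyName: String): TypeCheckResult = {
    if (children.length <= 1) {
      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    } else if (children.head.dataType != IntegerType || !children.head.foldable) {
      TypeCheckResult.TypeCheckFailure(
        "The number of rows must be a positive constant integer.")
    } else {
      TypeCheckResult.TypeCheckSuccess
    }
  }
}
```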
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407409

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
 }
 
 /**
+ * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
+ * {{{
+ *   SELECT stack(2, 1, 2, 3)) ->
+ *   1  2
+ *   3  NULL
+ * }}}
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
+  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
+case class Stack(children: Seq[Expression])
+    extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {
+
+  override def inputTypes: Seq[DataType] =
+    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.length <= 1) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
+    } else if (!children.head.foldable || children.head.eval().asInstanceOf[Int] < 1) {
+      TypeCheckResult.TypeCheckFailure("The number of rows must be positive constant.")
+    } else if (children.tail.map(_.dataType).distinct.count(_ != NullType) > 1) {
+      TypeCheckResult.TypeCheckFailure(
+        s"The expressions should all have the same type," +
+          s" but got $prettyName(${children.map(_.dataType)}).")
+    } else {
+      TypeCheckResult.TypeCheckSuccess
+    }
+  }
+
+  private lazy val numRows = children.head.eval().asInstanceOf[Int]
+  private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows
+
+  override def elementSchema: StructType = {
+    var schema = new StructType()
+    for (i <- 0 until numFields) {
+      schema = schema.add(s"col$i", children(1).dataType)
+    }
+    schema
+  }
+
+  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
+    val values = children.tail.map(_.eval(input))
+    for (row <- 0 until numRows) yield {
+      val fields = ArrayBuffer.empty[Any]
--- End diff --

Right, good catch!
[GitHub] spark issue #14006: [SPARK-13015][MLlib][DOC] Replace example code in mllib-...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14006

**[Test build #61703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61703/consoleFull)** for PR 14006 at commit [`47c7b16`](https://github.com/apache/spark/commit/47c7b165086324a473dc659fbb216ef6601194bf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #14006: [SPARK-13015][MLlib][DOC] Replace example code in mllib-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14006

Merged build finished. Test PASSed.
[GitHub] spark issue #14006: [SPARK-13015][MLlib][DOC] Replace example code in mllib-...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14006

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61703/
[GitHub] spark issue #13532: [SPARK-15204][SQL] improve nullability inference for Agg...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13532

merging to master, thanks!
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69407065

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
```scala
}

/**
 * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
 * {{{
 *   SELECT stack(2, 1, 2, 3) ->
 *   1      2
 *   3      NULL
 * }}}
 */
@ExpressionDescription(
  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
case class Stack(children: Seq[Expression])
  extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {

  override def inputTypes: Seq[DataType] =
    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)

  override def checkInputDataTypes(): TypeCheckResult = {
```
--- End diff --

Thank you for the review again, @cloud-fan. For this, I added type casting tests here: https://github.com/apache/spark/pull/14033/files#diff-a2587541e08bf6e23df33738488d070aR30 Did I miss something there?
[GitHub] spark pull request #13532: [SPARK-15204][SQL] improve nullability inference ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13532
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14037

cc @rxin @cloud-fan @liancheng @yhuai
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69406936

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
```scala
}

/**
 * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
 * {{{
 *   SELECT stack(2, 1, 2, 3) ->
 *   1      2
 *   3      NULL
 * }}}
 */
@ExpressionDescription(
  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
case class Stack(children: Seq[Expression])
  extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {

  override def inputTypes: Seq[DataType] =
    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)

  override def checkInputDataTypes(): TypeCheckResult = {
    if (children.length <= 1) {
      TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
    } else if (!children.head.foldable || children.head.eval().asInstanceOf[Int] < 1) {
      TypeCheckResult.TypeCheckFailure("The number of rows must be positive constant.")
    } else if (children.tail.map(_.dataType).distinct.count(_ != NullType) > 1) {
      TypeCheckResult.TypeCheckFailure(
        s"The expressions should all have the same type," +
          s" but got $prettyName(${children.map(_.dataType)}).")
    } else {
      TypeCheckResult.TypeCheckSuccess
    }
  }

  private lazy val numRows = children.head.eval().asInstanceOf[Int]
  private lazy val numFields = ((children.length - 1) + numRows - 1) / numRows

  override def elementSchema: StructType = {
    var schema = new StructType()
    for (i <- 0 until numFields) {
      schema = schema.add(s"col$i", children(1).dataType)
    }
    schema
  }

  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
    val values = children.tail.map(_.eval(input))
    for (row <- 0 until numRows) yield {
      val fields = ArrayBuffer.empty[Any]
```
--- End diff --

why use `ArrayBuffer` here? The number of columns is already known, right?
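For reference, a minimal sketch of the preallocated alternative being suggested. This is hypothetical: the loop body after `fields` is elided in the quoted diff, so the fill logic below is an assumption built from the `numRows`/`numFields` definitions above.

```scala
// Hypothetical sketch: the row width is fixed at numFields, so an Array can
// be allocated once per row instead of growing an ArrayBuffer.
override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
  val values = children.tail.map(_.eval(input))
  for (row <- 0 until numRows) yield {
    val fields = new Array[Any](numFields)
    for (col <- 0 until numFields) {
      val index = row * numFields + col
      // Pad the last row with nulls when k is not a multiple of n.
      fields(col) = if (index < values.length) values(index) else null
    }
    InternalRow(fields: _*)
  }
}
```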
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14037

Merged build finished. Test PASSed.
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14037

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61700/
[GitHub] spark pull request #14033: [SPARK-16286][SQL] Implement stack table generati...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14033#discussion_r69406896

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala ---
@@ -94,6 +96,61 @@ case class UserDefinedGenerator(
```scala
}

/**
 * Separate v1, ..., vk into n rows. Each row will have k/n columns. n must be constant.
 * {{{
 *   SELECT stack(2, 1, 2, 3) ->
 *   1      2
 *   3      NULL
 * }}}
 */
@ExpressionDescription(
  usage = "_FUNC_(n, v1, ..., vk) - Separate v1, ..., vk into n rows.",
  extended = "> SELECT _FUNC_(2, 1, 2, 3);\n  [1,2]\n  [3,null]")
case class Stack(children: Seq[Expression])
  extends Expression with Generator with ImplicitCastInputTypes with CodegenFallback {

  override def inputTypes: Seq[DataType] =
    Seq(IntegerType) ++ Seq.fill(children.length - 1)(children.tail.head.dataType)

  override def checkInputDataTypes(): TypeCheckResult = {
```
--- End diff --

As we override `checkInputDataTypes` here, the `ImplicitCastInputTypes` is useless now. We need to take care of all the type check logic in `checkInputDataTypes` ourselves.
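A sketch of what handling the checks entirely inside `checkInputDataTypes` could look like. This is hypothetical: only the explicit `IntegerType` guard on the first argument is new relative to the quoted diff, and the error messages are placeholder wording.

```scala
override def checkInputDataTypes(): TypeCheckResult = {
  if (children.length <= 1) {
    TypeCheckResult.TypeCheckFailure(s"$prettyName requires at least 2 arguments.")
  } else if (children.head.dataType != IntegerType || !children.head.foldable ||
      children.head.eval() == null || children.head.eval().asInstanceOf[Int] < 1) {
    // Guard the first argument's type explicitly, since no implicit cast
    // will be inserted once ImplicitCastInputTypes is no longer relied on.
    TypeCheckResult.TypeCheckFailure(
      "The number of rows must be a positive constant integer.")
  } else if (children.tail.map(_.dataType).distinct.count(_ != NullType) > 1) {
    TypeCheckResult.TypeCheckFailure(
      s"The expressions should all have the same type," +
        s" but got $prettyName(${children.map(_.dataType)}).")
  } else {
    TypeCheckResult.TypeCheckSuccess
  }
}
```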
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14037

**[Test build #61700 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61700/consoleFull)** for PR 14037 at commit [`5530269`](https://github.com/apache/spark/commit/5530269e7081c12c049707b2205ec5d401cb5ae7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13804: [Minor][Core] Fix display wrong free memory size in the ...
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/13804

hi @jerryshao, let's also back-port this into 1.6.x ([MemoryStore.scala#L395](https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L395)) maybe?
[GitHub] spark issue #14025: [WIP][DOC] update out-of-date code snippets using SQLCon...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/14025

@WeichenXu123 Is this ready for review? If yes, please remove the WIP tag in the PR description.
[GitHub] spark issue #14040: [SPARK-16329] [SQL] [Backport-2.0] Star Expansion over T...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14040

**[Test build #61704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61704/consoleFull)** for PR 14040 at commit [`298ced4`](https://github.com/apache/spark/commit/298ced4d3e8603ec3d044dc5af0e16d91850c9ee).
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69406199

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
```scala
  override def prettyName: String = "rpad"
}

object ParseUrl {
  private val HOST = UTF8String.fromString("HOST")
  private val PATH = UTF8String.fromString("PATH")
  private val QUERY = UTF8String.fromString("QUERY")
  private val REF = UTF8String.fromString("REF")
  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
  private val FILE = UTF8String.fromString("FILE")
  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
  private val USERINFO = UTF8String.fromString("USERINFO")
  private val REGEXPREFIX = "(&|^)"
  private val REGEXSUBFIX = "=([^&]*)"
}

/**
 * Extracts a part from a URL
 */
@ExpressionDescription(
  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
    Key specifies which query to extract.
    Examples:
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
      'spark.apache.org'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
      'query=1'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
      '1'""")
case class ParseUrl(children: Seq[Expression])
  extends Expression with ImplicitCastInputTypes with CodegenFallback {

  override def nullable: Boolean = true
  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
  override def dataType: DataType = StringType
  override def prettyName: String = "parse_url"

  // If the url is a constant, cache the URL object so that we don't need to convert url
  // from UTF8String to String to URL for every row.
  @transient private lazy val cachedUrl = stringExprs(0) match {
    case Literal(url: UTF8String, _) => getUrl(url)
    case _ => null
  }

  // If the key is a constant, cache the Pattern object so that we don't need to convert key
  // from UTF8String to String to StringBuilder to String to Pattern for every row.
  @transient private lazy val cachedPattern = stringExprs(2) match {
    case Literal(key: UTF8String, _) => getPattern(key)
    case _ => null
  }

  private lazy val stringExprs = children.toArray
  import ParseUrl._

  override def checkInputDataTypes(): TypeCheckResult = {
    if (children.size > 3 || children.size < 2) {
      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
    } else {
      super[ImplicitCastInputTypes].checkInputDataTypes()
    }
  }

  private def getPattern(key: Any): Pattern = {
```
--- End diff --

we should explicitly say the argument is `UTF8String`
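A minimal sketch of the suggested signature change, assuming the `REGEXPREFIX`/`REGEXSUBFIX` constants from the quoted `ParseUrl` companion object are in scope:

```scala
import java.util.regex.Pattern
import org.apache.spark.unsafe.types.UTF8String

// Taking UTF8String instead of Any documents the only type this method is
// actually called with and lets the compiler reject anything else.
private def getPattern(key: UTF8String): Pattern = {
  if (key != null) {
    Pattern.compile(REGEXPREFIX + key.toString + REGEXSUBFIX)
  } else {
    null
  }
}
```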
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69406176

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
```scala
  override def prettyName: String = "rpad"
}

object ParseUrl {
  private val HOST = UTF8String.fromString("HOST")
  private val PATH = UTF8String.fromString("PATH")
  private val QUERY = UTF8String.fromString("QUERY")
  private val REF = UTF8String.fromString("REF")
  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
  private val FILE = UTF8String.fromString("FILE")
  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
  private val USERINFO = UTF8String.fromString("USERINFO")
  private val REGEXPREFIX = "(&|^)"
  private val REGEXSUBFIX = "=([^&]*)"
}

/**
 * Extracts a part from a URL
 */
@ExpressionDescription(
  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
    Key specifies which query to extract.
    Examples:
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
      'spark.apache.org'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
      'query=1'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
      '1'""")
case class ParseUrl(children: Seq[Expression])
  extends Expression with ImplicitCastInputTypes with CodegenFallback {

  override def nullable: Boolean = true
  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
  override def dataType: DataType = StringType
  override def prettyName: String = "parse_url"

  // If the url is a constant, cache the URL object so that we don't need to convert url
  // from UTF8String to String to URL for every row.
  @transient private lazy val cachedUrl = stringExprs(0) match {
    case Literal(url: UTF8String, _) => getUrl(url)
    case _ => null
  }

  // If the key is a constant, cache the Pattern object so that we don't need to convert key
  // from UTF8String to String to StringBuilder to String to Pattern for every row.
  @transient private lazy val cachedPattern = stringExprs(2) match {
    case Literal(key: UTF8String, _) => getPattern(key)
    case _ => null
  }

  private lazy val stringExprs = children.toArray
  import ParseUrl._

  override def checkInputDataTypes(): TypeCheckResult = {
    if (children.size > 3 || children.size < 2) {
      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
    } else {
      super[ImplicitCastInputTypes].checkInputDataTypes()
    }
  }

  private def getPattern(key: Any): Pattern = {
    if (key != null) {
      val sb = new StringBuilder()
      sb.append(REGEXPREFIX).append(key.toString).append(REGEXSUBFIX)
      Pattern.compile(sb.toString())
    } else {
      null
    }
  }

  private def getUrl(url: Any): URL = {
    try {
      new URL(url.toString)
    } catch {
      case NonFatal(_) => null
    }
  }

  private def extractValueFromQuery(query: Any, pattern: Pattern): Any = {
    val m = pattern.matcher(query.toString)
    if (m.find()) {
      UTF8String.fromString(m.group(2))
    } else {
      null
    }
  }

  private def extractFromUrl(url: URL, partToExtract: Any): Any = {
    if (partToExtract.equals(HOST)) {
      UTF8String.fromString(url.getHost)
    } else if (partToExtract.equals(PATH)) {
      UTF8String.fromString(url.getPath)
    } else if (partToExtract.equals(QUERY)) {
      UTF8String.fromString(url.getQuery)
    } else if (partToExtract.equals(REF)) {
      UTF8String.fromString(url.getRef)
    } else if (partToExtract.equals(PROTOCOL)) {
      UTF8String.fromString(url.getProtocol)
    } else if (partToExtract.equals(FILE)) {
      UTF8String.fromString(url.getFile)
    } else if (partToExtract.equals(AUTHORITY)) {
      UTF8String.fromString(url.getAuthority)
    } else if (partToExtract.equals(USERINFO)) {
      UTF8String.fromString(url.getUserInfo)
    } else {
      null
    }
  }

  private def parseUrlWithoutKey(url: Any, partToExtract: Any): Any = {
    if (url != null && partToExtract != null) {
      if (cachedUrl ne null) {
        extractFromUrl(cachedUrl, partToExtract)
      } else {
```
[GitHub] spark pull request #14040: [SPARK-16329] [SQL] [Backport-2.0] Star Expansion...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14040

[SPARK-16329] [SQL] [Backport-2.0] Star Expansion over Table Containing No Column

What changes were proposed in this pull request?

Star expansion over a table containing zero columns has not worked since 1.6, although it works in Spark 1.5.1. This PR fixes the issue in the master branch. For example,

```scala
val rddNoCols = sqlContext.sparkContext.parallelize(1 to 10).map(_ => Row.empty)
val dfNoCols = sqlContext.createDataFrame(rddNoCols, StructType(Seq.empty))
dfNoCols.registerTempTable("temp_table_no_cols")
sqlContext.sql("select * from temp_table_no_cols").show
```

Without the fix, users will get the following exception:

```
java.lang.IllegalArgumentException: requirement failed
    at scala.Predef$.require(Predef.scala:221)
    at org.apache.spark.sql.catalyst.analysis.UnresolvedStar.expand(unresolved.scala:199)
```

How was this patch tested?

Tests are added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark starExpansionEmptyTable

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14040.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14040

commit 298ced4d3e8603ec3d044dc5af0e16d91850c9ee
Author: gatorsmile
Date: 2016-07-04T03:38:27Z

    backport to 2.0
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69406035

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala ---
@@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression)
```scala
  override def prettyName: String = "rpad"
}

object ParseUrl {
  private val HOST = UTF8String.fromString("HOST")
  private val PATH = UTF8String.fromString("PATH")
  private val QUERY = UTF8String.fromString("QUERY")
  private val REF = UTF8String.fromString("REF")
  private val PROTOCOL = UTF8String.fromString("PROTOCOL")
  private val FILE = UTF8String.fromString("FILE")
  private val AUTHORITY = UTF8String.fromString("AUTHORITY")
  private val USERINFO = UTF8String.fromString("USERINFO")
  private val REGEXPREFIX = "(&|^)"
  private val REGEXSUBFIX = "=([^&]*)"
}

/**
 * Extracts a part from a URL
 */
@ExpressionDescription(
  usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL",
  extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO.
    Key specifies which query to extract.
    Examples:
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST')
      'spark.apache.org'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY')
      'query=1'
      > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query')
      '1'""")
case class ParseUrl(children: Seq[Expression])
  extends Expression with ImplicitCastInputTypes with CodegenFallback {

  override def nullable: Boolean = true
  override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType)
  override def dataType: DataType = StringType
  override def prettyName: String = "parse_url"

  // If the url is a constant, cache the URL object so that we don't need to convert url
  // from UTF8String to String to URL for every row.
  @transient private lazy val cachedUrl = stringExprs(0) match {
    case Literal(url: UTF8String, _) => getUrl(url)
    case _ => null
  }

  // If the key is a constant, cache the Pattern object so that we don't need to convert key
  // from UTF8String to String to StringBuilder to String to Pattern for every row.
  @transient private lazy val cachedPattern = stringExprs(2) match {
    case Literal(key: UTF8String, _) => getPattern(key)
    case _ => null
  }

  private lazy val stringExprs = children.toArray
  import ParseUrl._

  override def checkInputDataTypes(): TypeCheckResult = {
    if (children.size > 3 || children.size < 2) {
      TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments")
    } else {
      super[ImplicitCastInputTypes].checkInputDataTypes()
    }
  }

  private def getPattern(key: Any): Pattern = {
    if (key != null) {
      val sb = new StringBuilder()
      sb.append(REGEXPREFIX).append(key.toString).append(REGEXSUBFIX)
      Pattern.compile(sb.toString())
    } else {
      null
    }
  }

  private def getUrl(url: Any): URL = {
    try {
      new URL(url.toString)
    } catch {
      case NonFatal(_) => null
    }
  }
```
--- End diff --

Seems `new URL` will only throw `MalformedURLException`?
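A sketch of the narrower catch implied by that observation. This is hypothetical, but `java.net.URL`'s string constructor declares only `MalformedURLException`:

```scala
import java.net.{MalformedURLException, URL}
import org.apache.spark.unsafe.types.UTF8String

// Catch exactly the exception the constructor can throw, rather than the
// much broader NonFatal, so unrelated failures are not silently swallowed.
private def getUrl(url: UTF8String): URL = {
  try {
    new URL(url.toString)
  } catch {
    case _: MalformedURLException => null
  }
}
```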
[GitHub] spark issue #14006: [SPARK-13015][MLlib][DOC] Replace example code in mllib-...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14006

**[Test build #61703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61703/consoleFull)** for PR 14006 at commit [`47c7b16`](https://github.com/apache/spark/commit/47c7b165086324a473dc659fbb216ef6601194bf).
[GitHub] spark issue #14039: [SPARK-15896][SQL] Clean up shuffle files just after job...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14039

**[Test build #61702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61702/consoleFull)** for PR 14039 at commit [`4e56d5b`](https://github.com/apache/spark/commit/4e56d5bb596954349093de3702420e51194ffa42).
[GitHub] spark pull request #14039: [SPARK-15896][SQL] Clean up shuffle files just af...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/14039

[SPARK-15896][SQL] Clean up shuffle files just after jobs finished

## What changes were proposed in this pull request?

Since a `ShuffleRDD` in a SQL query cannot be reused later, this PR removes the shuffle files as soon as a query finishes, to free the disk space as early as possible.

## How was this patch tested?

Manually checked that all files were deleted just after jobs finished.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-15896

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14039.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14039

commit 4e56d5bb596954349093de3702420e51194ffa42
Author: Takeshi YAMAMURO
Date: 2016-06-28T22:35:17Z

    Clean up shuffle files just after jobs finished
[GitHub] spark issue #14025: [WIP][DOC] update out-of-date code snippets using SQLCon...
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14025

cc @liancheng
[GitHub] spark issue #14038: [SPARK-16317][SQL] Add a new interface to filter files i...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14038

**[Test build #61701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61701/consoleFull)** for PR 14038 at commit [`6770309`](https://github.com/apache/spark/commit/67703098f96da37fbe23e0f2d76017698671d5e2).
[GitHub] spark pull request #14038: [SPARK-16317][SQL] Add a new interface to filter ...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/14038

[SPARK-16317][SQL] Add a new interface to filter files in FileFormat

## What changes were proposed in this pull request?

This PR adds an interface for filtering files in `FileFormat`, so that invalid files are not passed into `FileFormat#buildReader`.

## How was this patch tested?

Added tests that filter files in a driver and in parallel.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-16317

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14038.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14038

commit 67703098f96da37fbe23e0f2d76017698671d5e2
Author: Takeshi YAMAMURO
Date: 2016-07-04T02:13:34Z

    Add a new interface to filter files in FileFormat
[GitHub] spark issue #14037: [SPARK-16358] [SQL] Remove InsertIntoHiveTable From Logi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14037

**[Test build #61700 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61700/consoleFull)** for PR 14037 at commit [`5530269`](https://github.com/apache/spark/commit/5530269e7081c12c049707b2205ec5d401cb5ae7).
[GitHub] spark pull request #14037: [SPARK-16358] [SQL] Remove LogicalPlan Node Inser...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/14037

[SPARK-16358] [SQL] Remove LogicalPlan Node InsertIntoHiveTable

What changes were proposed in this pull request?

The LogicalPlan node `InsertIntoHiveTable` is unused. Thus, we can remove it from the code base.

How was this patch tested?

The existing test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark InsertIntoHiveTable

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14037.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14037

commit 5530269e7081c12c049707b2205ec5d401cb5ae7
Author: gatorsmile
Date: 2016-07-04T02:43:48Z

    remove InsertIntoHiveTable LogicalPlan nodes
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13517

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61699/
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13517

Merged build finished. Test PASSed.
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13517

**[Test build #61699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61699/consoleFull)** for PR 13517 at commit [`1307f8c`](https://github.com/apache/spark/commit/1307f8cbdd4b26885a81ad6e5770c2bb82a0159e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13517

Merged build finished. Test PASSed.
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13517

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61698/
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13517

**[Test build #61698 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61698/consoleFull)** for PR 13517 at commit [`30dfea0`](https://github.com/apache/spark/commit/30dfea05bb0ce864a7ccb5fe6a2d091c7fe3c988).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #12203: [SPARK-14423][YARN] Avoid same name files added to distr...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/12203

@RicoGit This is a behavior change for jars uploaded to the distributed cache, so I'm not sure it is suitable to back-port to branch 1.6. Also, this problem is not so severe in 1.6, since we do the assembly for packaging.
[GitHub] spark issue #14008: [SPARK-16281][SQL] Implement parse_url SQL function
Github user janplus commented on the issue: https://github.com/apache/spark/pull/14008

@dongjoon-hyun @cloud-fan It is nice to have you review my PR. Thank you! I have added a new commit with the following changes:
1. Revert the driver side's literal key invalidation.
2. Resolve conflicts with master.
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69401574

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala ---
@@ -725,4 +725,51 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
```scala
    checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0)
    checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0)
  }

  test("ParseUrl") {
    def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = {
      checkEvaluation(
        ParseUrl(Seq(Literal.create(urlStr, StringType),
          Literal.create(partToExtract, StringType))), expected)
    }
    def checkParseUrlWithKey(
        expected: String, urlStr: String,
        partToExtract: String, key: String): Unit = {
      checkEvaluation(
        ParseUrl(Seq(Literal.create(urlStr, StringType), Literal.create(partToExtract, StringType),
          Literal.create(key, StringType))), expected)
    }

    checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1", "HOST")
    checkParseUrl("/path", "http://spark.apache.org/path?query=1", "PATH")
    checkParseUrl("query=1", "http://spark.apache.org/path?query=1", "QUERY")
    checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref", "REF")
    checkParseUrl("http", "http://spark.apache.org/path?query=1", "PROTOCOL")
    checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1", "FILE")
    checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1", "AUTHORITY")
    checkParseUrl("userinfo", "http://userinfo@spark.apache.org/path?query=1", "USERINFO")
    checkParseUrlWithKey("1", "http://spark.apache.org/path?query=1", "QUERY", "query")

    // Null checking
    checkParseUrl(null, null, "HOST")
    checkParseUrl(null, "http://spark.apache.org/path?query=1", null)
    checkParseUrl(null, null, null)
    checkParseUrl(null, "test", "HOST")
    checkParseUrl(null, "http://spark.apache.org/path?query=1", "NO")
    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "HOST", "query")
    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "quer")
    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", null)
    checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1", "QUERY", "")

    // exceptional cases
    intercept[java.util.regex.PatternSyntaxException] {
```
--- End diff --

OK, @cloud-fan
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69400558

--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala ---
@@ -158,7 +159,7 @@ class RandomForestClassifierSuite
```scala
  test("Fitting without numClasses in metadata") {
    val df: DataFrame = TreeTests.featureImportanceData(sc).toDF()
```
--- End diff --

I also agree with this, but actually it seems both are fine, judging from this discussion: https://github.com/apache/spark/pull/12452
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69400523

--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala ---
@@ -116,7 +117,7 @@ class MultilayerPerceptronClassifierSuite
```scala
    // the input seed is somewhat magic, to make this test pass
    val rdd = sc.parallelize(generateMultinomialLogisticInput(
      coefficients, xMean, xVariance, true, nPoints, 1), 2)
    val dataFrame = rdd.toDF("label", "features")
```
--- End diff --

Again, I also agree with this, but I am hesitant to change it because the partition count is explicitly set.
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69400465

--- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala ---
@@ -55,7 +56,7 @@ class LogisticRegressionSuite
```scala
      generateMultinomialLogisticInput(coefficients, xMean, xVariance,
        addIntercept = true, nPoints, 42)

    sc.parallelize(testData, 4).toDF()
```
--- End diff --

I guess, to be strict, `sc.parallelize(testData, 4).toDF()` and `testData.toDF.repartition(4)` would not be exactly the same. It seems the author of this test code intended to explicitly set the initial number of partitions to 4, so I left it as it is; although I think you are right, I am not 100% sure, and it is not part of this issue.
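To make that distinction concrete, a hypothetical snippet (assuming a test suite with `sc`, a `testData` sequence of case-class rows, and `spark.implicits._` in scope):

```scala
// Created with 4 partitions up front: no shuffle is involved.
val df1 = sc.parallelize(testData, 4).toDF()

// Created with the default parallelism, then shuffled into 4 partitions:
// same final partition count, but different data placement plus an extra
// exchange in the plan.
val df2 = testData.toDF().repartition(4)

assert(df1.rdd.getNumPartitions == 4 && df2.rdd.getNumPartitions == 4)
```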
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13517

**[Test build #61699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61699/consoleFull)** for PR 13517 at commit [`1307f8c`](https://github.com/apache/spark/commit/1307f8cbdd4b26885a81ad6e5770c2bb82a0159e).
[GitHub] spark issue #13517: [SPARK-14839][SQL] Support for other types for `tablePro...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13517

**[Test build #61698 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61698/consoleFull)** for PR 13517 at commit [`30dfea0`](https://github.com/apache/spark/commit/30dfea05bb0ce864a7ccb5fe6a2d091c7fe3c988).
[GitHub] spark pull request #13517: [SPARK-14839][SQL] Support for other types for `t...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/13517#discussion_r69399556

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala ---
@@ -1117,4 +1117,26 @@ class MetastoreDataSourcesSuite extends QueryTest with SQLTestUtils with TestHiv
```scala
      }
    }
  }

  test("SPARK-14839: Support for other types as option in OPTIONS clause") {
```
--- End diff --

Sure!
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14033

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61696/
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14033

Merged build finished. Test PASSed.
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14033

**[Test build #61696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61696/consoleFull)** for PR 14033 at commit [`f02e1dd`](https://github.com/apache/spark/commit/f02e1dd0928992e530ea8d8a0663050fecdcd4ce).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Stack(children: Seq[Expression])`
[GitHub] spark issue #13532: [SPARK-15204][SQL] improve nullability inference for Agg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13532

Merged build finished. Test PASSed.
[GitHub] spark issue #13532: [SPARK-15204][SQL] improve nullability inference for Agg...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13532

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61697/
[GitHub] spark issue #13532: [SPARK-15204][SQL] improve nullability inference for Agg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13532 **[Test build #61697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61697/consoleFull)** for PR 13532 at commit [`23263e4`](https://github.com/apache/spark/commit/23263e4940f5b6e67ee7b06b9e0fad72bbe7606f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13532: [SPARK-15204][SQL] improve nullability inference for Agg...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13532 **[Test build #61697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61697/consoleFull)** for PR 13532 at commit [`23263e4`](https://github.com/apache/spark/commit/23263e4940f5b6e67ee7b06b9e0fad72bbe7606f).
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14033 **[Test build #61696 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61696/consoleFull)** for PR 14033 at commit [`f02e1dd`](https://github.com/apache/spark/commit/f02e1dd0928992e530ea8d8a0663050fecdcd4ce).
[GitHub] spark pull request #13532: [SPARK-15204][SQL] improve nullability inference ...
Github user koertkuipers commented on a diff in the pull request: https://github.com/apache/spark/pull/13532#discussion_r69397207

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala ---
@@ -305,4 +305,13 @@ class DatasetAggregatorSuite extends QueryTest with SharedSQLContext {
     val ds = Seq(1, 2, 3).toDS()
     checkDataset(ds.select(MapTypeBufferAgg.toColumn), 1)
   }
+
+  test("spark-15204 improve nullability inference for Aggregator") {
+    val ds1 = Seq(1, 3, 2, 5).toDS()
+    assert(ds1.select(typed.sum((i: Int) => i)).schema.head.nullable === false)
+    val ds2 = Seq(AggData(1, "a"), AggData(2, "a")).toDS()
+    assert(ds2.groupByKey(_.b).agg(SeqAgg.toColumn).schema(1).nullable === true)
--- End diff --

The last assert, the one with NameAgg, tests String as the output type of the Aggregator. Is that good enough?
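For readers following the nullability question: the inference under test derives a result column's nullability from the Aggregator's output encoder, so a primitive output type should produce a non-nullable field while an object output type (String, Seq) stays nullable. A minimal sketch of such an aggregator, assuming the Spark 2.0 `Aggregator` API (`IntSumAgg` is a hypothetical name, not from the PR):

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator with a primitive Int output: a flat primitive can
// never be null, so the inferred schema field should have nullable = false.
object IntSumAgg extends Aggregator[Int, Int, Int] {
  def zero: Int = 0
  def reduce(b: Int, a: Int): Int = b + a
  def merge(b1: Int, b2: Int): Int = b1 + b2
  def finish(reduction: Int): Int = reduction
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}

// Usage mirroring the test above: the resulting column should be non-nullable.
// val ds = Seq(1, 3, 2, 5).toDS()
// assert(ds.select(IntSumAgg.toColumn).schema.head.nullable === false)
```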
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14036 Merged build finished. Test PASSed.
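For context on the PR under test: in Spark SQL, `/` performs fractional division even on integer operands, so obtaining an integral quotient requires wrapping the result in a cast. The PR title indicates a dedicated `IntegerDivide` expression to avoid that extra cast; a sketch of the status quo it targets, assuming a `SparkSession` named `spark` (query literals are illustrative, not from the PR):

```scala
// Today an integer quotient needs an explicit cast around a fractional divide.
spark.sql("SELECT 7 / 2").show()              // returns 3.5 (double)
spark.sql("SELECT CAST(7 / 2 AS INT)").show() // returns 3, via an extra Cast node
```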