[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13990 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13990 **[Test build #61685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61685/consoleFull)** for PR 13990 at commit [`fa294bc`](https://github.com/apache/spark/commit/fa294bc04896ed61efb847adc2612feca9cbaf16). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390423 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/OneHotEncoderSuite.scala --- @@ -29,10 +29,11 @@ import org.apache.spark.sql.types._ class OneHotEncoderSuite extends SparkFunSuite with MLlibTestSparkContext with DefaultReadWriteTest { + import testImplicits._ def stringIndexed(): DataFrame = { val data = sc.parallelize(Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c")), 2) --- End diff -- Remove `sc.parallelize`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390388 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/NormalizerSuite.scala --- @@ -61,7 +62,7 @@ class NormalizerSuite extends SparkFunSuite with MLlibTestSparkContext with Defa Vectors.sparse(3, Seq()) ) -dataFrame = spark.createDataFrame(sc.parallelize(data, 2).map(NormalizerSuite.FeatureData)) +dataFrame = sc.parallelize(data, 2).map(NormalizerSuite.FeatureData).toDF() --- End diff -- Remove `sc.parallelize` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390273 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/MinMaxScalerSuite.scala --- @@ -57,8 +58,7 @@ class MinMaxScalerSuite extends SparkFunSuite with MLlibTestSparkContext with De test("MinMaxScaler arguments max must be larger than min") { withClue("arguments max must be larger than min") { - val dummyDF = spark.createDataFrame(Seq( -(1, Vectors.dense(1.0, 2.0.toDF("id", "feature") + val dummyDF = Seq((1, Vectors.dense(1.0, 2.0))).toDF("id", "feature") --- End diff -- It's just a column name, but for consistency...`features` (not `feature`) (unless there's a reason for this) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390216 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala --- @@ -44,7 +45,7 @@ class CountVectorizerSuite extends SparkFunSuite with MLlibTestSparkContext (3, split(""), Vectors.sparse(4, Seq())), // empty string --- End diff -- Replace the comment `// empty string` with `val EMPTY_STRING = ""` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390204 --- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/RegressionEvaluatorSuite.scala --- @@ -42,9 +43,10 @@ class RegressionEvaluatorSuite * data.map(x=> x.label + ", " + x.features(0) + ", " + x.features(1)) * .saveAsTextFile("path") */ -val dataset = spark.createDataFrame( - sc.parallelize(LinearDataGenerator.generateLinearInput( -6.3, Array(4.7, 7.2), Array(0.9, -1.3), Array(0.7, 1.2), 100, 42, 0.1), 2).map(_.asML)) +val dataset = sc.parallelize( --- End diff -- Remove `sc.parallelize` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390189 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/RandomForestClassifierSuite.scala --- @@ -158,7 +159,7 @@ class RandomForestClassifierSuite } test("Fitting without numClasses in metadata") { -val df: DataFrame = spark.createDataFrame(TreeTests.featureImportanceData(sc)) +val df: DataFrame = TreeTests.featureImportanceData(sc).toDF() --- End diff -- Why is the type annotation needed here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390185 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala --- @@ -55,7 +56,7 @@ class OneVsRestSuite extends SparkFunSuite with MLlibTestSparkContext with Defau val xVariance = Array(0.6856, 0.1899, 3.116, 0.581) rdd = sc.parallelize(generateMultinomialLogisticInput( coefficients, xMean, xVariance, true, nPoints, 42), 2) -dataset = spark.createDataFrame(rdd) +dataset = rdd.toDF() --- End diff -- Merge it with line 57. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390178 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/NaiveBayesSuite.scala --- @@ -47,7 +48,7 @@ class NaiveBayesSuite extends SparkFunSuite with MLlibTestSparkContext with Defa Array(0.10, 0.10, 0.70, 0.10) // label 2 ).map(_.map(math.log)) -dataset = spark.createDataFrame(generateNaiveBayesInput(pi, theta, 100, 42)) +dataset = generateNaiveBayesInput(pi, theta, 100, 42).toDF() --- End diff -- Exactly my point above :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390173 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/MultilayerPerceptronClassifierSuite.scala --- @@ -116,7 +117,7 @@ class MultilayerPerceptronClassifierSuite // the input seed is somewhat magic, to make this test pass val rdd = sc.parallelize(generateMultinomialLogisticInput( coefficients, xMean, xVariance, true, nPoints, 1), 2) -val dataFrame = spark.createDataFrame(rdd).toDF("label", "features") +val dataFrame = rdd.toDF("label", "features") --- End diff -- Could we merge this line with 118? I don't think 118 needs `sc.parallelize`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390144 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -55,7 +56,7 @@ class LogisticRegressionSuite generateMultinomialLogisticInput(coefficients, xMean, xVariance, addIntercept = true, nPoints, 42) - spark.createDataFrame(sc.parallelize(testData, 4)) + sc.parallelize(testData, 4).toDF() --- End diff -- `testData.toDF.repartition(4)`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390147 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala --- @@ -869,8 +870,7 @@ class LogisticRegressionSuite } } - (spark.createDataFrame(sc.parallelize(data1, 4)), -spark.createDataFrame(sc.parallelize(data2, 4))) + (sc.parallelize(data1, 4).toDF(), sc.parallelize(data2, 4).toDF()) --- End diff -- Same as above --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390132 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/GBTClassifierSuite.scala --- @@ -134,15 +135,14 @@ class GBTClassifierSuite extends SparkFunSuite with MLlibTestSparkContext */ test("Fitting without numClasses in metadata") { -val df: DataFrame = spark.createDataFrame(TreeTests.featureImportanceData(sc)) +val df: DataFrame = TreeTests.featureImportanceData(sc).toDF() val gbt = new GBTClassifier().setMaxDepth(1).setMaxIter(1) gbt.fit(df) --- End diff -- Wonder why this line is separate not part of 139? Any reason? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14035: [SPARK-16356][ML] Add testImplicits for ML unit t...
Github user jaceklaskowski commented on a diff in the pull request: https://github.com/apache/spark/pull/14035#discussion_r69390117 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ClassifierSuite.scala --- @@ -71,8 +71,7 @@ class ClassifierSuite extends SparkFunSuite with MLlibTestSparkContext { test("getNumClasses") { def getTestData(labels: Seq[Double]): DataFrame = { --- End diff -- repeated. What about Moving it outside `test` methods? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14036 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61686/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14036 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14036 **[Test build #61686 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61686/consoleFull)** for PR 14036 at commit [`5db17fd`](https://github.com/apache/spark/commit/5db17fd2e05578bd152d9f2654d49aa4e07abe5b). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14031: [SPARK-16353][BUILD][DOC] Missing javadoc options for ja...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14031 **[Test build #61687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61687/consoleFull)** for PR 14031 at commit [`939e8b5`](https://github.com/apache/spark/commit/939e8b5d3a3b502f3a7870d437cb38ee9564e6c4). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14031: [SPARK-16353][BUILD][DOC] Missing javadoc options for ja...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14031 Jenkins add to whitelist --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14031: [SPARK-16353][BUILD][DOC] Missing javadoc options for ja...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14031 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14031: [SPARK-16353][BUILD][DOC] Missing javadoc options for ja...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14031 Jenkins test this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14036 **[Test build #61686 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61686/consoleFull)** for PR 14036 at commit [`5db17fd`](https://github.com/apache/spark/commit/5db17fd2e05578bd152d9f2654d49aa4e07abe5b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13990: [SPARK-16287][SQL] Implement str_to_map SQL function
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/13990 cc: @cloud-fan @rxin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13990: [SPARK-16287][SQL][WIP] Implement str_to_map SQL functio...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13990 **[Test build #61685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61685/consoleFull)** for PR 13990 at commit [`fa294bc`](https://github.com/apache/spark/commit/fa294bc04896ed61efb847adc2612feca9cbaf16). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9207 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9207 **[Test build #61684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61684/consoleFull)** for PR 9207 at commit [`e6845f1`](https://github.com/apache/spark/commit/e6845f1c37481b270359a6a4291c07dcaace242e). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9207 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61684/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9207 **[Test build #61684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61684/consoleFull)** for PR 9207 at commit [`e6845f1`](https://github.com/apache/spark/commit/e6845f1c37481b270359a6a4291c07dcaace242e). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9207 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61683/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9207 **[Test build #61683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61683/consoleFull)** for PR 9207 at commit [`49f8a8d`](https://github.com/apache/spark/commit/49f8a8d540860e28de8a356616f78a64ab112909). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/9207 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13532: [SPARK-15204][SQL] improve nullability inference ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13532#discussion_r69388379 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala --- @@ -305,4 +305,13 @@ class DatasetAggregatorSuite extends QueryTest with SharedSQLContext { val ds = Seq(1, 2, 3).toDS() checkDataset(ds.select(MapTypeBufferAgg.toColumn), 1) } + + test("spark-15204 improve nullability inference for Aggregator") { +val ds1 = Seq(1, 3, 2, 5).toDS() +assert(ds1.select(typed.sum((i: Int) => i)).schema.head.nullable === false) +val ds2 = Seq(AggData(1, "a"), AggData(2, "a")).toDS() +assert(ds2.groupByKey(_.b).agg(SeqAgg.toColumn).schema(1).nullable === true) --- End diff -- should we use `ds.select` for all tests? And we should also test String, which is flat but nullable. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13976: [SPARK-16288][SQL] Implement inline table generat...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13976#discussion_r69388270 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/generators.scala --- @@ -195,3 +195,38 @@ case class Explode(child: Expression) extends ExplodeBase(child, position = fals extended = "> SELECT _FUNC_(array(10,20));\n 0\t10\n 1\t20") // scalastyle:on line.size.limit case class PosExplode(child: Expression) extends ExplodeBase(child, position = true) + +/** + * Explodes an array of structs into a table. + */ +@ExpressionDescription( + usage = "_FUNC_(a) - Explodes an array of structs into a table.", + extended = "> SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));\n [1,a]\n [2,b]") +case class Inline(child: Expression) extends UnaryExpression with Generator with CodegenFallback { + + override def children: Seq[Expression] = child :: Nil + + override def checkInputDataTypes(): TypeCheckResult = child.dataType match { +case ArrayType(et, _) if et.isInstanceOf[StructType] => + TypeCheckResult.TypeCheckSuccess +case _ => + TypeCheckResult.TypeCheckFailure( +s"input to function $prettyName should be array of struct type, not ${child.dataType}") + } + + override def elementSchema: StructType = child.dataType match { +case ArrayType(et : StructType, _) => et + } + + private lazy val numFields = elementSchema.fields.length + + override def eval(input: InternalRow): TraversableOnce[InternalRow] = { +val inputArray = child.eval(input).asInstanceOf[ArrayData] +if (inputArray == null) { + Nil +} else { + for (i <- 0 until inputArray.numElements()) +yield inputArray.getStruct(i, numFields) --- End diff -- I'm not sure how is the performance of `for-yield`, maybe it's safe to create an array manually and use while loop here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #9207: [SPARK-11171][SPARK-11237][SPARK-11241][ML] Try adding PM...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/9207 **[Test build #61683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61683/consoleFull)** for PR 9207 at commit [`49f8a8d`](https://github.com/apache/spark/commit/49f8a8d540860e28de8a356616f78a64ab112909). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14035 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14035 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61681/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14036 **[Test build #61682 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61682/consoleFull)** for PR 14036 at commit [`e4e42c3`](https://github.com/apache/spark/commit/e4e42c35b0236dff6aedf6468a7d94f80bc6023b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class MapKeys(child: Expression)` * `case class MapValues(child: Expression)` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14036 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61682/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14036 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14035 **[Test build #61681 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61681/consoleFull)** for PR 14035 at commit [`6fdf290`](https://github.com/apache/spark/commit/6fdf290d682b190f6b0457bb084cebacd105ee00). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14036 **[Test build #61682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61682/consoleFull)** for PR 14036 at commit [`e4e42c3`](https://github.com/apache/spark/commit/e4e42c35b0236dff6aedf6468a7d94f80bc6023b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69388219 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/StringExpressionsSuite.scala --- @@ -725,4 +725,51 @@ class StringExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { checkEvaluation(FindInSet(Literal("abf"), Literal("abc,b,ab,c,def")), 0) checkEvaluation(FindInSet(Literal("ab,"), Literal("abc,b,ab,c,def")), 0) } + + test("ParseUrl") { +def checkParseUrl(expected: String, urlStr: String, partToExtract: String): Unit = { + checkEvaluation( +ParseUrl(Seq(Literal.create(urlStr, StringType), + Literal.create(partToExtract, StringType))), expected) +} +def checkParseUrlWithKey( +expected: String, urlStr: String, +partToExtract: String, key: String): Unit = { + checkEvaluation( +ParseUrl(Seq(Literal.create(urlStr, StringType), Literal.create(partToExtract, StringType), + Literal.create(key, StringType))), expected) +} + +checkParseUrl("spark.apache.org", "http://spark.apache.org/path?query=1;, "HOST") +checkParseUrl("/path", "http://spark.apache.org/path?query=1;, "PATH") +checkParseUrl("query=1", "http://spark.apache.org/path?query=1;, "QUERY") +checkParseUrl("Ref", "http://spark.apache.org/path?query=1#Ref;, "REF") +checkParseUrl("http", "http://spark.apache.org/path?query=1;, "PROTOCOL") +checkParseUrl("/path?query=1", "http://spark.apache.org/path?query=1;, "FILE") +checkParseUrl("spark.apache.org:8080", "http://spark.apache.org:8080/path?query=1;, "AUTHORITY") +checkParseUrl("userinfo", "http://useri...@spark.apache.org/path?query=1;, "USERINFO") +checkParseUrlWithKey("1", "http://spark.apache.org/path?query=1;, "QUERY", "query") + +// Null checking +checkParseUrl(null, null, "HOST") +checkParseUrl(null, "http://spark.apache.org/path?query=1;, null) +checkParseUrl(null, null, null) +checkParseUrl(null, "test", "HOST") +checkParseUrl(null, "http://spark.apache.org/path?query=1;, "NO") +checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1;, "HOST", "query") +checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1;, "QUERY", "quer") +checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1;, "QUERY", null) +checkParseUrlWithKey(null, "http://spark.apache.org/path?query=1;, "QUERY", "") + +// exceptional cases +intercept[java.util.regex.PatternSyntaxException] { --- End diff -- I think it's fine to throw the exception at executor side, no need to specially handle literal here. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14004: [SPARK-16285][SQL] Implement sentences SQL functi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/14004#discussion_r69388199 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -801,6 +804,49 @@ public static UTF8String concatWs(UTF8String separator, UTF8String... inputs) { return res; } + /** + * Return a locale of the given language and country, or a default locale when failures occur. + */ + private Locale getLocale(UTF8String language, UTF8String country) { +try { + return new Locale(language.toString(), country.toString()); +} finally { + return Locale.getDefault(); +} + } + + /** + * Splits a string into arrays of sentences, where each sentence is an array of words. + */ + public ArrayListsentences(UTF8String language, UTF8String country) { --- End diff -- hmmm, I'm confused too. @rxin what's the convention here? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessa...
Github user techaddict commented on the issue: https://github.com/apache/spark/pull/14036 cc: @cloud-fan --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14036: [SPARK-16323] [SQL] Add IntegerDivide to avoid un...
GitHub user techaddict opened a pull request: https://github.com/apache/spark/pull/14036 [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessary cast ## What changes were proposed in this pull request? Add IntegerDivide to avoid unnecessary cast Before: ``` scala> spark.sql("select 6 div 3").explain(true) ... == Analyzed Logical Plan == CAST((6 / 3) AS BIGINT): bigint Project [cast((cast(6 as double) / cast(3 as double)) as bigint) AS CAST((6 / 3) AS BIGINT)#5L] +- OneRowRelation$ ... ``` After: ``` scala> spark.sql("select 6 div 3").explain(true) ... == Analyzed Logical Plan == (6 / 3): int Project [(6 / 3) AS (6 / 3)#11] +- OneRowRelation$ ... ``` ## How was this patch tested? Existing Tests and added new ones You can merge this pull request into a Git repository by running: $ git pull https://github.com/techaddict/spark SPARK-16323 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/14036.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #14036 commit 067b788b05846e659615feef8613f8965573f150 Author: Sandeep SinghDate: 2016-07-03T09:00:11Z [SPARK-16323] [SQL] Add IntegerDivide to avoid unnecessary cast Before: ``` scala> spark.sql("select 6 div 3").explain(true) ... == Analyzed Logical Plan == CAST((6 / 3) AS BIGINT): bigint Project [cast((cast(6 as double) / cast(3 as double)) as bigint) AS CAST((6 / 3) AS BIGINT)#5L] +- OneRowRelation$ ... ``` After: ``` scala> spark.sql("select 6 div 3").explain(true) ... == Analyzed Logical Plan == (6 / 3): int Project [(6 / 3) AS (6 / 3)#11] +- OneRowRelation$ ... ``` commit e4e42c35b0236dff6aedf6468a7d94f80bc6023b Author: Sandeep Singh Date: 2016-07-03T09:00:59Z Merge branch 'master' into SPARK-16323 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13967: [SPARK-16278][SPARK-16279][SQL] Implement map_key...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13967 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13967: [SPARK-16278][SPARK-16279][SQL] Implement map_keys/map_v...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/13967 thanks, merging to master! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14007: [SPARK-16329] [SQL] Star Expansion over Table Containing...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14007 and also 1.6 please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14007: [SPARK-16329] [SQL] Star Expansion over Table Containing...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14007 thanks, merging to master! @gatorsmile can you backport it to 2.0 and fix the conflicts? thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14007: [SPARK-16329] [SQL] Star Expansion over Table Con...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14007 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14004 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61680/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14004 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14004 **[Test build #61680 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61680/consoleFull)** for PR 14004 at commit [`ea75373`](https://github.com/apache/spark/commit/ea753738b153d15cc5c75eea88a8b86ad79d1d7b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14008: [SPARK-16281][SQL] Implement parse_url SQL function
Github user janplus commented on the issue: https://github.com/apache/spark/pull/14008 @dongjoon-hyun Thank you for review I have add a new commit which does following things: 1. Implement **Literal** `key` validation in `Driver` side. 2. Correct some failure message. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13704 @cloud-fan , regarding a ```cast```, I think that you are correct. When we insert a ```cast```, I pass information on ```ArrayType.containsNull``` to ```Cast``` now. In this case, it is ```false```. As a result, after the logical optimizations, we got a plan tree with ```Cast```. I updated the description of PR. ``` == Analyzed Logical Plan == value: double SerializeFromObject [input[0, double, true] AS value#6] +- MapElements , obj#5: double +- DeserializeToObject cast(value#1 as array).toDoubleArray, obj#4: [D +- LocalRelation [value#1] == Optimized Logical Plan == SerializeFromObject [input[0, double, true] AS value#6] +- MapElements , obj#5: double +- DeserializeToObject value#1.toDoubleArray, obj#4: [D +- LocalRelation [value#1] ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14034 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61678/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14034 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14034: [16355] [16354] [SQL] Fix Bugs When LIMIT/TABLESAMPLE is...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14034 **[Test build #61678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61678/consoleFull)** for PR 14034 at commit [`bdf4e56`](https://github.com/apache/spark/commit/bdf4e56f3478bd99d1e92d338a984dba869363dc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14035 **[Test build #61681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61681/consoleFull)** for PR 14035 at commit [`6fdf290`](https://github.com/apache/spark/commit/6fdf290d682b190f6b0457bb084cebacd105ee00). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69386958 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) --- End diff -- Thanks, this is my approach. It will give following message > org.apache.spark.sql.AnalysisException: cannot resolve 'parse_url("http://spark.apache.org/path?;, "QUERY", "???")' due to data type mismatch: invalid key '???'; line 1 pos 7 Does it look ok to you? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69386898 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) --- End diff -- Then, you can try that in `checkInputDataTypes`. Please refer the following code. https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala#L544 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13704 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61677/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13704 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13704 **[Test build #61677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61677/consoleFull)** for PR 13704 at commit [`77859cf`](https://github.com/apache/spark/commit/77859cf4397b8a5022b93ffa4996203b36dfef1b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69386854 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) --- End diff -- Oh, sorry for that. I missed that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13976: [SPARK-16288][SQL] Implement inline table generating fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13976 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13976: [SPARK-16288][SQL] Implement inline table generating fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13976 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61676/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13976: [SPARK-16288][SQL] Implement inline table generating fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13976 **[Test build #61676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61676/consoleFull)** for PR 13976 at commit [`e260359`](https://github.com/apache/spark/commit/e26035968c73210dda38e82654fc335390fc6c1e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14032: [Minor][SQL] Replace Parquet deprecations
Github user techaddict closed the pull request at: https://github.com/apache/spark/pull/14032 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14033 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61675/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14033 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14033 **[Test build #61675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61675/consoleFull)** for PR 14033 at commit [`6de93a1`](https://github.com/apache/spark/commit/6de93a1582ac5877a932ea47e86811e228b5c2f6). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Stack(children: Seq[Expression])` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386603 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,298 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.SparkConf +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.unsafe.Platform +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + new SparkConf() +.setMaster("local[1]") +.setAppName("microbenchmark") +.set("spark.driver.memory", "3g") + + def calculateHeaderPortionInBytes(count: Int) : Int = { +// Use this assignment for SPARK-15962 +// val size = 4 + 4 * count +val size = UnsafeArrayData.calculateHeaderPortionInBytes(count) +size + } + + def readUnsafeArray(iters: Int): Unit = { +val count = 1024 * 1024 * 16 + +val intUnsafeArray = new UnsafeArrayData +var intResult: Int = 0 +val intSize = calculateHeaderPortionInBytes(count) + 4 * count +val intBuffer = new Array[Byte](intSize) +Platform.putInt(intBuffer, Platform.BYTE_ARRAY_OFFSET, count) +intUnsafeArray.pointTo(intBuffer, Platform.BYTE_ARRAY_OFFSET, intSize) +val readIntArray = { i: Int => + var n = 0 + while (n < iters) { +val len = intUnsafeArray.numElements +var sum = 0.toInt +var i = 0 +while (i < len) { + sum += intUnsafeArray.getInt(i) + i += 1 +} +intResult = sum +n += 1 + } +} + +val doubleUnsafeArray = new UnsafeArrayData +var doubleResult: Double = 0 +val doubleSize = calculateHeaderPortionInBytes(count) + 8 * count +val doubleBuffer = new Array[Byte](doubleSize) +Platform.putInt(doubleBuffer, Platform.BYTE_ARRAY_OFFSET, count) +doubleUnsafeArray.pointTo(doubleBuffer, Platform.BYTE_ARRAY_OFFSET, doubleSize) +val readDoubleArray = { i: Int => + var n = 0 + while (n < iters) { +val len = doubleUnsafeArray.numElements +var sum = 0.toDouble +var i = 0 +while (i < len) { + sum += doubleUnsafeArray.getDouble(i) + i += 1 +} +doubleResult = sum +n += 1 + } +} + +val benchmark = new Benchmark("Read UnsafeArrayData", count * iters) +benchmark.addCase("Int")(readIntArray) +benchmark.addCase("Double")(readDoubleArray) +benchmark.run +/* +Without SPARK-15962 +OpenJDK 64-Bit Server VM 1.8.0_91-b14 on Linux 4.0.4-301.fc22.x86_64 +Intel Xeon E3-12xx v2 (Ivy Bridge) +Read UnsafeArrayData:Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative + +Int370 / 471454.0 2.2 1.0X +Double 351 / 466477.5 2.1 1.1X +*/ +/* +With SPARK-15962 --- End diff -- Sure, removed the result without this PR --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386599 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -25,30 +25,36 @@ import org.apache.spark.sql.types.*; import org.apache.spark.unsafe.Platform; import org.apache.spark.unsafe.array.ByteArrayMethods; +import org.apache.spark.unsafe.bitset.BitSetMethods; import org.apache.spark.unsafe.hash.Murmur3_x86_32; import org.apache.spark.unsafe.types.CalendarInterval; import org.apache.spark.unsafe.types.UTF8String; /** * An Unsafe implementation of Array which is backed by raw memory instead of Java objects. * - * Each tuple has three parts: [numElements] [offsets] [values] + * Each tuple has four parts: [numElements][null bits][values or offset][variable length portion] * * The `numElements` is 4 bytes storing the number of elements of this array. * - * In the `offsets` region, we store 4 bytes per element, represents the relative offset (w.r.t. the - * base address of the array) of this element in `values` region. We can get the length of this - * element by subtracting next offset. - * Note that offset can by negative which means this element is null. + * In the `null bits` region, we store 1 bit per element, represents whether a element has null + * Its total size is ceil(numElements / 8) bytes, and it is aligned to 8-byte word boundaries. * - * In the `values` region, we store the content of elements. As we can get length info, so elements - * can be variable-length. + * In the `values or offset` region, we store the content of elements. For fields that hold + * fixed-length primitive types, such as long, double, or int, we store the value directly + * in the field. For fields with non-primitive or variable-length values, we store a relative + * offset (w.r.t. the base address of the row) that points to the beginning of the variable-length --- End diff -- fixed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14035 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61679/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14035 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386595 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -317,7 +301,6 @@ public boolean equals(Object other) { } return false; } - --- End diff -- done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386593 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java --- @@ -341,63 +324,113 @@ public UnsafeArrayData copy() { return arrayCopy; } - public static UnsafeArrayData fromPrimitiveArray(int[] arr) { -if (arr.length > (Integer.MAX_VALUE - 4) / 8) { - throw new UnsupportedOperationException("Cannot convert this array to unsafe format as " + -"it's too big."); -} + @Override + public boolean[] toBooleanArray() { +int size = numElements(); +boolean[] values = new boolean[size]; +Platform.copyMemory( + baseObject, baseOffset + headerInBytes, values, Platform.BYTE_ARRAY_OFFSET, size); +return values; + } + + @Override + public byte[] toByteArray() { +int size = numElements(); +byte[] values = new byte[size]; +Platform.copyMemory( + baseObject, baseOffset + headerInBytes, values, Platform.BYTE_ARRAY_OFFSET, size); +return values; + } + + @Override + public short[] toShortArray() { +int size = numElements(); +short[] values = new short[size]; +Platform.copyMemory( + baseObject, baseOffset + headerInBytes, values, Platform.SHORT_ARRAY_OFFSET, size * 2); +return values; + } -final int offsetRegionSize = 4 * arr.length; -final int valueRegionSize = 4 * arr.length; -final int totalSize = 4 + offsetRegionSize + valueRegionSize; -final byte[] data = new byte[totalSize]; + @Override + public int[] toIntArray() { +int size = numElements(); +int[] values = new int[size]; +Platform.copyMemory( + baseObject, baseOffset + headerInBytes, values, Platform.INT_ARRAY_OFFSET, size * 4); +return values; + } -Platform.putInt(data, Platform.BYTE_ARRAY_OFFSET, arr.length); + @Override + public long[] toLongArray() { +int size = numElements(); +long[] values = new long[size]; +Platform.copyMemory( + baseObject, baseOffset + headerInBytes, values, Platform.LONG_ARRAY_OFFSET, size * 8); +return values; + } -int offsetPosition = Platform.BYTE_ARRAY_OFFSET + 4; -int valueOffset = 4 + offsetRegionSize; -for (int i = 0; i < arr.length; i++) { - Platform.putInt(data, offsetPosition, valueOffset); - offsetPosition += 4; - valueOffset += 4; + @Override + public float[] toFloatArray() { +int size = numElements(); +float[] values = new float[size]; +Platform.copyMemory( + baseObject, baseOffset + headerInBytes, values, Platform.FLOAT_ARRAY_OFFSET, size * 4); +return values; + } + + @Override + public double[] toDoubleArray() { +int size = numElements(); +double[] values = new double[size]; +Platform.copyMemory( + baseObject, baseOffset + headerInBytes, values, Platform.DOUBLE_ARRAY_OFFSET, size * 8); +return values; + } + + private static UnsafeArrayData fromPrimitiveArray(Object arr, int length, final int elementSize) { +final int headerSize = calculateHeaderPortionInBytes(length); +if (length > (Integer.MAX_VALUE - headerSize) / elementSize) { + throw new UnsupportedOperationException("Cannot convert this array to unsafe format as " + +"it's too big."); } +final int valueRegionSize = elementSize * length; +final byte[] data = new byte[valueRegionSize + headerSize]; + +Platform.putInt(data, Platform.BYTE_ARRAY_OFFSET, length); Platform.copyMemory(arr, Platform.INT_ARRAY_OFFSET, data, --- End diff -- Definitely yes, done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14035 **[Test build #61679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61679/consoleFull)** for PR 14035 at commit [`54c27d4`](https://github.com/apache/spark/commit/54c27d4d359a7e6ad445856e06f15e29132d582c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386582 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java --- @@ -33,91 +38,147 @@ // The offset of the global buffer where we start to write this array. private int startingOffset; + // The number of elements in this array + private int numElements; + + private int headerInBytes; + + private void assertIndexIsValid(int index) { +assert index >= 0 : "index (" + index + ") should >= 0"; +assert index < numElements : "index (" + index + ") should < " + numElements; + } + public void initialize(BufferHolder holder, int numElements, int fixedElementSize) { -// We need 4 bytes to store numElements and 4 bytes each element to store offset. -final int fixedSize = 4 + 4 * numElements; +this.numElements = numElements; +this.headerInBytes = calculateHeaderPortionInBytes(numElements); this.holder = holder; this.startingOffset = holder.cursor; -holder.grow(fixedSize); -Platform.putInt(holder.buffer, holder.cursor, numElements); -holder.cursor += fixedSize; +// Grows the global buffer ahead for header and fixed size data. +holder.grow(headerInBytes + fixedElementSize * numElements); + +// Initialize information in header +Platform.putInt(holder.buffer, startingOffset, numElements); +Arrays.fill(holder.buffer, startingOffset + 4 - Platform.BYTE_ARRAY_OFFSET, + startingOffset + headerInBytes - Platform.BYTE_ARRAY_OFFSET, (byte)0); -// Grows the global buffer ahead for fixed size data. -holder.grow(fixedElementSize * numElements); +holder.cursor += (headerInBytes + fixedElementSize * numElements); } - private long getElementOffset(int ordinal) { -return startingOffset + 4 + 4 * ordinal; + private long getElementOffset(int ordinal, int scale) { +return startingOffset + headerInBytes + ordinal * scale; + } + + public void setOffsetAndSize(int ordinal, long currentCursor, long size) { +final long relativeOffset = currentCursor - startingOffset; +final long offsetAndSize = (relativeOffset << 32) | size; + +write(ordinal, offsetAndSize); } public void setNullAt(int ordinal) { -final int relativeOffset = holder.cursor - startingOffset; -// Writes negative offset value to represent null element. -Platform.putInt(holder.buffer, getElementOffset(ordinal), -relativeOffset); +throw new UnsupportedOperationException("setNullAt() is not supported"); --- End diff -- Removed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386586 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java --- @@ -33,91 +38,147 @@ // The offset of the global buffer where we start to write this array. private int startingOffset; + // The number of elements in this array + private int numElements; + + private int headerInBytes; + + private void assertIndexIsValid(int index) { +assert index >= 0 : "index (" + index + ") should >= 0"; +assert index < numElements : "index (" + index + ") should < " + numElements; + } + public void initialize(BufferHolder holder, int numElements, int fixedElementSize) { -// We need 4 bytes to store numElements and 4 bytes each element to store offset. -final int fixedSize = 4 + 4 * numElements; +this.numElements = numElements; +this.headerInBytes = calculateHeaderPortionInBytes(numElements); this.holder = holder; this.startingOffset = holder.cursor; -holder.grow(fixedSize); -Platform.putInt(holder.buffer, holder.cursor, numElements); -holder.cursor += fixedSize; +// Grows the global buffer ahead for header and fixed size data. +holder.grow(headerInBytes + fixedElementSize * numElements); + +// Initialize information in header +Platform.putInt(holder.buffer, startingOffset, numElements); +Arrays.fill(holder.buffer, startingOffset + 4 - Platform.BYTE_ARRAY_OFFSET, + startingOffset + headerInBytes - Platform.BYTE_ARRAY_OFFSET, (byte)0); -// Grows the global buffer ahead for fixed size data. -holder.grow(fixedElementSize * numElements); +holder.cursor += (headerInBytes + fixedElementSize * numElements); } - private long getElementOffset(int ordinal) { -return startingOffset + 4 + 4 * ordinal; + private long getElementOffset(int ordinal, int scale) { --- End diff -- Sure, done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69386564 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) --- End diff -- I get your point. Since implement here will still be `Executor` side, I'll try to find another way to do this. Thank you. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386574 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java --- @@ -33,91 +38,147 @@ // The offset of the global buffer where we start to write this array. private int startingOffset; + // The number of elements in this array + private int numElements; + + private int headerInBytes; + + private void assertIndexIsValid(int index) { +assert index >= 0 : "index (" + index + ") should >= 0"; +assert index < numElements : "index (" + index + ") should < " + numElements; + } + public void initialize(BufferHolder holder, int numElements, int fixedElementSize) { -// We need 4 bytes to store numElements and 4 bytes each element to store offset. -final int fixedSize = 4 + 4 * numElements; +this.numElements = numElements; +this.headerInBytes = calculateHeaderPortionInBytes(numElements); this.holder = holder; this.startingOffset = holder.cursor; -holder.grow(fixedSize); -Platform.putInt(holder.buffer, holder.cursor, numElements); -holder.cursor += fixedSize; +// Grows the global buffer ahead for header and fixed size data. +holder.grow(headerInBytes + fixedElementSize * numElements); + +// Initialize information in header +Platform.putInt(holder.buffer, startingOffset, numElements); +Arrays.fill(holder.buffer, startingOffset + 4 - Platform.BYTE_ARRAY_OFFSET, --- End diff -- Replaced this with ```Platform.putlong``` as ```zeroOutNullBytes``` does. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386561 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java --- @@ -19,9 +19,14 @@ import org.apache.spark.sql.types.Decimal; import org.apache.spark.unsafe.Platform; +import org.apache.spark.unsafe.bitset.BitSetMethods; import org.apache.spark.unsafe.types.CalendarInterval; import org.apache.spark.unsafe.types.UTF8String; +import java.util.Arrays; --- End diff -- Removed this import --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386554 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala --- @@ -192,26 +192,30 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro val fixedElementSize = et match { --- End diff -- Updated the name --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386552 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala --- @@ -192,26 +192,30 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro val fixedElementSize = et match { case t: DecimalType if t.precision <= Decimal.MAX_LONG_DIGITS => 8 case _ if ctx.isPrimitiveType(jt) => et.defaultSize - case _ => 0 + case _ => 8 --- End diff -- Added a comment --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386548 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,298 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.SparkConf +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.unsafe.Platform +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + new SparkConf() +.setMaster("local[1]") +.setAppName("microbenchmark") +.set("spark.driver.memory", "3g") --- End diff -- removed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386546 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java --- @@ -126,11 +187,11 @@ public void write(int ordinal, Decimal input, int precision, int scale) { // Write the bytes to the variable length portion. Platform.copyMemory( bytes, Platform.BYTE_ARRAY_OFFSET, holder.buffer, holder.cursor, bytes.length); -setOffset(ordinal); +write(ordinal, ((long)(holder.cursor - startingOffset) << 32) | ((long) bytes.length)); --- End diff -- yes, done --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386542 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java --- @@ -33,91 +38,147 @@ // The offset of the global buffer where we start to write this array. private int startingOffset; + // The number of elements in this array + private int numElements; + + private int headerInBytes; + + private void assertIndexIsValid(int index) { +assert index >= 0 : "index (" + index + ") should >= 0"; +assert index < numElements : "index (" + index + ") should < " + numElements; + } + public void initialize(BufferHolder holder, int numElements, int fixedElementSize) { -// We need 4 bytes to store numElements and 4 bytes each element to store offset. --- End diff -- Added a comment regarding 4 bytes --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce implementation with ...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13680#discussion_r69386534 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/UnsafeArrayDataBenchmark.scala --- @@ -0,0 +1,298 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.benchmark + +import org.apache.spark.SparkConf +import org.apache.spark.sql.catalyst.expressions.{UnsafeArrayData, UnsafeRow} +import org.apache.spark.sql.catalyst.expressions.codegen.{BufferHolder, UnsafeArrayWriter} +import org.apache.spark.unsafe.Platform +import org.apache.spark.util.Benchmark + +/** + * Benchmark [[UnsafeArrayDataBenchmark]] for UnsafeArrayData + * To run this: + * build/sbt "sql/test-only *benchmark.UnsafeArrayDataBenchmark" + * + * Benchmarks in this file are skipped in normal builds. + */ +class UnsafeArrayDataBenchmark extends BenchmarkBase { + + new SparkConf() +.setMaster("local[1]") +.setAppName("microbenchmark") +.set("spark.driver.memory", "3g") --- End diff -- Removed --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69386335 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) --- End diff -- It's not general pattern matching problem. You know, it's the 'key=value` part of the URL, isn't it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69386305 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray + import ParseUrl._ + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.size > 3 || children.size < 2) { + TypeCheckResult.TypeCheckFailure("parse_url function requires two or three arguments") --- End diff -- OK, thank you. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user janplus commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69386280 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) --- End diff -- BTW, implement here seems that the invalidation still happened in `Executor` side. IMO, the invalidation of key's value can hardly be considered as `AnalysisException`, just like the `rlike` function, isn't it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69386249 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) +case _ => null + } + + private lazy val stringExprs = children.toArray + import ParseUrl._ + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.size > 3 || children.size < 2) { + TypeCheckResult.TypeCheckFailure("parse_url function requires two or three arguments") --- End diff -- And, here, let's use `$prettyName` instead of `parse_url`. I mean ```scala TypeCheckResult.TypeCheckFailure(s"$prettyName function requires two or three arguments") ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #14008: [SPARK-16281][SQL] Implement parse_url SQL functi...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/14008#discussion_r69386214 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala --- @@ -652,6 +656,163 @@ case class StringRPad(str: Expression, len: Expression, pad: Expression) override def prettyName: String = "rpad" } +object ParseUrl { + private val HOST = UTF8String.fromString("HOST") + private val PATH = UTF8String.fromString("PATH") + private val QUERY = UTF8String.fromString("QUERY") + private val REF = UTF8String.fromString("REF") + private val PROTOCOL = UTF8String.fromString("PROTOCOL") + private val FILE = UTF8String.fromString("FILE") + private val AUTHORITY = UTF8String.fromString("AUTHORITY") + private val USERINFO = UTF8String.fromString("USERINFO") + private val REGEXPREFIX = "(&|^)" + private val REGEXSUBFIX = "=([^&]*)" +} + +/** + * Extracts a part from a URL + */ +@ExpressionDescription( + usage = "_FUNC_(url, partToExtract[, key]) - extracts a part from a URL", + extended = """Parts: HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, USERINFO. +Key specifies which query to extract. +Examples: + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'HOST') + 'spark.apache.org' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY') + 'query=1' + > SELECT _FUNC_('http://spark.apache.org/path?query=1', 'QUERY', 'query') + '1'""") +case class ParseUrl(children: Seq[Expression]) + extends Expression with ImplicitCastInputTypes with CodegenFallback { + + override def nullable: Boolean = true + override def inputTypes: Seq[DataType] = Seq.fill(children.size)(StringType) + override def dataType: DataType = StringType + override def prettyName: String = "parse_url" + + // If the url is a constant, cache the URL object so that we don't need to convert url + // from UTF8String to String to URL for every row. + @transient private lazy val cachedUrl = stringExprs(0) match { +case Literal(url: UTF8String, _) => getUrl(url) +case _ => null + } + + // If the key is a constant, cache the Pattern object so that we don't need to convert key + // from UTF8String to String to StringBuilder to String to Pattern for every row. + @transient private lazy val cachedPattern = stringExprs(2) match { +case Literal(key: UTF8String, _) => getPattern(key) --- End diff -- Yep. I think so. 1. `PatternSyntaxException` is a result of `Executor` side failure. It means it could be any exception. 2. `AnalysisException` is a result of `Driver`-side static analysis. We need to do our best to prevent the error of type 1. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13680 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13680 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/61673/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13680 **[Test build #61673 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61673/consoleFull)** for PR 13680 at commit [`243252a`](https://github.com/apache/spark/commit/243252a460794c2b6e2dff3757e421e2532e87bf). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14004: [SPARK-16285][SQL] Implement sentences SQL functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14004 **[Test build #61680 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/61680/consoleFull)** for PR 14004 at commit [`ea75373`](https://github.com/apache/spark/commit/ea753738b153d15cc5c75eea88a8b86ad79d1d7b). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org