[GitHub] spark pull request: [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLC...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221181813

**[Test build #59187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59187/consoleFull)** for PR 9192 at commit [`a9479dd`](https://github.com/apache/spark/commit/a9479dd3ea1f8db84ec7dd26989a0476a39419ec).

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-221181147

It seems that many people voted for option 2, so I'll implement 2:

2. function(key, df), where key is a list of the grouping column values for this group, and df is the data.frame of the group, containing the grouping columns. This is similar to the Scala function signature for KeyValueGroupedDataset.flatMapGroups().
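The proposed contract mirrors Scala's `KeyValueGroupedDataset.flatMapGroups`. A minimal plain-Scala sketch of the `function(key, group)` shape, with no Spark dependency; `Row`, `gapplySketch`, and the sample data are illustrative assumptions, not part of the PR:

```scala
// Plain-Scala sketch of the proposed gapply contract (illustrative, no Spark):
// the user function receives the grouping key and all rows of that group.
case class Row(dept: String, salary: Double)

def gapplySketch[K, R](rows: Seq[Row])(key: Row => K)(f: (K, Seq[Row]) => Seq[R]): Seq[R] =
  rows.groupBy(key).toSeq.sortBy(_._1.toString).flatMap { case (k, group) => f(k, group) }

val data = Seq(Row("a", 10.0), Row("a", 30.0), Row("b", 40.0))

// f(key, group): here, the average salary per department.
val avgs = gapplySketch(data)(_.dept) { (k, group) =>
  Seq(k -> group.map(_.salary).sum / group.size)
}
// avgs == Seq(("a", 20.0), ("b", 40.0))
```

Each group is processed independently, which is what makes the per-group function trivially parallelizable across partitions.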
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on a diff in the pull request: https://github.com/apache/spark/pull/12836#discussion_r64335943

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---

```diff
@@ -21,10 +21,12 @@
 import scala.collection.JavaConverters._
 import org.apache.spark.annotation.Experimental
 import org.apache.spark.api.java.function._
+import org.apache.spark.sql.catalyst.analysis.UnresolvedDeserializer
```

--- End diff ---

yes, will do that, thanks
[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13260#discussion_r64335944

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---

```diff
@@ -550,17 +550,19 @@ abstract class RDD[T: ClassTag](
     } else {
       val fraction = SamplingUtils.computeFractionForSampleSize(num, initialCount,
         withReplacement)
-      var samples = this.sample(withReplacement, fraction, rand.nextInt()).collect()
+      var samples = this.sample(withReplacement, fraction, rand.nextInt())
+      var count = samples.count()
       // If the first sample didn't turn out large enough, keep trying to take samples;
       // this shouldn't happen often because we use a big multiplier for the initial size
       var numIters = 0
-      while (samples.length < num) {
+      while (count < num) {
         logWarning(s"Needed to re-sample due to insufficient sample size. Repeat #$numIters")
-        samples = this.sample(withReplacement, fraction, rand.nextInt()).collect()
+        samples = this.sample(withReplacement, fraction, rand.nextInt())
+        count = samples.count()
         numIters += 1
       }
-      Utils.randomizeInPlace(samples, rand).take(num)
+      Utils.randomizeInPlace(samples.collect(), rand).take(num)
```

--- End diff ---

Anyway, thank you for the review, @andrewor14.
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-221180766

Ok, I see, thanks, @sun-rui
[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13260#discussion_r64335756

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- (quoting the same `takeSample` hunk as the other comments on this PR) --- End diff ---

If that situation happens, it will also reduce traffic. But the situation happens very rarely. Should I close this PR?
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-221180426

We can also add an API later that supports partial aggregation and final aggregation together, as we have done in the RDD API. Refer to "aggregateRDD".
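The partial-plus-final aggregation pattern referred to here can be sketched in plain Scala without Spark; `aggregateSketch`, the explicit partition layout, and the `(sum, count)` accumulator below are illustrative assumptions, not Spark's actual `aggregate` implementation:

```scala
// Sketch of partial + final aggregation in the RDD.aggregate style (plain Scala,
// no Spark; the explicit partition layout is an illustrative assumption).
// seqOp folds within one partition (partial aggregate); combOp merges the partials (final aggregate).
def aggregateSketch[T, U](partitions: Seq[Seq[T]])(zero: U)(
    seqOp: (U, T) => U, combOp: (U, U) => U): U =
  partitions.map(_.foldLeft(zero)(seqOp)).foldLeft(zero)(combOp)

// Example: a mean over two "partitions" via a (sum, count) accumulator.
val parts = Seq(Seq(1.0, 2.0), Seq(3.0, 4.0, 5.0))
val seqOp = (acc: (Double, Int), x: Double) => (acc._1 + x, acc._2 + 1)
val combOp = (a: (Double, Int), b: (Double, Int)) => (a._1 + b._1, a._2 + b._2)
val (sum, count) = aggregateSketch(parts)((0.0, 0))(seqOp, combOp)
val mean = sum / count
// mean == 3.0
```

The point of the two-operator shape is that partial results can be computed where the data lives and only the small accumulators need to be merged centrally.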
[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13260#discussion_r64335663

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- (quoting the same `takeSample` hunk as the other comments on this PR) --- End diff ---

You're right, it'll take one more pass in all cases. Hmm, that might be the main reason not to do this.
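The extra pass being discussed can be illustrated with a toy model of lazy re-evaluation; `LazyPipeline` below is a hypothetical stand-in for an RDD, not Spark code:

```scala
// Toy model of lazy re-evaluation (a hypothetical stand-in for an RDD, not Spark):
// each action re-runs the whole pipeline, so count() followed by collect() costs two passes.
class LazyPipeline[T](compute: => Seq[T]) {
  var passes = 0 // number of full traversals ("jobs")
  private def run(): Seq[T] = { passes += 1; compute }
  def count(): Long = run().size.toLong
  def collect(): Seq[T] = run()
}

// Current code: collect() once per attempt -> a single pass when the sample is big enough.
val current = new LazyPipeline(Seq(1, 2, 3))
val sample = current.collect()

// Patched code: count() to check the size, then a final collect() -> one extra pass.
val patched = new LazyPipeline(Seq(1, 2, 3))
val result = if (patched.count() >= 3) patched.collect() else Seq.empty[Int]
```

This is the tradeoff raised in the review: the patch avoids materializing the oversampled data on the driver, but every successful attempt now triggers an additional traversal.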
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13177#issuecomment-221180144

**[Test build #59186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59186/consoleFull)** for PR 13177 at commit [`156fea0`](https://github.com/apache/spark/commit/156fea0db2856c4eda3ff7496218e1c7d2082c4a).
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13177#discussion_r64335588

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---

```diff
@@ -480,11 +480,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
     try {
       Option(hive.getFunction(db, name)).map(fromHiveFunction)
     } catch {
-      case CausedBy(ex: NoSuchObjectException) if ex.getMessage.contains(name) =>
+      case e: Throwable if isCausedBy(e, s"$name does not exist") =>
```

--- End diff ---

@andrewor14 thanks. Changed to NonFatal.
[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13260#discussion_r64335520

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- (quoting the same `takeSample` hunk as the other comments on this PR) --- End diff ---

Yeah, you are right about that. Let's not change it.
[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12889#issuecomment-221179828

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59184/
[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12889#issuecomment-221179827

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-221179691

@thunterdb, @NarineK, we can definitely add an API like aggregate() later, based on the functionality of the two basic APIs. I can submit a JIRA issue for it later. We can allow passing a user-defined function as FUN. We could support FUN as built-in functions ('mean', 'sum', etc.) by internally creating an R function wrapping them, but that seems not worthwhile, as SparkDataFrame already provides such common aggregation functions, which run on the JVM and perform better than an R worker. However, if any built-in function in R has no parity in Spark Core, we can consider supporting it in SparkR.
[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12889#issuecomment-221179741

**[Test build #59184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59184/consoleFull)** for PR 12889 at commit [`020c096`](https://github.com/apache/spark/commit/020c0960ec9a379de4b7209151809f83fed1bf76).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class RandomForestRegressionModel(TreeEnsembleModels, JavaPredictionModel, JavaMLWritable,`
[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/13260#discussion_r64334880

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- (quoting the same `takeSample` hunk as the other comments on this PR) --- End diff ---

Won't this cause another job to be run? It's more memory efficient, but it takes one more pass.
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13177#issuecomment-221178646

**[Test build #59185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59185/consoleFull)** for PR 13177 at commit [`9f3b7db`](https://github.com/apache/spark/commit/9f3b7db2265ff2c89dc70feda8cd3e11f94f738e).
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/13177#issuecomment-221178563

@wangyang1992 thanks for the notebook. I am actually surprised it works! That said, I do prefer your latest solution, which is more explicit and easier to understand. Once you address the last comment, I'll go ahead and merge this.
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/13177#discussion_r64334580

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- (quoting the same `case e: Throwable if isCausedBy(e, s"$name does not exist") =>` hunk as the other comments on this PR) --- End diff ---

Yes, but would you mind doing `case NonFatal(e) =>` here? It's generally bad practice to catch Throwables.
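For reference, `scala.util.control.NonFatal` is in the Scala standard library; it matches ordinary exceptions while letting fatal errors propagate. The `describe` helper below is an illustrative example, not code from the PR:

```scala
import scala.util.control.NonFatal

// NonFatal matches ordinary exceptions, but deliberately does not match fatal
// throwables such as OutOfMemoryError or InterruptedException, which keep propagating.
def describe(body: => Unit): String =
  try { body; "ok" } catch {
    case NonFatal(e) => s"handled: ${e.getMessage}"
  }

val handled = describe(throw new RuntimeException("boom"))
// handled == "handled: boom"

// NonFatal can also be used directly as a predicate on a Throwable:
val oomIsFatal = !NonFatal(new OutOfMemoryError())
```

This is why `case NonFatal(e) =>` is preferred over `case e: Throwable =>`: a blanket Throwable catch would also swallow errors the JVM cannot meaningfully recover from.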
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13186
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-221178042

Great, I'm merging this into master and 2.0. Thanks for fixing it.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-221177993

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59182/
[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13260#discussion_r64334393

--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- (quoting the same `takeSample` hunk as the other comments on this PR) --- End diff ---

Yes. I agree with you on the pros and cons of using an exception for this. We can do that, but I hope to avoid it if possible, since it would change the public API signature.
[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...
Github user wangyang1992 commented on a diff in the pull request: https://github.com/apache/spark/pull/13177#discussion_r64334408

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- (quoting the same `case e: Throwable if isCausedBy(e, s"$name does not exist") =>` hunk as the other comments on this PR) --- End diff ---

@andrewor14 will this work?
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-221177992

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-221177827

**[Test build #59182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59182/consoleFull)** for PR 13186 at commit [`4a20bad`](https://github.com/apache/spark/commit/4a20badceafbd30790575bda9841959d6c7a0c2f).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14557][SQL] Reading textfile (created t...
Github user kasjain commented on the pull request: https://github.com/apache/spark/pull/12356#issuecomment-221177377

Sure. Let me add the CTAS query in the test suite.
[GitHub] spark pull request: [SPARK-15470] [SQL] Unify the Configuration In...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13247#issuecomment-221177271

Can we just add a flag to SQLConf to indicate whether SparkSession has been properly initialized?
[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r64333811

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala ---

```diff
@@ -49,6 +49,24 @@ object CodeFormatter {
     }
     code.result()
   }
+
+  def stripOverlappingComments(codeAndComment: CodeAndComment): CodeAndComment = {
+    val code = new StringBuilder
+    val map = codeAndComment.comment
+    var lastLine: String = "dummy"
+    codeAndComment.body.split('\n').foreach { l =>
+      val line = l.trim()
+      val skip = lastLine.startsWith("/*") && lastLine.endsWith("*/") &&
+        line.startsWith("/*") && line.endsWith("*/") &&
+        map(lastLine).substring(3).contains(map(line).substring(3))
```

--- End diff ---

I think it's okay for the performance:

* This function is called once per `CodeAndComment` creation.
* It scans `codeAndComment.body` once.
* The map lookup occurs on each line at most once, and it does not cost much in this case.
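The skip rule can be illustrated with a simplified, self-contained version; `stripOverlapping` below works on plain comment strings rather than Spark's `CodeAndComment` placeholder map, so it is a sketch of the idea only, not the PR's implementation:

```scala
// Simplified sketch of the overlapping-comment rule (illustrative; real Spark code
// resolves comment placeholders through the CodeAndComment map instead of raw text):
// drop a /* ... */ comment line when the previously kept line is a comment
// whose text already contains this one's text.
def stripOverlapping(lines: Seq[String]): Seq[String] = {
  def isComment(l: String) = l.startsWith("/*") && l.endsWith("*/")
  def inner(l: String) = l.stripPrefix("/*").stripSuffix("*/").trim
  lines.foldLeft(Vector.empty[String]) { (acc, line) =>
    val overlapping = acc.lastOption.exists(prev =>
      isComment(prev) && isComment(line) && inner(prev).contains(inner(line)))
    if (overlapping) acc else acc :+ line // single pass over the body, as in the PR
  }
}

val out = stripOverlapping(Seq("/* input[0] and filter */", "/* input[0] */", "x + 1"))
// out == Seq("/* input[0] and filter */", "x + 1")
```

As the review note says, this stays linear: one traversal of the body with at most one containment check per line.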
[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...
Github user jurriaan commented on the pull request: https://github.com/apache/spark/pull/13267#issuecomment-221176753

@HyukjinKwon Addressed your comments and improved the documentation a bit.
[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...
Github user jurriaan commented on the pull request: https://github.com/apache/spark/pull/13267#issuecomment-221176282 @HyukjinKwon If you don't supply those options, they are set to the defaults. For how `setQuoteEscapingEnabled` works, see https://github.com/uniVocity/univocity-parsers/issues/38. In the test I supplied them to show a possible use case (Redshift's CSV dialect).
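For illustration, the behavioral difference the flag controls can be sketched in isolation. This is a toy helper, not univocity's implementation; the name is made up for the example:

```scala
// Toy illustration of what disabling quote escaping changes in CSV
// output. With escaping on, an embedded quote char is doubled (RFC
// 4180 style); with it off, the value is written verbatim inside the
// surrounding quotes, which is what Redshift's CSV dialect expects.
object QuoteEscaping {
  def writeQuotedField(value: String, escapingEnabled: Boolean): String = {
    val body = if (escapingEnabled) value.replace("\"", "\"\"") else value
    "\"" + body + "\""
  }
}
```

So `writeQuotedField("a\"b", escapingEnabled = true)` yields `"a""b"`, while with escaping disabled the same input yields `"a"b"` verbatim.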
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r64333199
--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala ---
@@ -238,4 +238,23 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging {
     runCliWithin(2.minute, Seq("-e", "!echo \"This is a test for Spark-11624\";"))(
       "" -> "This is a test for Spark-11624")
   }
+
+  test("list jars") {
+    val jarFile = Thread.currentThread().getContextClassLoader.getResource("TestUDTF.jar")
+    runCliWithin(2.minute)(
+      s"ADD JAR $jarFile" -> "",
+      s"LIST JARS" -> "TestUDTF.jar",
+      s"List JAR $jarFile" -> "TestUDTF.jar"
+    )
+  }
+
+  test("list files") {
+    val dataFilePath = Thread.currentThread().getContextClassLoader
+      .getResource("data/files/small_kv.txt")
+    runCliWithin(2.minute)(
+      s"ADD FILE $dataFilePath" -> "",
+      s"LIST FILES" -> "small_kv.txt",
+      s"LIST FILE $dataFilePath" -> "small_kv.txt"
+    )
+  }
--- End diff --
@yhuai Should I remove the failing test case so the merge build test can proceed while I continue investigating the root cause? Thanks!
[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12889#issuecomment-221174436 **[Test build #59184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59184/consoleFull)** for PR 12889 at commit [`020c096`](https://github.com/apache/spark/commit/020c0960ec9a379de4b7209151809f83fed1bf76).
[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r64332536
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala ---
@@ -49,6 +49,24 @@ object CodeFormatter {
     }
     code.result()
   }
+
+  def stripOverlappingComments(codeAndComment: CodeAndComment): CodeAndComment = {
+    val code = new StringBuilder
+    val map = codeAndComment.comment
+    var lastLine: String = "dummy"
+    codeAndComment.body.split('\n').foreach { l =>
+      val line = l.trim()
+      val skip = lastLine.startsWith("/*") && lastLine.endsWith("*/") &&
+        line.startsWith("/*") && line.endsWith("*/") &&
+        map(lastLine).substring(3).contains(map(line).substring(3))
--- End diff --
Oh, it should work; I missed the `map`. Will it have a performance issue?
[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r64332407
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala ---
@@ -49,6 +49,24 @@ object CodeFormatter {
     }
     code.result()
   }
+
+  def stripOverlappingComments(codeAndComment: CodeAndComment): CodeAndComment = {
+    val code = new StringBuilder
+    val map = codeAndComment.comment
+    var lastLine: String = "dummy"
+    codeAndComment.body.split('\n').foreach { l =>
+      val line = l.trim()
+      val skip = lastLine.startsWith("/*") && lastLine.endsWith("*/") &&
+        line.startsWith("/*") && line.endsWith("*/") &&
+        map(lastLine).substring(3).contains(map(line).substring(3))
--- End diff --
Have you checked that this actually works? I think we have placeholders here, so it will not find any duplicated comments to skip.
[GitHub] spark pull request: [SPARK-15470] [SQL] Unify the Configuration In...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13247#issuecomment-221173676 Initially, I did try to do it that way. The questions bothering me were: how do we know which changes are made at runtime, and which come from external users? After reading the code base, my understanding is that we will not externalize `SQLConf` after introducing `RuntimeConfig`. The `set` APIs of `SQLConf` will be for internal use only, so internally we can do whatever we want, if necessary. For example, to verify internal behaviors, our test suites are still allowed to change the configuration at runtime; in multiple test cases, like [ddlsuite](https://github.com/apache/spark/blob/5afd927a47aa7ede3039234f2f7262e2247aa2ae/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala#L129), we change `spark.sql.warehouse.dir` at runtime. So I am wondering whether we just need to block changes to the static configuration properties that go through `RuntimeConfig`. That is why this PR removes the `conf` from `SQLContext`. Feel free to let me know which way is preferred. Thanks!
[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/13260#discussion_r64331578
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -550,17 +550,19 @@ abstract class RDD[T: ClassTag](
     } else {
       val fraction = SamplingUtils.computeFractionForSampleSize(num, initialCount,
         withReplacement)
-      var samples = this.sample(withReplacement, fraction, rand.nextInt()).collect()
+      var samples = this.sample(withReplacement, fraction, rand.nextInt())
+      var count = samples.count()
       // If the first sample didn't turn out large enough, keep trying to take samples;
       // this shouldn't happen often because we use a big multiplier for the initial size
       var numIters = 0
-      while (samples.length < num) {
+      while (count < num) {
         logWarning(s"Needed to re-sample due to insufficient sample size. Repeat #$numIters")
-        samples = this.sample(withReplacement, fraction, rand.nextInt()).collect()
+        samples = this.sample(withReplacement, fraction, rand.nextInt())
+        count = samples.count()
         numIters += 1
--- End diff --
We could throw an exception after x iterations. It will be a bit of a pain to test though. I don't feel strongly about this, but it seems like a potential source of problems.
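The bounded-retry idea can be sketched independently of RDD internals. Here `trySample` stands in for `this.sample(...).count()`, and the object name and cap are illustrative, not Spark's actual `takeSample`:

```scala
// Sketch of capping the re-sample loop: throw after maxIters attempts
// instead of potentially looping forever on an insufficient sample.
object BoundedResample {
  def sampleWithCap(num: Long, maxIters: Int)(trySample: () => Long): Long = {
    var count = trySample()
    var numIters = 0
    while (count < num) {
      if (numIters >= maxIters) {
        throw new IllegalStateException(
          s"Could not get $num samples after $maxIters re-sample attempts")
      }
      count = trySample()
      numIters += 1
    }
    count
  }
}
```

Testing the failure path is straightforward here because the sampling step is injected; the hard part in the real RDD code is forcing repeated undersized samples deterministically.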
[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13243#issuecomment-221172065 Can we just use DefinedByConstructorParams rather than using case classes?
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13273#discussion_r64331268
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala ---
@@ -27,12 +25,12 @@ import org.apache.commons.lang3.StringUtils
  */
 object CodeFormatter {
   def format(code: CodeAndComment): String = {
-    new CodeFormatter().addLines(
-      StringUtils.replaceEach(
-        code.body,
-        code.comment.keys.toArray,
-        code.comment.values.toArray)
-    ).result
+    val formatter = new CodeFormatter
+    code.body.split("\n").foreach { line =>
+      val trimmed = line.trim
+      formatter.addLine(code.comment.getOrElse(trimmed, trimmed))
+    }
+    formatter.result()
--- End diff --
How slow is it if we use a regexp to match the placeholder here?
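A minimal version of the regex alternative being asked about might look like this. The placeholder syntax `/*c<N>*/` is an assumption for the sketch; the real answer to "how slow" needs a benchmark against the per-line map lookup:

```scala
import java.util.regex.Matcher

// Expand comment placeholders in one regex pass instead of a per-line
// map lookup. Matcher.quoteReplacement guards against '$' and '\' in
// the expanded comment text.
object RegexPlaceholders {
  private val placeholder = raw"/\*c\d+\*/".r
  def expand(body: String, comments: Map[String, String]): String =
    placeholder.replaceAllIn(body,
      m => Matcher.quoteReplacement(comments.getOrElse(m.matched, m.matched)))
}
```

Unknown placeholders are left as-is via `getOrElse`, so a stray `/*c99*/` with no map entry survives the pass unchanged.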
[GitHub] spark pull request: [SPARK-15475][SQL] Add tests for writing and r...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13253#issuecomment-221171864 Did we ever end up fixing https://issues.apache.org/jira/browse/SPARK-10216 after it was reverted?
[GitHub] spark pull request: [SPARK-15472][SQL][Streaming] Add partitioned ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13258#issuecomment-221171715 Merged build finished. Test PASSed.
[GitHub] spark pull request: [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLC...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221171754 Merged build finished. Test FAILed.
[GitHub] spark pull request: [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLC...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221171737 **[Test build #59183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59183/consoleFull)** for PR 9192 at commit [`3a2e0c7`](https://github.com/apache/spark/commit/3a2e0c7919b9fdbd5558cda474368c25208856b0).
* This patch **fails SparkR unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLC...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221171755 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59183/ Test FAILed.
[GitHub] spark pull request: [SPARK-15472][SQL][Streaming] Add partitioned ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13258#issuecomment-221171716 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59176/ Test PASSed.
[GitHub] spark pull request: [SPARK-15472][SQL][Streaming] Add partitioned ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13258#issuecomment-221171564 **[Test build #59176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59176/consoleFull)** for PR 13258 at commit [`936bf26`](https://github.com/apache/spark/commit/936bf26415ae4f8875b091bc1587409620a14e0a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15475][SQL] Add tests for writing and r...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/13253#issuecomment-221171371 Hi @rxin and @marmbrus, As you already know, a "critical" issue was found here: [SPARK-15393](https://issues.apache.org/jira/browse/SPARK-15393). So, [SPARK-10216](https://issues.apache.org/jira/browse/SPARK-10216) was reverted. It seems writing empty data and reading it back was not tested across data sources. This PR includes a test which resembles the one provided in the JIRA ticket. Could you please take a look?
[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13267#issuecomment-221171295 **[Test build #3011 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3011/consoleFull)** for PR 13267 at commit [`caf8808`](https://github.com/apache/spark/commit/caf8808c78cd3b6feedc34ebbf02a05a6d194034).
[GitHub] spark pull request: Glrm
Github user sushmitkarar closed the pull request at: https://github.com/apache/spark/pull/13274
[GitHub] spark pull request: Glrm
GitHub user sushmitkarar opened a pull request: https://github.com/apache/spark/pull/13274 Glrm

## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)

## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/rezazadeh/spark glrm
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/spark/pull/13274.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #13274

commit c7679f91bbf79bcefeb8c9f7ee968aac1f39b503 (Reza Zadeh, 2014-08-27T07:17:32Z): First version of SparkGLRM
commit 1347655961e047488bcb7ceb753c16bb1c2d7e4a (Reza Zadeh, 2014-08-27T07:19:02Z): Documentation
commit 16ae855c6664c276a0b2ef5fbf3c625251c9a82c (Reza Zadeh, 2014-09-07T01:20:54Z): index bounds
commit aa24830dc22a1e95af6fea0282d31255fd335036 (Reza Zadeh, 2014-09-07T01:30:39Z): More data
commit ee6cd5328458bd83d16f2f2e43a64fdac0b090f8 (Reza Zadeh, 2014-09-07T01:34:33Z): Bigger dataset
commit be9a51b1cc77a8a546b8150dcd498cfaecb5f703 (Reza Zadeh, 2014-09-07T18:20:27Z): Larger data
commit 99971db070d6923ca55148a1fcc9dc55ff068472 (Reza Zadeh, 2014-09-10T00:01:06Z): Better random entry generation
commit 576d9ae365589d7e67cb697e6e7edbf7c70f1f0c (Reza Zadeh, 2014-09-10T00:01:27Z): Better parameters
commit 1e5afe8212257fa4d05cea06665979ff9b3a9cc7 (Reza Zadeh, 2014-09-10T00:02:35Z): Better parameters
commit 04f48097a19de2857f49f162013fc22e217ab4eb (Reza Zadeh, 2014-09-10T18:36:11Z): Proper display of status
commit 7489302795e0787a70b885090603380d06d3f7a6 (Reza Zadeh, 2014-09-20T06:33:14Z): chunking
commit 136d0310e5b5d2cb3341ea847b0a8fb989c21f77 (Reza Zadeh, 2014-09-21T06:34:02Z): Pluggable proxes
commit 49c9ca72599a26d3ff91ce97739d9eec5bc24d8b (Reza Zadeh, 2014-09-21T06:51:13Z): Documentation
commit d8f07b4c66dce1fa0c7c3be4bfb978d62f63702b (Reza Zadeh, 2014-09-22T05:15:55Z): add documentation
commit 0e62894e10682e92c1d44375e3567697cf1c0056 (Reza Zadeh, 2014-09-22T05:18:40Z): better spacing
commit d70cfe659a95c792cb234df05ed24fdcddcf44ad (Reza Zadeh, 2014-09-25T22:19:17Z): Better parameters
commit 2dae5b616604182b980978f5fb444d20f169b5eb (Reza Zadeh, 2014-09-26T07:54:35Z): Better loss grads pluggability
commit 8c9e977bac6f66dec6c4f3b1e55065807e75eb1b (Reza Zadeh, 2014-09-26T18:04:58Z): parmaeter changes
commit 5951d30c0aab9668be741d367ec7c0d57824a3d3 (Reza Zadeh, 2014-09-26T23:21:29Z): better stepsizes and library of proxes
commit 2c3f75b30b00a6d6363e08c584017564b8c33a51 (Reza Zadeh, 2014-09-26T23:24:47Z): better documentation
commit edae547949571a80a9a1cedba88c55e8f123a97c (Reza Zadeh, 2014-09-26T23:29:28Z): Better documentation
commit 6140f3f5aa202f6635f4dc07da8c9f790382968e (Reza Zadeh, 2014-09-30T04:49:53Z): Add funny loss
commit c1f2216c326b49b82703e01a20be95e718601f56 (Reza Zadeh, 2014-10-01T20:25:02Z): Funny Loss example
commit 222e38dd40a12a3b6b9305609b8abd0ccdc61b8c (Reza Zadeh, 2014-10-04T19:14:17Z): New interface
commit 9be6c288795a5fe5e8a33afe8d1bb09174db9901 (Reza Zadeh, 2014-10-04T19:18:51Z): Documentation
commit 643fd50f27c430c62a982f1ba38a3e190d097232 (Reza Zadeh, 2014-10-04T19:19:38Z): Move to new directory
commit 20128d9e97e2ba8b19bfde3f57200d805f44a75e (Reza Zadeh, 2014-10-04T19:21:49Z): Readme first version
commit 51c4cb8e53a1549faed66f197a8821ca5618aa10 (Reza Zadeh, 2014-10-04T19:30:54Z): Movement message
commit 9f2469d5d8073b3036ff7f712ab2d256b1fc72b6 (Reza Zadeh, 2014-10-04T19:50:03Z): Initial README
commit 13693d09dd21c32c8c1a4047bc5021ed014db776 (Reza Zadeh, 2014-10-04T20:01:44Z): Better readme
[GitHub] spark pull request: [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLC...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9192#issuecomment-221169975 **[Test build #59183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59183/consoleFull)** for PR 9192 at commit [`3a2e0c7`](https://github.com/apache/spark/commit/3a2e0c7919b9fdbd5558cda474368c25208856b0).
[GitHub] spark pull request: [SPARK-15470] [SQL] Unify the Configuration In...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13247#issuecomment-221168962 Is the idea to have a list of configs that are marked as mutable and explicitly throw exceptions when users modify them? I still don't see how that relates to the changes here... Why not just have that list in SQLConf?
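One reading of that suggestion, as a hedged sketch with illustrative names (this is not Spark's actual `SQLConf` API):

```scala
// Keep the set of static (not mutable at runtime) keys inside the conf
// itself and throw when a user tries to change one after it is set,
// instead of splitting the check across SQLContext and RuntimeConfig.
object GuardedConf {
  private val staticKeys = Set("spark.sql.warehouse.dir")
  private var settings = Map.empty[String, String]

  def set(key: String, value: String): Unit = {
    if (staticKeys.contains(key) && settings.contains(key)) {
      throw new UnsupportedOperationException(
        s"Cannot modify static config '$key' at runtime")
    }
    settings += key -> value
  }

  def get(key: String): Option[String] = settings.get(key)
}
```

With this shape, a first `set` on a static key succeeds (initialization), a second throws, and ordinary runtime keys stay freely mutable.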
[GitHub] spark pull request: [SPARK-15470] [SQL] Unify the Configuration In...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/13247#issuecomment-221168724 Previously, @yhuai mentioned that we need to throw exceptions when users change the static configuration at runtime. See https://github.com/apache/spark/pull/13128#issuecomment-220411852 When trying to do this in a cleaner way, I planned to add these checks to `RuntimeConfig`. However, I found that `SQLContext` has two entrances for configuration: one is `conf` and another is `runtimeConf`. Thus, I think we have to remove the duplicate before working on it. Does this concern make sense?
[GitHub] spark pull request: [SPARK-15492][ML][DOC]:Binarization scala exam...
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/13266#issuecomment-221168009 @jerryshao We have fixed several similar bugs. I am doing QA for the ML 2.0 documentation now.
[GitHub] spark pull request: [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspar...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12919#issuecomment-221167894 **[Test build #59180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59180/consoleFull)** for PR 12919 at commit [`6551fb4`](https://github.com/apache/spark/commit/6551fb420e003949fce421ce14111b40e7309631).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class ListFilesCommand(files: Seq[String] = Seq.empty[String]) extends RunnableCommand`
  * `case class ListJarsCommand(jars: Seq[String] = Seq.empty[String]) extends RunnableCommand`
[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12889#issuecomment-221167951 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspar...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12919#issuecomment-221167948 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12889#issuecomment-221167954 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59181/ Test PASSed.
[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12889#issuecomment-221167902 **[Test build #59181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59181/consoleFull)** for PR 12889 at commit [`6e35559`](https://github.com/apache/spark/commit/6e355593ca1a4a288f9c17cc15c2ff34c128846d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspar...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/12919#issuecomment-221167949 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59180/ Test PASSed.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13139#discussion_r64328639 --- Diff: docs/ml-classification-regression.md --- @@ -374,6 +374,154 @@ regression model and extracting model summary statistics. +## Generalized linear regression + +Contrasted with linear regression where the output is assumed to follow a Gaussian +distribution, [generalized linear models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) are specifications of linear models where the response variable $Y_i$ may take on _any_ +distribution from the [exponential family of distributions](https://en.wikipedia.org/wiki/Exponential_family). +Spark's `GeneralizedLinearRegression` interface +allows for flexible specification of GLMs which can be used for various types of +prediction problems including linear regression, Poisson regression, logistic regression, and others. +Currently in `spark.ml`, only a subset of the exponential family distributions are supported and they are listed +[below](#available-families). + +**NOTE**: Spark currently only supports up to 4096 features for GLM models, and will throw an exception if this +constraint is exceeded. See the [optimization section](#optimization) for more details. + +In a GLM the resonse variable $Y_i$ is assumed to be drawn from an exponential family distribution: + +$$ +Y_i \sim f\left(\cdot|\theta_i, \phi, w_i\right) --- End diff -- Same for any other notation
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/13139#issuecomment-221167413 @yanboliang @sethah Could you please reconcile this PR with [https://github.com/apache/spark/pull/13262]? Either option is OK with me. If I had to choose, I'd put the optimization stuff in ml-advanced since most users will not need to know it. @sethah Where are you drawing your notation from? If it's a source online, could you link to it?
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13139#discussion_r64328611 --- Diff: docs/ml-classification-regression.md --- @@ -374,6 +374,154 @@ regression model and extracting model summary statistics. +## Generalized linear regression + +Contrasted with linear regression where the output is assumed to follow a Gaussian +distribution, [generalized linear models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) are specifications of linear models where the response variable $Y_i$ may take on _any_ +distribution from the [exponential family of distributions](https://en.wikipedia.org/wiki/Exponential_family). +Spark's `GeneralizedLinearRegression` interface +allows for flexible specification of GLMs which can be used for various types of +prediction problems including linear regression, Poisson regression, logistic regression, and others. +Currently in `spark.ml`, only a subset of the exponential family distributions are supported and they are listed +[below](#available-families). + +**NOTE**: Spark currently only supports up to 4096 features for GLM models, and will throw an exception if this +constraint is exceeded. See the [optimization section](#optimization) for more details. + +In a GLM the resonse variable $Y_i$ is assumed to be drawn from an exponential family distribution: + +$$ +Y_i \sim f\left(\cdot|\theta_i, \phi, w_i\right) --- End diff -- phi should be defined.
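For context on the review comment above, a common textbook parameterization of the exponential-family density being quoted is the following (this exact form is an assumption on my part, not necessarily the one the doc PR settled on); it defines $\phi$ as the dispersion parameter and $w_i$ as a known prior weight:

```latex
f\left(y_i \,\middle|\, \theta_i, \phi, w_i\right)
  = \exp\left( \frac{y_i \theta_i - b(\theta_i)}{\phi / w_i}
             + c\left(y_i, \phi / w_i\right) \right)
```

Here $\theta_i$ is the natural (canonical) parameter, and the functions $b(\cdot)$ and $c(\cdot)$ characterize the particular family (Gaussian, Poisson, binomial, etc.).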
[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13186#issuecomment-221167163 **[Test build #59182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59182/consoleFull)** for PR 13186 at commit [`4a20bad`](https://github.com/apache/spark/commit/4a20badceafbd30790575bda9841959d6c7a0c2f).
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221167011 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221167012 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59179/ Test FAILed.
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221167002 **[Test build #59179 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59179/consoleFull)** for PR 13273 at commit [`216fc5c`](https://github.com/apache/spark/commit/216fc5c8affc13debe7107ce97067d6da317ce47). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspar...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12919#issuecomment-221166703 **[Test build #59180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59180/consoleFull)** for PR 12919 at commit [`6551fb4`](https://github.com/apache/spark/commit/6551fb420e003949fce421ce14111b40e7309631).
[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/12889#issuecomment-221166701 **[Test build #59181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59181/consoleFull)** for PR 12889 at commit [`6e35559`](https://github.com/apache/spark/commit/6e355593ca1a4a288f9c17cc15c2ff34c128846d).
[GitHub] spark pull request: [SPARK-15480][UI][Streaming]show missed InputI...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/13259#issuecomment-221166631 cc @zsxwing
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221166175 **[Test build #59179 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59179/consoleFull)** for PR 13273 at commit [`216fc5c`](https://github.com/apache/spark/commit/216fc5c8affc13debe7107ce97067d6da317ce47).
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221166121 retest this please
[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13243#discussion_r64327921 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala --- @@ -58,4 +58,39 @@ class DataFrameComplexTypeSuite extends QueryTest with SharedSQLContext { val nullIntRow = df.selectExpr("i[1]").collect()(0) assert(nullIntRow == org.apache.spark.sql.Row(null)) } + + test("SPARK-15285 Generated SpecificSafeProjection.apply method grows beyond 64KB") { +val ds100_5 = Seq(S100_5()).toDS() +ds100_5.rdd.count + } } + +case class S100( --- End diff -- Scala 2.10 doesn't support large case classes. We can create a new test suite under `scala-2.11/src/test` and put this test there, so that we only run it under Scala 2.11. The `repl` module is a good example of this. cc @kiszk do you mind resending this PR with the fix?
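The limitation behind this comment can be illustrated with a minimal sketch (the class name here is hypothetical, not Spark's `S100`): Scala 2.10 refuses to compile case classes with more than 22 fields, while Scala 2.11+ accepts them, which is why a wide test class must live in a version-specific source tree in a cross-built project.

```scala
// Compiles under Scala 2.11+; Scala 2.10 rejects case classes with more
// than 22 fields, so a class like this cannot sit in the shared source
// tree of a project cross-built for 2.10 and 2.11.
case class Wide23(
  c1: Int = 0, c2: Int = 0, c3: Int = 0, c4: Int = 0, c5: Int = 0,
  c6: Int = 0, c7: Int = 0, c8: Int = 0, c9: Int = 0, c10: Int = 0,
  c11: Int = 0, c12: Int = 0, c13: Int = 0, c14: Int = 0, c15: Int = 0,
  c16: Int = 0, c17: Int = 0, c18: Int = 0, c19: Int = 0, c20: Int = 0,
  c21: Int = 0, c22: Int = 0, c23: Int = 0)
```

Note that even on 2.11+ such classes lose some conveniences (e.g. `tupled`/`unapply`), but construction and field access, which is all the regression test needs, work fine.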
[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/13270#discussion_r64327851 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -216,7 +216,25 @@ class SessionCatalog( val table = formatTableName(tableDefinition.identifier.table) val newTableDefinition = tableDefinition.copy(identifier = TableIdentifier(table, Some(db))) requireDbExists(db) -externalCatalog.createTable(db, newTableDefinition, ignoreIfExists) + +if (newTableDefinition.tableType == CatalogTableType.EXTERNAL) { + // !! HACK ALERT !! + // + // See https://issues.apache.org/jira/browse/SPARK-15269 for more details about why we have to + // set `locationUri` and then remove the directory after creating the external table: + val tablePath = defaultTablePath(newTableDefinition.identifier) + try { +externalCatalog.createTable( + db, + newTableDefinition.withNewStorage(locationUri = Some(tablePath)), + ignoreIfExists) + } finally { +val path = new Path(tablePath) +FileSystem.get(path.toUri, hadoopConf).delete(path, true) + } +} else { + externalCatalog.createTable(db, newTableDefinition, ignoreIfExists) +} --- End diff -- Yeah, thanks! Will add a check for the first case. The second case should be the reason why Jenkins tests failed.
[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13243#issuecomment-221165640 Sorry, we have to revert it as it breaks the Scala 2.10 build.
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221165459 **[Test build #59178 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59178/consoleFull)** for PR 13273 at commit [`216fc5c`](https://github.com/apache/spark/commit/216fc5c8affc13debe7107ce97067d6da317ce47). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221165478 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221165480 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59178/ Test FAILed.
[GitHub] spark pull request: [SPARK-14554][SQL] disable whole stage codegen...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/12322#discussion_r64327687 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -620,6 +620,12 @@ class DatasetSuite extends QueryTest with SharedSQLContext { val df = streaming.join(static, Seq("b")) assert(df.isStreaming, "streaming Dataset returned false for 'isStreaming'.") } + + test("SPARK-14554: Dataset.map may generate wrong java code for wide table") { +val wideDF = sqlContext.range(10).select(Seq.tabulate(1000) {i => ('id + i).as(s"c$i")} : _*) +// Make sure the generated code for this plan can compile and execute. +wideDF.map(_.getLong(0)).collect() --- End diff -- it's fixed in https://github.com/apache/spark/pull/13273
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221164516 **[Test build #59178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59178/consoleFull)** for PR 13273 at commit [`216fc5c`](https://github.com/apache/spark/commit/216fc5c8affc13debe7107ce97067d6da317ce47).
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/13273#discussion_r64327313 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala --- @@ -27,12 +25,12 @@ import org.apache.commons.lang3.StringUtils */ object CodeFormatter { def format(code: CodeAndComment): String = { -new CodeFormatter().addLines( - StringUtils.replaceEach( -code.body, -code.comment.keys.toArray, -code.comment.values.toArray) -).result +val formatter = new CodeFormatter +code.body.split("\n").foreach { line => + val trimmed = line.trim + formatter.addLine(code.comment.getOrElse(trimmed, trimmed)) +} +formatter.result() --- End diff -- cc @sarutak , here I assume the placeholder will always take an entire line, is that correct?
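The performance idea in the diff above can be sketched standalone (the names here are hypothetical, not Spark's actual `CodeFormatter`): instead of running a multi-key string replace over the whole generated body once per placeholder key (the `StringUtils.replaceEach` approach being removed), walk the body line by line and look each trimmed line up in the comment map, under the same assumption the comment raises, namely that a placeholder always occupies a full line on its own.

```scala
// Minimal sketch of line-based placeholder substitution: one map lookup
// per line, instead of scanning the entire body once per placeholder key.
object PlaceholderFormatter {
  def format(body: String, comments: Map[String, String]): String =
    body.split("\n").map { line =>
      val trimmed = line.trim
      // If the trimmed line is exactly a known placeholder, emit the
      // comment it stands for; otherwise keep the trimmed line.
      comments.getOrElse(trimmed, trimmed)
    }.mkString("\n")
}
```

The trade-off is exactly what the reviewer asks about: a placeholder embedded mid-line would no longer be replaced, so correctness rests on the full-line assumption.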
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/13273 [SPARK-15498][TESTS] fix slow tests

## What changes were proposed in this pull request?

This PR fixes 3 slow tests:
1. `ParquetQuerySuite.read/write wide table`: This is not a good unit test as it runs for more than 5 minutes. This PR removes it and adds a new regression test in `CodeGenerationSuite`, which is more "unit".
2. `ParquetQuerySuite.returning batch for wide table`: reduce the threshold and use a smaller data size.
3. `DatasetSuite.SPARK-14554: Dataset.map may generate wrong java code for wide table`: improving `CodeFormatter.format` (introduced at https://github.com/apache/spark/pull/12979) dramatically speeds it up.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running: $ git pull https://github.com/cloud-fan/spark test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13273.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #13273

commit 216fc5c8affc13debe7107ce97067d6da317ce47 Author: Wenchen Fan Date: 2016-05-24T04:24:53Z fix slow tests
[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221164134 cc @davies @andrewor14 @yhuai
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13139#discussion_r64327019 --- Diff: docs/ml-classification-regression.md --- @@ -374,6 +374,154 @@ regression model and extracting model summary statistics. +## Generalized linear regression + +Contrasted with linear regression where the output is assumed to follow a Gaussian +distribution, [generalized linear models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) are specifications of linear models where the response variable $Y_i$ may take on _any_ +distribution from the [exponential family of distributions](https://en.wikipedia.org/wiki/Exponential_family). +Spark's `GeneralizedLinearRegression` interface +allows for flexible specification of GLMs which can be used for various types of +prediction problems including linear regression, Poisson regression, logistic regression, and others. +Currently in `spark.ml`, only a subset of the exponential family distributions are supported and they are listed +[below](#available-families). + +**NOTE**: Spark currently only supports up to 4096 features for GLM models, and will throw an exception if this +constraint is exceeded. See the [optimization section](#optimization) for more details. --- End diff -- Note that, for certain models, you can call LinearRegression or LogisticRegression to use other solvers which support more features.
[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/13139#discussion_r64327021 --- Diff: docs/ml-classification-regression.md --- @@ -374,6 +374,154 @@ regression model and extracting model summary statistics. +## Generalized linear regression + +Contrasted with linear regression where the output is assumed to follow a Gaussian +distribution, [generalized linear models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) are specifications of linear models where the response variable $Y_i$ may take on _any_ +distribution from the [exponential family of distributions](https://en.wikipedia.org/wiki/Exponential_family). +Spark's `GeneralizedLinearRegression` interface +allows for flexible specification of GLMs which can be used for various types of +prediction problems including linear regression, Poisson regression, logistic regression, and others. +Currently in `spark.ml`, only a subset of the exponential family distributions are supported and they are listed +[below](#available-families). + +**NOTE**: Spark currently only supports up to 4096 features for GLM models, and will throw an exception if this +constraint is exceeded. See the [optimization section](#optimization) for more details. + +In a GLM the resonse variable $Y_i$ is assumed to be drawn from an exponential family distribution: --- End diff -- typo: "response"
[GitHub] spark pull request: [SPARK-15451][build] Use jdk7's rt.jar when av...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13272#issuecomment-221162525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59177/ Test PASSed.
[GitHub] spark pull request: [SPARK-15451][build] Use jdk7's rt.jar when av...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13272#issuecomment-221162523 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15451][build] Use jdk7's rt.jar when av...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/13272#issuecomment-221162435 **[Test build #59177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59177/consoleFull)** for PR 13272 at commit [`50c5815`](https://github.com/apache/spark/commit/50c581561fbfff701babf29866c06aa4328c5ff6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/13243#issuecomment-221162353 thanks, merging to master and 2.0!
[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13243
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r64325981

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala ---
@@ -238,4 +238,23 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging {
     runCliWithin(2.minute, Seq("-e", "!echo \"This is a test for Spark-11624\";"))(
       "" -> "This is a test for Spark-11624")
   }
+
+  test("list jars") {
+    val jarFile = Thread.currentThread().getContextClassLoader.getResource("TestUDTF.jar")
+    runCliWithin(2.minute)(
+      s"ADD JAR $jarFile" -> "",
+      s"LIST JARS" -> "TestUDTF.jar",
+      s"List JAR $jarFile" -> "TestUDTF.jar"
+    )
+  }
+
+  test("list files") {
+    val dataFilePath = Thread.currentThread().getContextClassLoader
+      .getResource("data/files/small_kv.txt")
+    runCliWithin(2.minute)(
+      s"ADD FILE $dataFilePath" -> "",
+      s"LIST FILES" -> "small_kv.txt",
+      s"LIST FILE $dataFilePath" -> "small_kv.txt"
+    )
+  }
--- End diff --

let me check
[GitHub] spark pull request: [SPARK-15485] [SQL] [DOCS] Spark SQL Configura...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13263
[GitHub] spark pull request: [SPARK-15485] [SQL] [DOCS] Spark SQL Configura...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13263#issuecomment-221161361 Merging in master/2.0. Thanks.
[GitHub] spark pull request: [SPARK-15470] [SQL] Unify the Configuration In...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13247#issuecomment-221161280 This looks ok, but can you remind me again what this pull request is actually solving? I feel it's just changing code for the sake of changing code here ...
[GitHub] spark pull request: [SPARK-15340][SQL]Limit the size of the map us...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13130#issuecomment-221161008 Does it work for you when you changed it to 1 rather than 1000?
[GitHub] spark pull request: [SPARK-15495][SQL][WIP] Improve the explain ou...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/13271#issuecomment-221160878 hm this seems pretty complicated. Can we just have different expressions that are for the verbose mode, and when verbose mode is on, before explain, we replace the normal expressions with the verbose expressions? It seems a lot easier to do that way. This is similar to PrettyAttribute idea.
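The substitution rxin describes can be sketched roughly as follows. This is an illustrative toy, not Spark's actual API: `Expr`, `Attr`, `VerboseAttr`, and `explain` are hypothetical names standing in for Catalyst's expression tree and the PrettyAttribute-style swap he refers to.

```scala
// Toy model of the idea: keep a compact expression for normal explain output,
// and swap in a verbose-rendering variant only when verbose mode is on.
sealed trait Expr { def render: String }

case class Attr(name: String, dataType: String) extends Expr {
  def render: String = name                       // compact form: just the name
}

// Verbose wrapper around the same attribute, with a richer string form.
case class VerboseAttr(underlying: Attr) extends Expr {
  def render: String = s"${underlying.name}: ${underlying.dataType}"
}

def explain(exprs: Seq[Expr], verbose: Boolean): String = {
  val shown =
    if (verbose) exprs.map {
      case a: Attr => VerboseAttr(a)              // replace before printing
      case e       => e
    }
    else exprs
  shown.map(_.render).mkString(", ")
}
```

The point of the design is that the normal expressions stay untouched; only the final rendering pass substitutes the verbose variants, so no explain-specific state leaks into the expressions themselves.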
[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/13212#discussion_r64325475

--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala ---
@@ -238,4 +238,23 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging {
     runCliWithin(2.minute, Seq("-e", "!echo \"This is a test for Spark-11624\";"))(
       "" -> "This is a test for Spark-11624")
   }
+
+  test("list jars") {
+    val jarFile = Thread.currentThread().getContextClassLoader.getResource("TestUDTF.jar")
+    runCliWithin(2.minute)(
+      s"ADD JAR $jarFile" -> "",
+      s"LIST JARS" -> "TestUDTF.jar",
+      s"List JAR $jarFile" -> "TestUDTF.jar"
+    )
+  }
+
+  test("list files") {
+    val dataFilePath = Thread.currentThread().getContextClassLoader
+      .getResource("data/files/small_kv.txt")
+    runCliWithin(2.minute)(
+      s"ADD FILE $dataFilePath" -> "",
+      s"LIST FILES" -> "small_kv.txt",
+      s"LIST FILE $dataFilePath" -> "small_kv.txt"
+    )
+  }
--- End diff --

Seems it is failing? Can you take a look? https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.2/1102/testReport/junit/org.apache.spark.sql.hive.thriftserver/CliSuite/list_files/
[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...
Github user NarineK commented on the pull request: https://github.com/apache/spark/pull/12836#issuecomment-221158943 It seems that the generic functions FUN for aggregates have some limitations too: https://stat.ethz.ch/pipermail/r-help/2015-March/426535.html
[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13243#issuecomment-221158001 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-15494][SQL] encoder code cleanup
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/13269#issuecomment-221158097 Merged build finished. Test PASSed.