[GitHub] spark pull request: [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLC...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221181813
  
**[Test build #59187 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59187/consoleFull)**
 for PR 9192 at commit 
[`a9479dd`](https://github.com/apache/spark/commit/a9479dd3ea1f8db84ec7dd26989a0476a39419ec).


[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-23 Thread NarineK
Github user NarineK commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-221181147
  
It seems that many people voted for option 2, so I'll implement that one.

2. function(key, df), where key is a list of the grouping column values for 
this group and df is the data.frame of the group, containing the grouping columns. 
This is similar to the Scala function signature of 
KeyValueGroupedDataset.flatMapGroups().
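
For reference, a minimal Scala sketch of the `KeyValueGroupedDataset.flatMapGroups` shape this option mirrors; the case class, session setup, and data below are illustrative, not part of the PR:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: shows the (key, rows-of-the-group) shape that the
// proposed R signature function(key, df) is modeled after.
case class Sale(region: String, amount: Double)

object FlatMapGroupsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("flatMapGroups-sketch").master("local[2]").getOrCreate()
    import spark.implicits._

    val sales = Seq(Sale("east", 10.0), Sale("east", 5.0), Sale("west", 7.0)).toDS()

    // The user function receives the grouping key plus an iterator over the
    // rows of that group, analogous to function(key, df) in option 2.
    val totals = sales
      .groupByKey(_.region)
      .flatMapGroups((region, rows) => Iterator(region -> rows.map(_.amount).sum))

    totals.show()
    spark.stop()
  }
}
```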


[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-23 Thread NarineK
Github user NarineK commented on a diff in the pull request:

https://github.com/apache/spark/pull/12836#discussion_r64335943
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala ---
@@ -21,10 +21,12 @@ import scala.collection.JavaConverters._
 
 import org.apache.spark.annotation.Experimental
 import org.apache.spark.api.java.function._
+import org.apache.spark.sql.catalyst.analysis.UnresolvedDeserializer
--- End diff --

Yes, will do that, thanks.


[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...

2016-05-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13260#discussion_r64335944
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -550,17 +550,19 @@ abstract class RDD[T: ClassTag](
 } else {
   val fraction = SamplingUtils.computeFractionForSampleSize(num, 
initialCount,
 withReplacement)
-  var samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+  var samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+  var count = samples.count()
 
   // If the first sample didn't turn out large enough, keep trying 
to take samples;
   // this shouldn't happen often because we use a big multiplier 
for the initial size
   var numIters = 0
-  while (samples.length < num) {
+  while (count < num) {
 logWarning(s"Needed to re-sample due to insufficient sample 
size. Repeat #$numIters")
-samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+count = samples.count()
 numIters += 1
   }
-  Utils.randomizeInPlace(samples, rand).take(num)
+  Utils.randomizeInPlace(samples.collect(), rand).take(num)
--- End diff --

Anyway, thank you for the review, @andrewor14.


[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-23 Thread NarineK
Github user NarineK commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-221180766
  
OK, I see, thanks, @sun-rui.


[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...

2016-05-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13260#discussion_r64335756
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -550,17 +550,19 @@ abstract class RDD[T: ClassTag](
 } else {
   val fraction = SamplingUtils.computeFractionForSampleSize(num, 
initialCount,
 withReplacement)
-  var samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+  var samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+  var count = samples.count()
 
   // If the first sample didn't turn out large enough, keep trying 
to take samples;
   // this shouldn't happen often because we use a big multiplier 
for the initial size
   var numIters = 0
-  while (samples.length < num) {
+  while (count < num) {
 logWarning(s"Needed to re-sample due to insufficient sample 
size. Repeat #$numIters")
-samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+count = samples.count()
 numIters += 1
   }
-  Utils.randomizeInPlace(samples, rand).take(num)
+  Utils.randomizeInPlace(samples.collect(), rand).take(num)
--- End diff --

If that situation happens, it will also reduce network traffic.
But the situation happens very rarely. Should I close this PR?


[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-23 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-221180426
  
We can also add an API later that supports partial aggregation and final 
aggregation together, as we have done in the RDD API. Refer to "aggregateRDD".
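
For readers less familiar with the RDD-side API being referenced, here is a minimal Scala sketch of `RDD.aggregate`, which combines a per-partition (partial) step with a final merge step; the local SparkContext and data are illustrative:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AggregateSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("aggregate-sketch").setMaster("local[2]"))
    val nums = sc.parallelize(1 to 100, 4)

    // The first function does the partial (per-partition) aggregation,
    // the second merges the per-partition results into the final value.
    val (sum, count) = nums.aggregate((0L, 0L))(
      (acc, x) => (acc._1 + x, acc._2 + 1L),
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )

    println(s"sum=$sum count=$count mean=${sum.toDouble / count}")
    sc.stop()
  }
}
```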


[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...

2016-05-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13260#discussion_r64335663
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -550,17 +550,19 @@ abstract class RDD[T: ClassTag](
 } else {
   val fraction = SamplingUtils.computeFractionForSampleSize(num, 
initialCount,
 withReplacement)
-  var samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+  var samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+  var count = samples.count()
 
   // If the first sample didn't turn out large enough, keep trying 
to take samples;
   // this shouldn't happen often because we use a big multiplier 
for the initial size
   var numIters = 0
-  while (samples.length < num) {
+  while (count < num) {
 logWarning(s"Needed to re-sample due to insufficient sample 
size. Repeat #$numIters")
-samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+count = samples.count()
 numIters += 1
   }
-  Utils.randomizeInPlace(samples, rand).take(num)
+  Utils.randomizeInPlace(samples.collect(), rand).take(num)
--- End diff --

You're right. It'll take one more pass in all cases.
Hmm, that might be the main reason not to do this.


[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13177#issuecomment-221180144
  
**[Test build #59186 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59186/consoleFull)**
 for PR 13177 at commit 
[`156fea0`](https://github.com/apache/spark/commit/156fea0db2856c4eda3ff7496218e1c7d2082c4a).


[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-23 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13177#discussion_r64335588
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -480,11 +480,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 try {
   Option(hive.getFunction(db, name)).map(fromHiveFunction)
 } catch {
-  case CausedBy(ex: NoSuchObjectException) if 
ex.getMessage.contains(name) =>
+  case e: Throwable if isCausedBy(e, s"$name does not exist") =>
--- End diff --

@andrewor14 thanks. Changed to NonFatal.


[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...

2016-05-23 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13260#discussion_r64335520
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -550,17 +550,19 @@ abstract class RDD[T: ClassTag](
 } else {
   val fraction = SamplingUtils.computeFractionForSampleSize(num, 
initialCount,
 withReplacement)
-  var samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+  var samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+  var count = samples.count()
 
   // If the first sample didn't turn out large enough, keep trying 
to take samples;
   // this shouldn't happen often because we use a big multiplier 
for the initial size
   var numIters = 0
-  while (samples.length < num) {
+  while (count < num) {
 logWarning(s"Needed to re-sample due to insufficient sample 
size. Repeat #$numIters")
-samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+count = samples.count()
 numIters += 1
--- End diff --

Yeah you are right about that. Let's not change it.


[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12889#issuecomment-221179828
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59184/
Test PASSed.


[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12889#issuecomment-221179827
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-23 Thread sun-rui
Github user sun-rui commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-221179691
  
@thunterdb, @NarineK, we can definitely add an API like aggregate() later, 
based on the functionality of the two basic APIs. 
I can submit a JIRA issue for it later. We can allow passing a user-defined 
function as FUN. We could also support FUN being a built-in function ('mean', 'sum', 
etc.) by internally creating an R function that wraps it, but that seems not 
worth it, as SparkDataFrame already provides such common aggregation functions, which 
run on the JVM and have better performance than the R worker. However, if any built-in 
R function has no parity in Spark Core, we can consider supporting it in 
SparkR.
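
For comparison, a minimal Scala sketch of the JVM-side aggregation path mentioned above, using the built-in DataFrame functions; the column names and data are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{mean, sum}

object JvmAggSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jvm-agg-sketch").master("local[2]").getOrCreate()
    import spark.implicits._

    val df = Seq(("east", 10.0), ("east", 5.0), ("west", 7.0)).toDF("region", "amount")

    // mean/sum are evaluated as Catalyst expressions on the JVM,
    // so no rows have to be serialized out to an R worker.
    df.groupBy("region")
      .agg(mean("amount").as("avg_amount"), sum("amount").as("total_amount"))
      .show()

    spark.stop()
  }
}
```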



[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12889#issuecomment-221179741
  
**[Test build #59184 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59184/consoleFull)**
 for PR 12889 at commit 
[`020c096`](https://github.com/apache/spark/commit/020c0960ec9a379de4b7209151809f83fed1bf76).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class RandomForestRegressionModel(TreeEnsembleModels, 
JavaPredictionModel, JavaMLWritable,`


[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...

2016-05-23 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13260#discussion_r64334880
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -550,17 +550,19 @@ abstract class RDD[T: ClassTag](
 } else {
   val fraction = SamplingUtils.computeFractionForSampleSize(num, 
initialCount,
 withReplacement)
-  var samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+  var samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+  var count = samples.count()
 
   // If the first sample didn't turn out large enough, keep trying 
to take samples;
   // this shouldn't happen often because we use a big multiplier 
for the initial size
   var numIters = 0
-  while (samples.length < num) {
+  while (count < num) {
 logWarning(s"Needed to re-sample due to insufficient sample 
size. Repeat #$numIters")
-samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+count = samples.count()
 numIters += 1
   }
-  Utils.randomizeInPlace(samples, rand).take(num)
+  Utils.randomizeInPlace(samples.collect(), rand).take(num)
--- End diff --

Won't this cause another job to be run? It's more memory efficient, but it 
takes one more pass.
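
To make the trade-off concrete, a minimal sketch of why the patched version runs one more job; `sc` is assumed to be an existing SparkContext and the numbers are illustrative:

```scala
// Each action on an uncached RDD launches its own job, so counting the
// sample and then collecting it re-evaluates the sample (the fixed seed
// keeps the result deterministic, but the work is done twice).
val sampled = sc.parallelize(1 to 1000000).sample(false, 0.01, seed = 42L)

val sampleSize = sampled.count()   // job 1: size check used by the while loop
val rows = sampled.collect()       // job 2: materialize the sample on the driver

// The original code collects immediately, which needs only one pass per
// attempt but holds the whole sample in driver memory up front.
val upFront = sc.parallelize(1 to 1000000).sample(false, 0.01, seed = 42L).collect()
```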


[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13177#issuecomment-221178646
  
**[Test build #59185 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59185/consoleFull)**
 for PR 13177 at commit 
[`9f3b7db`](https://github.com/apache/spark/commit/9f3b7db2265ff2c89dc70feda8cd3e11f94f738e).


[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-23 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/13177#issuecomment-221178563
  
@wangyang1992 thanks for the notebook. I am surprised it actually works! 
That said, I do prefer your latest solution, which is more explicit and easier 
to understand. Once you address the last comment, I'll go ahead and merge this.


[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-23 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13177#discussion_r64334580
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -480,11 +480,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 try {
   Option(hive.getFunction(db, name)).map(fromHiveFunction)
 } catch {
-  case CausedBy(ex: NoSuchObjectException) if 
ex.getMessage.contains(name) =>
+  case e: Throwable if isCausedBy(e, s"$name does not exist") =>
--- End diff --

Yes, but would you mind doing `case NonFatal(e) =>` here? It's generally 
bad practice to catch Throwables.
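
A minimal, self-contained sketch of the suggested pattern (not the actual HiveShim code; `lookupFunction`, `doLookup`, and the error message are placeholders):

```scala
import scala.util.control.NonFatal

object NonFatalSketch {
  // Stand-in for the real metastore call; always fails for the demo.
  private def doLookup(name: String): String =
    throw new RuntimeException(s"$name does not exist")

  // `case NonFatal(e)` catches ordinary exceptions but lets fatal errors
  // (OutOfMemoryError, InterruptedException, etc.) propagate, unlike a
  // blanket `case e: Throwable`.
  def lookupFunction(name: String): Option[String] =
    try {
      Some(doLookup(name))
    } catch {
      case NonFatal(e) if e.getMessage != null && e.getMessage.contains(s"$name does not exist") =>
        None
    }

  def main(args: Array[String]): Unit =
    println(lookupFunction("my_udf")) // prints None
}
```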


[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

2016-05-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13186


[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

2016-05-23 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/13186#issuecomment-221178042
  
Great, I'm merging this into master/2.0. Thanks for fixing it.


[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13186#issuecomment-221177993
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59182/
Test PASSed.


[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...

2016-05-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13260#discussion_r64334393
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -550,17 +550,19 @@ abstract class RDD[T: ClassTag](
 } else {
   val fraction = SamplingUtils.computeFractionForSampleSize(num, 
initialCount,
 withReplacement)
-  var samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+  var samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+  var count = samples.count()
 
   // If the first sample didn't turn out large enough, keep trying 
to take samples;
   // this shouldn't happen often because we use a big multiplier 
for the initial size
   var numIters = 0
-  while (samples.length < num) {
+  while (count < num) {
 logWarning(s"Needed to re-sample due to insufficient sample 
size. Repeat #$numIters")
-samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+count = samples.count()
 numIters += 1
--- End diff --

Yes. I agree with you on the pros and cons of using an exception for this.
We can do that, but I hope to avoid it if possible, since it would change the 
public API signature.


[GitHub] spark pull request: [SPARK-15388][SQL] Fix spark sql CREATE FUNCTI...

2016-05-23 Thread wangyang1992
Github user wangyang1992 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13177#discussion_r64334408
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala ---
@@ -480,11 +480,21 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
 try {
   Option(hive.getFunction(db, name)).map(fromHiveFunction)
 } catch {
-  case CausedBy(ex: NoSuchObjectException) if 
ex.getMessage.contains(name) =>
+  case e: Throwable if isCausedBy(e, s"$name does not exist") =>
--- End diff --

@andrewor14 will this work?


[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13186#issuecomment-221177992
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13186#issuecomment-221177827
  
**[Test build #59182 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59182/consoleFull)**
 for PR 13186 at commit 
[`4a20bad`](https://github.com/apache/spark/commit/4a20badceafbd30790575bda9841959d6c7a0c2f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-14557][SQL] Reading textfile (created t...

2016-05-23 Thread kasjain
Github user kasjain commented on the pull request:

https://github.com/apache/spark/pull/12356#issuecomment-221177377
  
Sure. Let me add the CTAS query to the test suite.


[GitHub] spark pull request: [SPARK-15470] [SQL] Unify the Configuration In...

2016-05-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13247#issuecomment-221177271
  
Can we just add a flag to SQLConf to indicate whether SparkSession has been 
properly initialized?
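
A hypothetical sketch of that flag idea, just to make the suggestion concrete; `SketchSQLConf` and `staticKeys` are invented stand-ins for illustration, not the real SQLConf API:

```scala
// Hypothetical illustration only: a conf that rejects changes to "static"
// keys once the owning SparkSession reports it has finished initializing.
class SketchSQLConf {
  @volatile private var sessionInitialized: Boolean = false
  private val settings = new java.util.concurrent.ConcurrentHashMap[String, String]()

  def markSessionInitialized(): Unit = { sessionInitialized = true }

  def setConfString(key: String, value: String): Unit = {
    require(!sessionInitialized || !SketchSQLConf.staticKeys.contains(key),
      s"Cannot change static config '$key' after the SparkSession is initialized")
    settings.put(key, value)
  }
}

object SketchSQLConf {
  // Invented example of keys treated as static.
  val staticKeys: Set[String] = Set("spark.sql.warehouse.dir")
}
```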


[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...

2016-05-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13192#discussion_r64333811
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ---
@@ -49,6 +49,24 @@ object CodeFormatter {
 }
 code.result()
   }
+
+  def stripOverlappingComments(codeAndComment: CodeAndComment): 
CodeAndComment = {
+val code = new StringBuilder
+val map = codeAndComment.comment
+var lastLine: String = "dummy"
+codeAndComment.body.split('\n').foreach { l =>
+  val line = l.trim()
+  val skip = lastLine.startsWith("/*") && lastLine.endsWith("*/") &&
+line.startsWith("/*") && line.endsWith("*/") &&
+map(lastLine).substring(3).contains(map(line).substring(3))
--- End diff --

I think the performance is okay:
- This function is called only once, at every `CodeAndComment` creation.
- It scans `codeAndComment.body` once.
- The map lookup occurs at most once per line, and it does not cost much 
in this case.


[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread jurriaan
Github user jurriaan commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221176753
  
@HyukjinKwon Addressed your comments and improved the documentation a bit.


[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread jurriaan
Github user jurriaan commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221176282
  
@HyukjinKwon If you don't supply those options, they are set to the 
defaults. For how setQuoteEscapingEnabled works, see 
https://github.com/uniVocity/univocity-parsers/issues/38. In the test I 
supplied them to show a possible use case (Redshift's CSV dialect). 
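
A small sketch of the underlying uniVocity setting being discussed, written directly against univocity-parsers rather than the Spark CSV option surface; the data is illustrative and the exact semantics are described in the linked issue:

```scala
import java.io.StringWriter
import com.univocity.parsers.csv.{CsvWriter, CsvWriterSettings}

object QuoteEscapingSketch {
  def main(args: Array[String]): Unit = {
    val settings = new CsvWriterSettings()
    // Controls how quote characters inside values are escaped when writing;
    // see uniVocity/univocity-parsers#38 for the exact behaviour.
    settings.setQuoteEscapingEnabled(true)

    val out = new StringWriter()
    val writer = new CsvWriter(out, settings)
    writer.writeRow("id", "he said \"hi\"")
    writer.close()
    println(out.toString)
  }
}
```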


[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...

2016-05-23 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13212#discussion_r64333199
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
 ---
@@ -238,4 +238,23 @@ class CliSuite extends SparkFunSuite with 
BeforeAndAfterAll with Logging {
 runCliWithin(2.minute, Seq("-e", "!echo \"This is a test for 
Spark-11624\";"))(
   "" -> "This is a test for Spark-11624")
   }
+
+  test("list jars") {
+val jarFile = 
Thread.currentThread().getContextClassLoader.getResource("TestUDTF.jar")
+runCliWithin(2.minute)(
+  s"ADD JAR $jarFile" -> "",
+  s"LIST JARS" -> "TestUDTF.jar",
+  s"List JAR $jarFile" -> "TestUDTF.jar"
+)
+  }
+
+  test("list files") {
+val dataFilePath = Thread.currentThread().getContextClassLoader
+  .getResource("data/files/small_kv.txt")
+runCliWithin(2.minute)(
+  s"ADD FILE $dataFilePath" -> "",
+  s"LIST FILES" -> "small_kv.txt",
+  s"LIST FILE $dataFilePath" -> "small_kv.txt"
+)
+  }
--- End diff --

@yhuai Should I remove the failing test case so that the merge build test 
can keep going while I continue investigating the root cause? Thanks!


[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12889#issuecomment-221174436
  
**[Test build #59184 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59184/consoleFull)**
 for PR 12889 at commit 
[`020c096`](https://github.com/apache/spark/commit/020c0960ec9a379de4b7209151809f83fed1bf76).


[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...

2016-05-23 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/13192#discussion_r64332536
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ---
@@ -49,6 +49,24 @@ object CodeFormatter {
 }
 code.result()
   }
+
+  def stripOverlappingComments(codeAndComment: CodeAndComment): 
CodeAndComment = {
+val code = new StringBuilder
+val map = codeAndComment.comment
+var lastLine: String = "dummy"
+codeAndComment.body.split('\n').foreach { l =>
+  val line = l.trim()
+  val skip = lastLine.startsWith("/*") && lastLine.endsWith("*/") &&
+line.startsWith("/*") && line.endsWith("*/") &&
+map(lastLine).substring(3).contains(map(line).substring(3))
--- End diff --

Oh, it should work; I missed the `map`. Will it have a performance issue?


[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...

2016-05-23 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/13192#discussion_r64332407
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ---
@@ -49,6 +49,24 @@ object CodeFormatter {
 }
 code.result()
   }
+
+  def stripOverlappingComments(codeAndComment: CodeAndComment): 
CodeAndComment = {
+val code = new StringBuilder
+val map = codeAndComment.comment
+var lastLine: String = "dummy"
+codeAndComment.body.split('\n').foreach { l =>
+  val line = l.trim()
+  val skip = lastLine.startsWith("/*") && lastLine.endsWith("*/") &&
+line.startsWith("/*") && line.endsWith("*/") &&
+map(lastLine).substring(3).contains(map(line).substring(3))
--- End diff --

Have you checked that this actually works? I think we have placeholders here, 
so it will not find any duplicated comments to skip.


[GitHub] spark pull request: [SPARK-15470] [SQL] Unify the Configuration In...

2016-05-23 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/13247#issuecomment-221173676
  
Initially, I did try to do it that way. The questions bothering me were: how do 
we know which changes are made at runtime, and which changes come from external 
users?

After reading the code base, my understanding is that we will not 
externalize `SQLConf` after introducing `RuntimeConfig`; the `set` APIs of 
`SQLConf` will be for internal usage only. Thus, we can do whatever we want, if 
necessary. For example, to verify internal behaviors, our test suites are 
still allowed to change the configuration at runtime: in multiple 
test cases, like 
[ddlsuite](https://github.com/apache/spark/blob/5afd927a47aa7ede3039234f2f7262e2247aa2ae/sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala#L129),
 we change `spark.sql.warehouse.dir` at runtime. 

Then, I am wondering whether we just need to block changes to the static 
configuration properties made through `RuntimeConfig`. That is the reason why this 
PR removes the `conf` from `SQLContext`.

Feel free to let me know which way is preferred. Thanks!


[GitHub] spark pull request: [SPARK-15481][CORE] Prevent `takeSample` from ...

2016-05-23 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/13260#discussion_r64331578
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -550,17 +550,19 @@ abstract class RDD[T: ClassTag](
 } else {
   val fraction = SamplingUtils.computeFractionForSampleSize(num, 
initialCount,
 withReplacement)
-  var samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+  var samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+  var count = samples.count()
 
   // If the first sample didn't turn out large enough, keep trying 
to take samples;
   // this shouldn't happen often because we use a big multiplier 
for the initial size
   var numIters = 0
-  while (samples.length < num) {
+  while (count < num) {
 logWarning(s"Needed to re-sample due to insufficient sample 
size. Repeat #$numIters")
-samples = this.sample(withReplacement, fraction, 
rand.nextInt()).collect()
+samples = this.sample(withReplacement, fraction, 
rand.nextInt())
+count = samples.count()
 numIters += 1
--- End diff --

We could throw an exception after x iterations. It will be a bit of a pain 
to test though. I don't feel strongly about this, but it seems like a potential 
source of problems.
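
A minimal sketch of the "give up after x iterations" idea, outside of the real takeSample code; `drawSample`, `num`, and the attempt limit are placeholders:

```scala
import scala.util.Random

object RetryLimitSketch {
  // Re-samples until enough elements are collected, but throws instead of
  // looping forever once the attempt limit is reached.
  def takeWithRetryLimit(num: Int, maxAttempts: Int)(drawSample: () => Array[Int]): Array[Int] = {
    var samples = drawSample()
    var attempts = 0
    while (samples.length < num) {
      if (attempts >= maxAttempts) {
        throw new IllegalStateException(
          s"Needed $num elements but got ${samples.length} after $maxAttempts re-sampling attempts")
      }
      samples = drawSample()
      attempts += 1
    }
    Random.shuffle(samples.toSeq).take(num).toArray
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    // drawSample stands in for this.sample(...).collect(); size varies randomly.
    val result = takeWithRetryLimit(num = 5, maxAttempts = 10)(() => Array.fill(4 + rng.nextInt(4))(rng.nextInt(100)))
    println(result.mkString(", "))
  }
}
```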


[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...

2016-05-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13243#issuecomment-221172065
  
Can we just use DefinedByConstructorParams rather than using case classes?



[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/13273#discussion_r64331268
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ---
@@ -27,12 +25,12 @@ import org.apache.commons.lang3.StringUtils
  */
 object CodeFormatter {
   def format(code: CodeAndComment): String = {
-new CodeFormatter().addLines(
-  StringUtils.replaceEach(
-code.body,
-code.comment.keys.toArray,
-code.comment.values.toArray)
-).result
+val formatter = new CodeFormatter
+code.body.split("\n").foreach { line =>
+  val trimmed = line.trim
+  formatter.addLine(code.comment.getOrElse(trimmed, trimmed))
+}
+formatter.result()
--- End diff --

How slow would it be if we used a regexp to match the placeholder here?
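
A rough micro-benchmark sketch for that question, comparing a per-line regexp match against the current map lookup; the placeholder pattern `/*c_N*/` is an assumption for illustration, not the real format produced by the code generator:

```scala
import scala.util.matching.Regex

object PlaceholderLookupBench {
  def main(args: Array[String]): Unit = {
    // Assumed placeholder shape, purely for the comparison.
    val comments: Map[String, String] =
      (0 until 1000).map(i => s"/*c_$i*/" -> s"// expanded comment $i").toMap
    val lines: Array[String] =
      (0 until 200000).map(i => if (i % 10 == 0) s"/*c_${i % 1000}*/" else s"int v$i = $i;").toArray

    val placeholder: Regex = """/\*c_\d+\*/""".r

    def time[T](label: String)(body: => T): T = {
      val start = System.nanoTime()
      val result = body
      println(f"$label%-12s ${(System.nanoTime() - start) / 1e6}%.1f ms")
      result
    }

    // Current approach: trim the line and look it up in the comment map.
    time("map lookup") { lines.map(l => comments.getOrElse(l.trim, l.trim)) }

    // Alternative: run a regexp over every line to detect placeholders first.
    time("regexp") {
      lines.map { l =>
        val t = l.trim
        placeholder.findFirstIn(t).flatMap(comments.get).getOrElse(t)
      }
    }
  }
}
```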


[GitHub] spark pull request: [SPARK-15475][SQL] Add tests for writing and r...

2016-05-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13253#issuecomment-221171864
  
Did we ever end up fixing https://issues.apache.org/jira/browse/SPARK-10216 
after it was reverted?



[GitHub] spark pull request: [SPARK-15472][SQL][Streaming] Add partitioned ...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13258#issuecomment-221171715
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLC...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221171754
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLC...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221171737
  
**[Test build #59183 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59183/consoleFull)**
 for PR 9192 at commit 
[`3a2e0c7`](https://github.com/apache/spark/commit/3a2e0c7919b9fdbd5558cda474368c25208856b0).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLC...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221171755
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59183/
Test FAILed.


[GitHub] spark pull request: [SPARK-15472][SQL][Streaming] Add partitioned ...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13258#issuecomment-221171716
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59176/
Test PASSed.


[GitHub] spark pull request: [SPARK-15472][SQL][Streaming] Add partitioned ...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13258#issuecomment-221171564
  
**[Test build #59176 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59176/consoleFull)**
 for PR 13258 at commit 
[`936bf26`](https://github.com/apache/spark/commit/936bf26415ae4f8875b091bc1587409620a14e0a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-15475][SQL] Add tests for writing and r...

2016-05-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/13253#issuecomment-221171371
  
Hi @rxin and @marmbrus,
As you already know, a "critical" issue was found here, 
[SPARK-15393](https://issues.apache.org/jira/browse/SPARK-15393), so 
[SPARK-10216](https://issues.apache.org/jira/browse/SPARK-10216) was reverted. 
It seems that writing empty data and reading it back was not tested across data 
sources. This PR includes a test that resembles the one provided in the JIRA 
ticket. Could you please take a look?


[GitHub] spark pull request: [SPARK-15493][SQL] Allow setting the quoteEsca...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13267#issuecomment-221171295
  
**[Test build #3011 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3011/consoleFull)**
 for PR 13267 at commit 
[`caf8808`](https://github.com/apache/spark/commit/caf8808c78cd3b6feedc34ebbf02a05a6d194034).


[GitHub] spark pull request: Glrm

2016-05-23 Thread sushmitkarar
Github user sushmitkarar closed the pull request at:

https://github.com/apache/spark/pull/13274


[GitHub] spark pull request: Glrm

2016-05-23 Thread sushmitkarar
GitHub user sushmitkarar opened a pull request:

https://github.com/apache/spark/pull/13274

Glrm

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)


## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)


(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rezazadeh/spark glrm

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13274


commit c7679f91bbf79bcefeb8c9f7ee968aac1f39b503
Author: Reza Zadeh 
Date:   2014-08-27T07:17:32Z

First version of SparkGLRM

commit 1347655961e047488bcb7ceb753c16bb1c2d7e4a
Author: Reza Zadeh 
Date:   2014-08-27T07:19:02Z

Documentation

commit 16ae855c6664c276a0b2ef5fbf3c625251c9a82c
Author: Reza Zadeh 
Date:   2014-09-07T01:20:54Z

index bounds

commit aa24830dc22a1e95af6fea0282d31255fd335036
Author: Reza Zadeh 
Date:   2014-09-07T01:30:39Z

More data

commit ee6cd5328458bd83d16f2f2e43a64fdac0b090f8
Author: Reza Zadeh 
Date:   2014-09-07T01:34:33Z

Bigger dataset

commit be9a51b1cc77a8a546b8150dcd498cfaecb5f703
Author: Reza Zadeh 
Date:   2014-09-07T18:20:27Z

Larger data

commit 99971db070d6923ca55148a1fcc9dc55ff068472
Author: Reza Zadeh 
Date:   2014-09-10T00:01:06Z

Better random entry generation

commit 576d9ae365589d7e67cb697e6e7edbf7c70f1f0c
Author: Reza Zadeh 
Date:   2014-09-10T00:01:27Z

Better parameters

commit 1e5afe8212257fa4d05cea06665979ff9b3a9cc7
Author: Reza Zadeh 
Date:   2014-09-10T00:02:35Z

Better parameters

commit 04f48097a19de2857f49f162013fc22e217ab4eb
Author: Reza Zadeh 
Date:   2014-09-10T18:36:11Z

Proper display of status

commit 7489302795e0787a70b885090603380d06d3f7a6
Author: Reza Zadeh 
Date:   2014-09-20T06:33:14Z

chunking

commit 136d0310e5b5d2cb3341ea847b0a8fb989c21f77
Author: Reza Zadeh 
Date:   2014-09-21T06:34:02Z

Pluggable proxes

commit 49c9ca72599a26d3ff91ce97739d9eec5bc24d8b
Author: Reza Zadeh 
Date:   2014-09-21T06:51:13Z

Documentation

commit d8f07b4c66dce1fa0c7c3be4bfb978d62f63702b
Author: Reza Zadeh 
Date:   2014-09-22T05:15:55Z

add documentation

commit 0e62894e10682e92c1d44375e3567697cf1c0056
Author: Reza Zadeh 
Date:   2014-09-22T05:18:40Z

better spacing

commit d70cfe659a95c792cb234df05ed24fdcddcf44ad
Author: Reza Zadeh 
Date:   2014-09-25T22:19:17Z

Better parameters

commit 2dae5b616604182b980978f5fb444d20f169b5eb
Author: Reza Zadeh 
Date:   2014-09-26T07:54:35Z

Better loss grads pluggability

commit 8c9e977bac6f66dec6c4f3b1e55065807e75eb1b
Author: Reza Zadeh 
Date:   2014-09-26T18:04:58Z

parmaeter changes

commit 5951d30c0aab9668be741d367ec7c0d57824a3d3
Author: Reza Zadeh 
Date:   2014-09-26T23:21:29Z

better stepsizes and library of proxes

commit 2c3f75b30b00a6d6363e08c584017564b8c33a51
Author: Reza Zadeh 
Date:   2014-09-26T23:24:47Z

better documentation

commit edae547949571a80a9a1cedba88c55e8f123a97c
Author: Reza Zadeh 
Date:   2014-09-26T23:29:28Z

Better documentation

commit 6140f3f5aa202f6635f4dc07da8c9f790382968e
Author: Reza Zadeh 
Date:   2014-09-30T04:49:53Z

Add funny loss

commit c1f2216c326b49b82703e01a20be95e718601f56
Author: Reza Zadeh 
Date:   2014-10-01T20:25:02Z

Funny Loss example

commit 222e38dd40a12a3b6b9305609b8abd0ccdc61b8c
Author: Reza Zadeh 
Date:   2014-10-04T19:14:17Z

New interface

commit 9be6c288795a5fe5e8a33afe8d1bb09174db9901
Author: Reza Zadeh 
Date:   2014-10-04T19:18:51Z

Documentation

commit 643fd50f27c430c62a982f1ba38a3e190d097232
Author: Reza Zadeh 
Date:   2014-10-04T19:19:38Z

Move to new directory

commit 20128d9e97e2ba8b19bfde3f57200d805f44a75e
Author: Reza Zadeh 
Date:   2014-10-04T19:21:49Z

Readme first version

commit 51c4cb8e53a1549faed66f197a8821ca5618aa10
Author: Reza Zadeh 
Date:   2014-10-04T19:30:54Z

Movement message

commit 9f2469d5d8073b3036ff7f712ab2d256b1fc72b6
Author: Reza Zadeh 
Date:   2014-10-04T19:50:03Z

Initial README

commit 13693d09dd21c32c8c1a4047bc5021ed014db776
Author: Reza Zadeh 
Date:   2014-10-04T20:01:44Z

Better readme





[GitHub] spark pull request: [WIP] [SPARK-10903] [SPARKR] R - Simplify SQLC...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9192#issuecomment-221169975
  
**[Test build #59183 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59183/consoleFull)**
 for PR 9192 at commit 
[`3a2e0c7`](https://github.com/apache/spark/commit/3a2e0c7919b9fdbd5558cda474368c25208856b0).





[GitHub] spark pull request: [SPARK-15470] [SQL] Unify the Configuration In...

2016-05-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13247#issuecomment-221168962
  
Is the idea to have a list of configs that are marked as immutable, and to 
explicitly throw exceptions when users modify them?

I still don't see how that relates to the changes here... Why not just have 
that list in SQLConf?






[GitHub] spark pull request: [SPARK-15470] [SQL] Unify the Configuration In...

2016-05-23 Thread gatorsmile
Github user gatorsmile commented on the pull request:

https://github.com/apache/spark/pull/13247#issuecomment-221168724
  
Previously, @yhuai mentioned that we need to issue exceptions when users change 
static configurations at runtime. See 
https://github.com/apache/spark/pull/13128#issuecomment-220411852 

To do this cleanly, I plan to add that logic to `RuntimeConfig`. However, I 
found that `SQLContext` has two entry points for configuration, one is `conf` 
and the other is `runtimeConf`. Thus, I think we have to remove the duplication 
before working on it. 

Do you think this concern makes sense?
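
For reference, a minimal sketch of the check being discussed, assuming a 
hypothetical guard object and key list (not Spark's actual implementation):

    // Hypothetical sketch only: keep the static (non-runtime-settable) keys in
    // one place and reject attempts to change them through the runtime config.
    object StaticConfGuard {
      private val staticKeys = Set(
        "spark.sql.warehouse.dir",          // example keys; the real list would
        "spark.sql.catalogImplementation")  // live next to SQLConf's definitions

      def assertSettable(key: String): Unit = {
        if (staticKeys.contains(key)) {
          throw new UnsupportedOperationException(
            s"Config '$key' is static and cannot be changed at runtime.")
        }
      }
    }

    // Hypothetical call site inside a set(key, value) path:
    //   StaticConfGuard.assertSettable(key)
    //   settings.put(key, value)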






[GitHub] spark pull request: [SPARK-15492][ML][DOC]:Binarization scala exam...

2016-05-23 Thread wangmiao1981
Github user wangmiao1981 commented on the pull request:

https://github.com/apache/spark/pull/13266#issuecomment-221168009
  
@jerryshao We have several similar bugs fixed. I am doing QA for ML 2.0 
document now.





[GitHub] spark pull request: [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspar...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12919#issuecomment-221167894
  
**[Test build #59180 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59180/consoleFull)**
 for PR 12919 at commit 
[`6551fb4`](https://github.com/apache/spark/commit/6551fb420e003949fce421ce14111b40e7309631).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ListFilesCommand(files: Seq[String] = Seq.empty[String]) 
extends RunnableCommand `
  * `case class ListJarsCommand(jars: Seq[String] = Seq.empty[String]) 
extends RunnableCommand `





[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12889#issuecomment-221167951
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspar...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12919#issuecomment-221167948
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12889#issuecomment-221167954
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59181/
Test PASSed.





[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12889#issuecomment-221167902
  
**[Test build #59181 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59181/consoleFull)**
 for PR 12889 at commit 
[`6e35559`](https://github.com/apache/spark/commit/6e355593ca1a4a288f9c17cc15c2ff34c128846d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspar...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12919#issuecomment-221167949
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59180/
Test PASSed.





[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

2016-05-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13139#discussion_r64328639
  
--- Diff: docs/ml-classification-regression.md ---
@@ -374,6 +374,154 @@ regression model and extracting model summary 
statistics.
 
 
 
+## Generalized linear regression
+
+Contrasted with linear regression where the output is assumed to follow a 
Gaussian
+distribution, [generalized linear 
models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) are 
specifications of linear models where the response variable $Y_i$ may take on 
_any_
+distribution from the [exponential family of 
distributions](https://en.wikipedia.org/wiki/Exponential_family).
+Spark's `GeneralizedLinearRegression` interface
+allows for flexible specification of GLMs which can be used for various 
types of
+prediction problems including linear regression, Poisson regression, 
logistic regression, and others.
+Currently in `spark.ml`, only a subset of the exponential family 
distributions are supported and they are listed
+[below](#available-families).
+
+**NOTE**: Spark currently only supports up to 4096 features for GLM 
models, and will throw an exception if this 
+constraint is exceeded. See the [optimization section](#optimization) for 
more details.
+
+In a GLM the resonse variable $Y_i$ is assumed to be drawn from an 
exponential family distribution:
+
+$$
+Y_i \sim f\left(\cdot|\theta_i, \phi, w_i\right)
--- End diff --

Same for any other notation
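
For readers following along, a minimal usage sketch of the 
`GeneralizedLinearRegression` interface described in the documentation excerpt 
above (the toy data, column names, and parameter values below are illustrative 
assumptions, not taken from this PR):

    import org.apache.spark.ml.linalg.Vectors
    import org.apache.spark.ml.regression.GeneralizedLinearRegression
    import org.apache.spark.sql.SparkSession

    object GlrSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("glr-sketch").master("local[2]").getOrCreate()
        import spark.implicits._

        // Toy (label, features) rows; values are arbitrary and only make the
        // example runnable.
        val training = Seq(
          (1.0, Vectors.dense(0.0, 1.1)),
          (0.0, Vectors.dense(2.0, 1.0)),
          (3.0, Vectors.dense(2.0, 1.3)),
          (2.0, Vectors.dense(0.0, 1.2))
        ).toDF("label", "features")

        // Choose any supported exponential-family member and link function.
        val glr = new GeneralizedLinearRegression()
          .setFamily("gaussian")
          .setLink("identity")
          .setMaxIter(10)
          .setRegParam(0.3)

        val model = glr.fit(training)
        println(s"Coefficients: ${model.coefficients} Intercept: ${model.intercept}")
        spark.stop()
      }
    }

As the NOTE in the excerpt says, fitting would throw once the feature vector 
exceeds 4096 dimensions.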





[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

2016-05-23 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/13139#issuecomment-221167413
  
@yanboliang @sethah Could you please reconcile this PR with 
[https://github.com/apache/spark/pull/13262]?  Either option is OK with me.  If 
I had to choose, I'd put the optimization stuff in ml-advanced since most users 
will not need to know it.

@sethah Where are you drawing your notation from?  If it's a source online, 
could you link to it?





[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

2016-05-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13139#discussion_r64328611
  
--- Diff: docs/ml-classification-regression.md ---
@@ -374,6 +374,154 @@ regression model and extracting model summary 
statistics.
 
 
 
+## Generalized linear regression
+
+Contrasted with linear regression where the output is assumed to follow a 
Gaussian
+distribution, [generalized linear 
models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) are 
specifications of linear models where the response variable $Y_i$ may take on 
_any_
+distribution from the [exponential family of 
distributions](https://en.wikipedia.org/wiki/Exponential_family).
+Spark's `GeneralizedLinearRegression` interface
+allows for flexible specification of GLMs which can be used for various 
types of
+prediction problems including linear regression, Poisson regression, 
logistic regression, and others.
+Currently in `spark.ml`, only a subset of the exponential family 
distributions are supported and they are listed
+[below](#available-families).
+
+**NOTE**: Spark currently only supports up to 4096 features for GLM 
models, and will throw an exception if this 
+constraint is exceeded. See the [optimization section](#optimization) for 
more details.
+
+In a GLM the resonse variable $Y_i$ is assumed to be drawn from an 
exponential family distribution:
+
+$$
+Y_i \sim f\left(\cdot|\theta_i, \phi, w_i\right)
--- End diff --

phi should be defined.





[GitHub] spark pull request: [SPARK-15397] [SQL] fix string udf locate as h...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13186#issuecomment-221167163
  
**[Test build #59182 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59182/consoleFull)**
 for PR 13186 at commit 
[`4a20bad`](https://github.com/apache/spark/commit/4a20badceafbd30790575bda9841959d6c7a0c2f).





[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13273#issuecomment-221167011
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13273#issuecomment-221167012
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59179/
Test FAILed.





[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13273#issuecomment-221167002
  
**[Test build #59179 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59179/consoleFull)**
 for PR 13273 at commit 
[`216fc5c`](https://github.com/apache/spark/commit/216fc5c8affc13debe7107ce97067d6da317ce47).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15092][SPARK-15139][PYSPARK][ML] Pyspar...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12919#issuecomment-221166703
  
**[Test build #59180 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59180/consoleFull)**
 for PR 12919 at commit 
[`6551fb4`](https://github.com/apache/spark/commit/6551fb420e003949fce421ce14111b40e7309631).





[GitHub] spark pull request: [SPARK-15113][PySpark][ML] Add missing num fea...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12889#issuecomment-221166701
  
**[Test build #59181 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59181/consoleFull)**
 for PR 12889 at commit 
[`6e35559`](https://github.com/apache/spark/commit/6e355593ca1a4a288f9c17cc15c2ff34c128846d).





[GitHub] spark pull request: [SPARK-15480][UI][Streaming]show missed InputI...

2016-05-23 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/13259#issuecomment-221166631
  
cc @zsxwing 





[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13273#issuecomment-221166175
  
**[Test build #59179 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59179/consoleFull)**
 for PR 13273 at commit 
[`216fc5c`](https://github.com/apache/spark/commit/216fc5c8affc13debe7107ce97067d6da317ce47).





[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/13273#issuecomment-221166121
  
retest this please





[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...

2016-05-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13243#discussion_r64327921
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -58,4 +58,39 @@ class DataFrameComplexTypeSuite extends QueryTest with 
SharedSQLContext {
 val nullIntRow = df.selectExpr("i[1]").collect()(0)
 assert(nullIntRow == org.apache.spark.sql.Row(null))
   }
+
+  test("SPARK-15285 Generated SpecificSafeProjection.apply method grows 
beyond 64KB") {
+val ds100_5 = Seq(S100_5()).toDS()
+ds100_5.rdd.count
+  }
 }
+
+case class S100(
--- End diff --

Scala 2.10 doesn't support large case classes. We can create a new test suite 
under `scala-2.11/src/test` and put this test there, so that we only run it 
under Scala 2.11. The `repl` module is a good example of this. cc @kiszk do you 
mind resending this PR with the fix?
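
A quick illustration of the 2.10 restriction mentioned above (field names made 
up): Scala 2.10 caps case classes at 22 parameters, so a wide test fixture like 
the one below compiles only under Scala 2.11+ and therefore belongs in a 
2.11-only source tree.

    // Fails under Scala 2.10 with roughly "Implementation restriction: case
    // classes cannot have more than 22 parameters"; compiles under 2.11+.
    case class Wide23(
      f1: Int = 1, f2: Int = 1, f3: Int = 1, f4: Int = 1, f5: Int = 1,
      f6: Int = 1, f7: Int = 1, f8: Int = 1, f9: Int = 1, f10: Int = 1,
      f11: Int = 1, f12: Int = 1, f13: Int = 1, f14: Int = 1, f15: Int = 1,
      f16: Int = 1, f17: Int = 1, f18: Int = 1, f19: Int = 1, f20: Int = 1,
      f21: Int = 1, f22: Int = 1, f23: Int = 1)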





[GitHub] spark pull request: [SPARK-15269][SQL] Removes unexpected empty ta...

2016-05-23 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/13270#discussion_r64327851
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala
 ---
@@ -216,7 +216,25 @@ class SessionCatalog(
 val table = formatTableName(tableDefinition.identifier.table)
 val newTableDefinition = tableDefinition.copy(identifier = 
TableIdentifier(table, Some(db)))
 requireDbExists(db)
-externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
+
+if (newTableDefinition.tableType == CatalogTableType.EXTERNAL) {
+  // !! HACK ALERT !!
+  //
+  // See https://issues.apache.org/jira/browse/SPARK-15269 for more 
details about why we have to
+  // set `locationUri` and then remove the directory after creating 
the external table:
+  val tablePath = defaultTablePath(newTableDefinition.identifier)
+  try {
+externalCatalog.createTable(
+  db,
+  newTableDefinition.withNewStorage(locationUri = Some(tablePath)),
+  ignoreIfExists)
+  } finally {
+val path = new Path(tablePath)
+FileSystem.get(path.toUri, hadoopConf).delete(path, true)
+  }
+} else {
+  externalCatalog.createTable(db, newTableDefinition, ignoreIfExists)
+}
--- End diff --

Yeah, thanks! Will add a check for the first case. The second case should 
be the reason why Jenkins tests failed.





[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...

2016-05-23 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/13243#issuecomment-221165640
  
Sorry we have to revert it as it breaks scala-2.10 build





[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13273#issuecomment-221165459
  
**[Test build #59178 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59178/consoleFull)**
 for PR 13273 at commit 
[`216fc5c`](https://github.com/apache/spark/commit/216fc5c8affc13debe7107ce97067d6da317ce47).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13273#issuecomment-221165478
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13273#issuecomment-221165480
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59178/
Test FAILed.





[GitHub] spark pull request: [SPARK-14554][SQL] disable whole stage codegen...

2016-05-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/12322#discussion_r64327687
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala 
---
@@ -620,6 +620,12 @@ class DatasetSuite extends QueryTest with 
SharedSQLContext {
 val df = streaming.join(static, Seq("b"))
 assert(df.isStreaming, "streaming Dataset returned false for 
'isStreaming'.")
   }
+
+  test("SPARK-14554: Dataset.map may generate wrong java code for wide 
table") {
+val wideDF = sqlContext.range(10).select(Seq.tabulate(1000) {i => ('id 
+ i).as(s"c$i")} : _*)
+// Make sure the generated code for this plan can compile and execute.
+wideDF.map(_.getLong(0)).collect()
--- End diff --

it's fixed in https://github.com/apache/spark/pull/13273





[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13273#issuecomment-221164516
  
**[Test build #59178 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59178/consoleFull)**
 for PR 13273 at commit 
[`216fc5c`](https://github.com/apache/spark/commit/216fc5c8affc13debe7107ce97067d6da317ce47).





[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/13273#discussion_r64327313
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala
 ---
@@ -27,12 +25,12 @@ import org.apache.commons.lang3.StringUtils
  */
 object CodeFormatter {
   def format(code: CodeAndComment): String = {
-new CodeFormatter().addLines(
-  StringUtils.replaceEach(
-code.body,
-code.comment.keys.toArray,
-code.comment.values.toArray)
-).result
+val formatter = new CodeFormatter
+code.body.split("\n").foreach { line =>
+  val trimmed = line.trim
+  formatter.addLine(code.comment.getOrElse(trimmed, trimmed))
+}
+formatter.result()
--- End diff --

cc @sarutak, here I assume the placeholder will always take up an entire line. 
Is that correct?
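
For context, a toy model of the assumption in question, pasteable into a Scala 
REPL (the placeholder string and comment text are made up, not Spark's actual 
format): the per-line rewrite only restores a comment when the placeholder 
occupies the whole trimmed line, because the entire line is used as the map key.

    // Stand-in for CodeAndComment's body and comment map, with invented values.
    val comments = Map("/*c1*/" -> "// project: (id + 1) AS c1")
    val body =
      """|int value = 0;
         |/*c1*/
         |value = id + 1;""".stripMargin

    val restored = body.split("\n").map { line =>
      val trimmed = line.trim
      // Whole-line lookup: works when /*c1*/ sits alone on its line, but would
      // miss a placeholder embedded inside a longer line.
      comments.getOrElse(trimmed, trimmed)
    }.mkString("\n")

    println(restored)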





[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/13273

[SPARK-15498][TESTS] fix slow tests

## What changes were proposed in this pull request?

This PR fixes 3 slow tests:

1. `ParquetQuerySuite.read/write wide table`: This is not a good unit test, as 
it runs for more than 5 minutes. This PR removes it and adds a new regression 
test in `CodeGenerationSuite`, which is more of a unit test.
2. `ParquetQuerySuite.returning batch for wide table`: reduce the threshold 
and use a smaller data size.
3. `DatasetSuite.SPARK-14554: Dataset.map may generate wrong java code for 
wide table`: improving `CodeFormatter.format` (introduced in 
https://github.com/apache/spark/pull/12979) speeds it up dramatically.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13273.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13273


commit 216fc5c8affc13debe7107ce97067d6da317ce47
Author: Wenchen Fan 
Date:   2016-05-24T04:24:53Z

fix slow tests







[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/13273#issuecomment-221164134
  
cc @davies @andrewor14 @yhuai 





[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

2016-05-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13139#discussion_r64327019
  
--- Diff: docs/ml-classification-regression.md ---
@@ -374,6 +374,154 @@ regression model and extracting model summary 
statistics.
 
 
 
+## Generalized linear regression
+
+Contrasted with linear regression where the output is assumed to follow a 
Gaussian
+distribution, [generalized linear 
models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) are 
specifications of linear models where the response variable $Y_i$ may take on 
_any_
+distribution from the [exponential family of 
distributions](https://en.wikipedia.org/wiki/Exponential_family).
+Spark's `GeneralizedLinearRegression` interface
+allows for flexible specification of GLMs which can be used for various 
types of
+prediction problems including linear regression, Poisson regression, 
logistic regression, and others.
+Currently in `spark.ml`, only a subset of the exponential family 
distributions are supported and they are listed
+[below](#available-families).
+
+**NOTE**: Spark currently only supports up to 4096 features for GLM 
models, and will throw an exception if this 
+constraint is exceeded. See the [optimization section](#optimization) for 
more details.
--- End diff --

Note that, for certain models, you can call LinearRegression or 
LogisticRegression to use other solvers which support more features.





[GitHub] spark pull request: [SPARK-15186][ML][DOCS] Add user guide for gen...

2016-05-23 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/13139#discussion_r64327021
  
--- Diff: docs/ml-classification-regression.md ---
@@ -374,6 +374,154 @@ regression model and extracting model summary 
statistics.
 
 
 
+## Generalized linear regression
+
+Contrasted with linear regression where the output is assumed to follow a 
Gaussian
+distribution, [generalized linear 
models](https://en.wikipedia.org/wiki/Generalized_linear_model) (GLMs) are 
specifications of linear models where the response variable $Y_i$ may take on 
_any_
+distribution from the [exponential family of 
distributions](https://en.wikipedia.org/wiki/Exponential_family).
+Spark's `GeneralizedLinearRegression` interface
+allows for flexible specification of GLMs which can be used for various 
types of
+prediction problems including linear regression, Poisson regression, 
logistic regression, and others.
+Currently in `spark.ml`, only a subset of the exponential family 
distributions are supported and they are listed
+[below](#available-families).
+
+**NOTE**: Spark currently only supports up to 4096 features for GLM 
models, and will throw an exception if this 
+constraint is exceeded. See the [optimization section](#optimization) for 
more details.
+
+In a GLM the resonse variable $Y_i$ is assumed to be drawn from an 
exponential family distribution:
--- End diff --

typo: "response"





[GitHub] spark pull request: [SPARK-15451][build] Use jdk7's rt.jar when av...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13272#issuecomment-221162525
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59177/
Test PASSed.





[GitHub] spark pull request: [SPARK-15451][build] Use jdk7's rt.jar when av...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13272#issuecomment-221162523
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15451][build] Use jdk7's rt.jar when av...

2016-05-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/13272#issuecomment-221162435
  
**[Test build #59177 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59177/consoleFull)**
 for PR 13272 at commit 
[`50c5815`](https://github.com/apache/spark/commit/50c581561fbfff701babf29866c06aa4328c5ff6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...

2016-05-23 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/13243#issuecomment-221162353
  
thanks, merging to master and 2.0!





[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...

2016-05-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13243





[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...

2016-05-23 Thread xwu0226
Github user xwu0226 commented on a diff in the pull request:

https://github.com/apache/spark/pull/13212#discussion_r64325981
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
 ---
@@ -238,4 +238,23 @@ class CliSuite extends SparkFunSuite with 
BeforeAndAfterAll with Logging {
 runCliWithin(2.minute, Seq("-e", "!echo \"This is a test for 
Spark-11624\";"))(
   "" -> "This is a test for Spark-11624")
   }
+
+  test("list jars") {
+val jarFile = 
Thread.currentThread().getContextClassLoader.getResource("TestUDTF.jar")
+runCliWithin(2.minute)(
+  s"ADD JAR $jarFile" -> "",
+  s"LIST JARS" -> "TestUDTF.jar",
+  s"List JAR $jarFile" -> "TestUDTF.jar"
+)
+  }
+
+  test("list files") {
+val dataFilePath = Thread.currentThread().getContextClassLoader
+  .getResource("data/files/small_kv.txt")
+runCliWithin(2.minute)(
+  s"ADD FILE $dataFilePath" -> "",
+  s"LIST FILES" -> "small_kv.txt",
+  s"LIST FILE $dataFilePath" -> "small_kv.txt"
+)
+  }
--- End diff --

let me check





[GitHub] spark pull request: [SPARK-15485] [SQL] [DOCS] Spark SQL Configura...

2016-05-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13263





[GitHub] spark pull request: [SPARK-15485] [SQL] [DOCS] Spark SQL Configura...

2016-05-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13263#issuecomment-221161361
  
Merging in master/2.0. Thanks.






[GitHub] spark pull request: [SPARK-15470] [SQL] Unify the Configuration In...

2016-05-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13247#issuecomment-221161280
  
This looks ok, but can you remind me again what this pull request is 
actually solving? I feel it's just changing code for the sake of changing code 
here ...






[GitHub] spark pull request: [SPARK-15340][SQL]Limit the size of the map us...

2016-05-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13130#issuecomment-221161008
  
Does it work for you when you changed it to 1 rather than 1000?






[GitHub] spark pull request: [SPARK-15495][SQL][WIP] Improve the explain ou...

2016-05-23 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/13271#issuecomment-221160878
  
Hm, this seems pretty complicated. Can we just have separate expressions for 
verbose mode, and, when verbose mode is on, replace the normal expressions with 
the verbose expressions right before explain? That seems a lot easier to do. 
This is similar to the PrettyAttribute idea.
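
To make that concrete, a self-contained toy version of the substitution idea 
(hypothetical types, not Catalyst's actual API): keep a terse and a verbose 
rendering per expression and swap them in just before producing the explain 
string.

    sealed trait Expr { def render: String }
    case class Attr(name: String, exprId: Long) extends Expr {
      def render: String = name                             // terse, for normal explain
    }
    case class VerboseAttr(attr: Attr) extends Expr {
      def render: String = s"${attr.name}#${attr.exprId}"   // extra detail for verbose mode
    }

    def toVerbose(e: Expr): Expr = e match {
      case a: Attr => VerboseAttr(a)
      case other   => other
    }

    val projectList: Seq[Expr] = Seq(Attr("id", 7L), Attr("value", 8L))
    val verbose = true
    val rendered = (if (verbose) projectList.map(toVerbose) else projectList).map(_.render)
    println(rendered.mkString("Project [", ", ", "]"))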






[GitHub] spark pull request: [SPARK-15431][SQL] Support LIST FILE(s)|JAR(s)...

2016-05-23 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/13212#discussion_r64325475
  
--- Diff: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala
 ---
@@ -238,4 +238,23 @@ class CliSuite extends SparkFunSuite with 
BeforeAndAfterAll with Logging {
 runCliWithin(2.minute, Seq("-e", "!echo \"This is a test for 
Spark-11624\";"))(
   "" -> "This is a test for Spark-11624")
   }
+
+  test("list jars") {
+val jarFile = 
Thread.currentThread().getContextClassLoader.getResource("TestUDTF.jar")
+runCliWithin(2.minute)(
+  s"ADD JAR $jarFile" -> "",
+  s"LIST JARS" -> "TestUDTF.jar",
+  s"List JAR $jarFile" -> "TestUDTF.jar"
+)
+  }
+
+  test("list files") {
+val dataFilePath = Thread.currentThread().getContextClassLoader
+  .getResource("data/files/small_kv.txt")
+runCliWithin(2.minute)(
+  s"ADD FILE $dataFilePath" -> "",
+  s"LIST FILES" -> "small_kv.txt",
+  s"LIST FILE $dataFilePath" -> "small_kv.txt"
+)
+  }
--- End diff --

Seems it is failing? Can you take a look? 
https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.2/1102/testReport/junit/org.apache.spark.sql.hive.thriftserver/CliSuite/list_files/
 






[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

2016-05-23 Thread NarineK
Github user NarineK commented on the pull request:

https://github.com/apache/spark/pull/12836#issuecomment-221158943
  
It seems that the generic functions FUN for aggregates have some 
limitations too:
https://stat.ethz.ch/pipermail/r-help/2015-March/426535.html





[GitHub] spark pull request: [SPARK-15285][SQL] Generated SpecificSafeProje...

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13243#issuecomment-221158001
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-15494][SQL] encoder code cleanup

2016-05-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/13269#issuecomment-221158097
  
Merged build finished. Test PASSed.




