date:20141113

[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...

2014-11-13 Thread davies

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/3193#issuecomment-63021620
  
@mengxr I got the same result as you (using your test code); I will update 
the results in the description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4397][Core] Reorganize 'implicit's to i...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3262#issuecomment-63021308
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23354/
Test FAILed.


[GitHub] spark pull request: [SPARK-4397][Core] Reorganize 'implicit's to i...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3262#issuecomment-63021306
  
  [Test build #23354 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23354/consoleFull)
 for   PR 3262 at commit 
[`1eda9e4`](https://github.com/apache/spark/commit/1eda9e4921617bc71acf2bb502cf3a22ee43c41f).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: SPARK-4375. no longer require -Pscala-2.10

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3239#issuecomment-63021309
  
  [Test build #23355 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23355/consoleFull)
 for   PR 3239 at commit 
[`dfdb3d9`](https://github.com/apache/spark/commit/dfdb3d957a17e10911eb144ca992077db7837ec2).
 * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-4397][Core] Reorganize 'implicit's to i...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3262#issuecomment-63021287
  
  [Test build #23354 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23354/consoleFull)
 for   PR 3262 at commit 
[`1eda9e4`](https://github.com/apache/spark/commit/1eda9e4921617bc71acf2bb502cf3a22ee43c41f).
 * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-4397][Core] Reorganize 'implicit's to i...

2014-11-13 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3262#issuecomment-63021026
  
/cc @rxin


[GitHub] spark pull request: Several progress API improvements / refactorin...

2014-11-13 Thread rxin

Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/3197#issuecomment-63021064
  
+1 on statusTracker


[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3193#issuecomment-63020961
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23351/
Test FAILed.


[GitHub] spark pull request: [SPARK-4397][Core] Reorganize 'implicit's to i...

2014-11-13 Thread zsxwing

GitHub user zsxwing opened a pull request:

https://github.com/apache/spark/pull/3262

[SPARK-4397][Core] Reorganize 'implicit's to improve the API convenience

This PR moves the `implicit`s into the `package object` and `companion object`s 
so that the Scala compiler can find them automatically, without an explicit import.

It should not break any API. I compiled the following code against Spark 
1.1.0:
```scala
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.SparkContext._

object ImplicitBackforwardCompatibilityApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ImplicitBackforwardCompatibilityApp")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 100).map(i => (i, i))
    val rdd2 = rdd.groupByKey() // rddToPairRDDFunctions
    val rdd3 = rdd2.sortByKey() // rddToOrderedRDDFunctions
    val s1 = rdd3.map(_._1).stats() // numericRDDToDoubleRDDFunctions
    println(s1)
    val s2 = rdd3.map(_._1.toDouble).stats() // doubleRDDToDoubleRDDFunctions
    println(s2)
    val f = rdd2.countAsync() // rddToAsyncRDDActions
    println(f.get())
    // rddToSequenceFileRDDFunctions
    rdd2.map { case (k, v) => (k, v.size) }.saveAsSequenceFile("/tmp/test_path")

    val a1 = sc.accumulator(123.4) // DoubleAccumulatorParam
    a1.add(1.0)
    println(a1.value)
    val a2 = sc.accumulator(123) // IntAccumulatorParam
    a2.add(3)
    println(a2.value)
    val a3 = sc.accumulator(123L) // LongAccumulatorParam
    a3.add(11L)
    println(a3.value)
    val a4 = sc.accumulator(123F) // FloatAccumulatorParam
    a4.add(1.1F)
    println(a4.value)

    sc.stop()
  }
}
```
Then I ran it against this PR, and it ran correctly.

However, I could not make `WritableConverter` work without an explicit `import`. 
Thoughts?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zsxwing/spark SPARK-4397

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3262.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3262


commit 1eda9e4921617bc71acf2bb502cf3a22ee43c41f
Author: zsxwing 
Date:   2014-11-14T07:35:02Z

Reorganize 'implicit's to improve the API convenience




[GitHub] spark pull request: Several progress API improvements / refactorin...

2014-11-13 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3197#discussion_r20346415
  
--- Diff: core/src/main/scala/org/apache/spark/SparkStatusAPI.scala ---
@@ -140,3 +103,10 @@ private[spark] trait SparkStatusAPI { this: SparkContext =>
 }
   }
 }
+
+private[spark] object SparkStatusAPI {
--- End diff --

Can't we just make the constructor package-private? It is really awkward to 
me that you have to create a factory for this. If you really want a factory, 
I'd name it something other than `apply`.



[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3193#issuecomment-63020954
  
  [Test build #23351 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23351/consoleFull)
 for   PR 3193 at commit 
[`51649f5`](https://github.com/apache/spark/commit/51649f5e5b29ab8db1c6c3fd91c6f625124ab327).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class RDDRangeSampler(RDDSamplerBase):`



[GitHub] spark pull request: SPARK-4375. no longer require -Pscala-2.10

2014-11-13 Thread sryza

Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/3239#issuecomment-63020796
  
Updated the doc - it seems like there's actually not a ton more to say, but 
let me know if I missed anything.


[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...

2014-11-13 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3193#issuecomment-63019228
  
@davies Did you measure only `rdd.sample(...).count()`? Sampling 1 million 
items took about 0.6s without replacement and 2.5s with replacement on my 
computer. I think we use the same MacBook model, or yours is better :)

Perhaps part of the time in your case was spent broadcasting the RDD. 
Could you try the following:

~~~
from pyspark.mllib.random import RandomRDDs
rdd = RandomRDDs.uniformRDD(sc, 1 << 20, 1).cache()
rdd.count()
rdd.sample(True, 0.9).count()
~~~
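
For context, a gap like 0.6s versus 2.5s is plausible if sampling is done per element, as in the sketch below, which uses only the standard `random` module (no numpy). It is an illustrative sketch, not the PySpark implementation: it assumes that sampling without replacement is one Bernoulli trial per element, and that sampling with replacement emits each element a Poisson(`fraction`)-distributed number of times, drawn with Knuth's multiplication method. The functions `poisson` and `sample` are hypothetical names.

```python
import math
import random


def poisson(rng, mean):
    """Knuth's method: multiply uniforms until the product drops to e^-mean."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample(items, with_replacement, fraction, seed=None):
    """Per-element sampling sketch; yields the sampled items lazily."""
    rng = random.Random(seed)
    for item in items:
        if with_replacement:
            # Emit each element Poisson(fraction) times (possibly zero).
            for _ in range(poisson(rng, fraction)):
                yield item
        elif rng.random() < fraction:
            # Bernoulli trial: keep each element with probability `fraction`.
            yield item
```

With `fraction=0.9`, the with-replacement path makes about 1.9 uniform draws per element on average (plus an inner loop), versus exactly one for the Bernoulli path. That is one possible contributor to the slower timing; in PySpark, serialization and the RDD pipeline also affect the measured cost.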


[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...

2014-11-13 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3255#discussion_r20345609
  
--- Diff: python/pyspark/context.py ---
@@ -191,7 +192,13 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize,
         self._temp_dir = \
             self._jvm.org.apache.spark.util.Utils.createTempDir(local_dir).getAbsolutePath()
 
+
         # profiling stats collected for each PythonRDD
+        if self._conf.get("spark.python.profile", "false") == "true":
+            self.profiler = profiler if profiler else BasicProfiler
+        else:
+            self.profiler = None
+
         self._profile_stats = []
--- End diff --

Maybe we could also move `_profile_stats` into Profiler; then the interface 
of Profiler would be simpler.
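
A minimal sketch of that suggestion, assuming a profiler object that owns its accumulated results: each call is profiled with `cProfile`, and the resulting `pstats.Stats` objects are merged internally with `Stats.add`, so the context would no longer need a separate `_profile_stats` list. The class `SketchProfiler` and its `profile`/`stats` methods are hypothetical and do not reflect the interface in this PR.

```python
import cProfile
import pstats


class SketchProfiler(object):
    """Hypothetical profiler that keeps its own merged stats."""

    def __init__(self):
        self._stats = None  # merged pstats.Stats, or None before the first run

    def profile(self, func, *args, **kwargs):
        """Run func under cProfile, fold its stats into ours, return its result."""
        pr = cProfile.Profile()
        result = pr.runcall(func, *args, **kwargs)
        run_stats = pstats.Stats(pr)  # snapshots the profile of this run
        if self._stats is None:
            self._stats = run_stats
        else:
            self._stats.add(run_stats)  # accumulate across runs
        return result

    def stats(self):
        """Return the merged stats (None if nothing was profiled yet)."""
        return self._stats
```

Callers would then ask the profiler for `stats()` instead of reading a list kept on the context.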


[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...

2014-11-13 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3255#discussion_r20345514
  
--- Diff: python/pyspark/profiler.py ---
@@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+>>> from pyspark.context import SparkContext
+>>> from pyspark.conf import SparkConf
+>>> from pyspark.profiler import BasicProfiler
+>>> class MyCustomProfiler(BasicProfiler):
--- End diff --

For this to appear as an example in the API docs, it needs to be moved into 
the BasicProfiler docstring.

Also, import BasicProfiler in pyspark/__init__.py.

You can build the API docs with:
```
$ cd python/docs/
$ make html
$ open _build/html/index.html
```
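
As a sketch of why the placement matters: API-doc tools such as Sphinx autodoc render a class's docstring, and the same docstring can be checked with the standard `doctest` module, so an example placed there is both documented and tested. `BasicProfilerSketch` is a hypothetical stand-in, not pyspark's `BasicProfiler`.

```python
import doctest


class BasicProfilerSketch(object):
    """
    Hypothetical profiler whose usage example lives in the class docstring.

    >>> BasicProfilerSketch().show_profiles()
    'basic profiles'
    """

    def show_profiles(self):
        return "basic profiles"


# Parse the examples out of the class docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(BasicProfilerSketch.__doc__,
                          {"BasicProfilerSketch": BasicProfilerSketch},
                          "BasicProfilerSketch", None, 0)
result = doctest.DocTestRunner().run(test)  # TestResults(failed, attempted)
```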


[GitHub] spark pull request: [SQL] Minor cleanup of comments, errors and ov...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3257#issuecomment-63017979
  
  [Test build #23353 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23353/consoleFull)
 for   PR 3257 at commit 
[`d8b5abc`](https://github.com/apache/spark/commit/d8b5abcd61b6f96be23f21c89baaf926fb0cf185).
 * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2991#issuecomment-63017550
  
  [Test build #23352 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23352/consoleFull)
 for   PR 2991 at commit 
[`5461f1c`](https://github.com/apache/spark/commit/5461f1c43b0e98aa7b583f14569eefd833b19df0).
 * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-11-13 Thread jerryshao

Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/2991#issuecomment-63017519
  
Jenkins, retest this please.


[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...

2014-11-13 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3255#discussion_r20345324
  
--- Diff: python/pyspark/profiler.py ---
@@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+>>> from pyspark.context import SparkContext
+>>> from pyspark.conf import SparkConf
+>>> from pyspark.profiler import BasicProfiler
+>>> class MyCustomProfiler(BasicProfiler):
+... @staticmethod
+... def show_profiles(profilers):
+... print "My custom profiles"
+...
+>>> conf = SparkConf().set("spark.python.profile", "true")
+>>> sc = SparkContext('local', 'test', conf=conf, 
profiler=MyCustomProfiler)
+>>> sc.parallelize(list(range(1000))).map(lambda x: 2 * x).take(10)
+[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
+>>> sc.show_profiles()
+My custom profiles
+>>> sc.stop()
+"""
+
+
+import cProfile
+import pstats
+import os
+from pyspark.accumulators import PStatsParam
+
+
+class BasicProfiler(object):
+"""
+
+:: DeveloperApi ::
--- End diff --

Yes :)


[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...

2014-11-13 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3255#discussion_r20345305
  
--- Diff: python/pyspark/profiler.py ---
@@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+>>> from pyspark.context import SparkContext
+>>> from pyspark.conf import SparkConf
+>>> from pyspark.profiler import BasicProfiler
+>>> class MyCustomProfiler(BasicProfiler):
+... @staticmethod
+... def show_profiles(profilers):
+... print "My custom profiles"
--- End diff --

indent with 4 spaces here?
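
For reference, here is a doctest in the same shape as the example under review, with the `...` continuation lines indented by 4 spaces, checked with the standard `doctest` module. `BaseProfiler` is a hypothetical stand-in for `BasicProfiler` so the snippet runs without a SparkContext, and it uses `print(...)` rather than the Python 2 `print` statement in the diff.

```python
import doctest

# Continuation lines use "... " followed by 4-space indentation per block level.
EXAMPLE = '''
>>> class BaseProfiler(object):
...     @staticmethod
...     def show_profiles(profilers):
...         print("base profiles")
...
>>> class MyCustomProfiler(BaseProfiler):
...     @staticmethod
...     def show_profiles(profilers):
...         print("My custom profiles")
...
>>> MyCustomProfiler.show_profiles([])
My custom profiles
'''

parser = doctest.DocTestParser()
test = parser.get_doctest(EXAMPLE, {}, "indent_example", None, 0)
result = doctest.DocTestRunner().run(test)  # TestResults(failed, attempted)
```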


[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...

2014-11-13 Thread davies

Github user davies commented on the pull request:

https://github.com/apache/spark/pull/3193#issuecomment-63017353
  
@mengxr I have simplified RDDSample by removing numpy; the reason is now 
explained in the PR description. Please re-review it.


[GitHub] spark pull request: SPARK-4375. no longer require -Pscala-2.10 and...

2014-11-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3239#issuecomment-63017250
  
Hey Sandy, I think this looks good. I wasn't able to get it to succeed 
locally, but I think something is wrong with my local environment, since 
even the master build isn't working. Could you add the relevant documentation?


[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3193#issuecomment-63016854
  
  [Test build #23351 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23351/consoleFull)
 for   PR 3193 at commit 
[`51649f5`](https://github.com/apache/spark/commit/51649f5e5b29ab8db1c6c3fd91c6f625124ab327).
 * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-1977][MLLIB] use immutable BitSet in AL...

2014-11-13 Thread aaronlin

Github user aaronlin commented on the pull request:

https://github.com/apache/spark/pull/925#issuecomment-63015981
  
twitter/chill#185 was fixed in chill v0.4.0, but Spark still depends on chill 
v0.3.6 in Maven: 
http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/1.1.0
Can anyone help fix it?


[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...

2014-11-13 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3261#discussion_r20344792
  
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -23,6 +23,16 @@
 
 
 class Rating(object):
--- End diff --

I also found a performance comparison: 
http://stackoverflow.com/questions/2646157/what-is-the-fastest-to-access-struct-like-object-in-python

~~~
namedtuple.a  :  0.473686933517 
namedtuple[0] :  0.180409193039
struct.a  :  0.180846214294
struct[0] :  1.32191514969
~~~
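
A quick way to re-check these numbers on a given interpreter is a `timeit` micro-benchmark along these lines. `Point` and `Struct` are illustrative names; the figures quoted above come from the linked Stack Overflow answer, not from this sketch, and both absolute and relative timings vary across Python versions. The lambda call overhead is included in every measurement, so only the differences between entries are meaningful.

```python
import timeit
from collections import namedtuple

Point = namedtuple("Point", ["a", "b"])


class Struct(object):
    """Plain class with instance attributes, i.e. a 'struct-like object'."""

    def __init__(self, a, b):
        self.a = a
        self.b = b


def compare(number=200000):
    """Return seconds taken for `number` accesses of each kind."""
    nt, st = Point(1, 2), Struct(1, 2)
    return {
        "namedtuple.a": timeit.timeit(lambda: nt.a, number=number),
        "namedtuple[0]": timeit.timeit(lambda: nt[0], number=number),
        "struct.a": timeit.timeit(lambda: st.a, number=number),
    }
```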


[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...

2014-11-13 Thread mengxr

Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/3261#discussion_r20344712
  
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -23,6 +23,16 @@
 
 
 class Rating(object):
--- End diff --

Where to put `int(user)`, `int(product)`, and `float(rating)` then?
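
One option, sketched under the assumption that `Rating` becomes a `namedtuple` subclass (so both `r.user` and `r[0]` work): do the conversions in `__new__`, because a namedtuple's fields cannot be reassigned after construction. This is a hypothetical sketch, not the code in this PR.

```python
from collections import namedtuple


class Rating(namedtuple("Rating", ["user", "product", "rating"])):
    """Sketch: a tuple-backed Rating that normalizes field types on construction."""

    __slots__ = ()  # no per-instance __dict__, same footprint as a plain namedtuple

    def __new__(cls, user, product, rating):
        # Fields are immutable once the tuple exists, so coerce them here.
        return super(Rating, cls).__new__(cls, int(user), int(product), float(rating))
```

For example, `Rating("1", 2.0, 3)` has the repr `Rating(user=1, product=2, rating=3.0)`, and `r.user` and `r[0]` both return `1`.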


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...

2014-11-13 Thread mengxr

Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/3193#issuecomment-63015296
  
The implementation looks good to me. @JoshRosen Do you want to take another 
pass?


[GitHub] spark pull request: Several progress API improvements / refactorin...

2014-11-13 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3197#discussion_r20344507
  
--- Diff: core/src/main/scala/org/apache/spark/SparkStatusAPI.scala ---
@@ -140,3 +103,10 @@ private[spark] trait SparkStatusAPI { this: SparkContext =>
 }
   }
 }
+
+private[spark] object SparkStatusAPI {
--- End diff --

The goal here was to hide this class's constructor from users so that we're 
free to change it later.  I think that making constructors part of public APIs 
is a bad idea.


[GitHub] spark pull request: Several progress API improvements / refactorin...

2014-11-13 Thread JoshRosen

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3197#discussion_r20344484
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkStatusAPI.scala ---
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.java
+
+import org.apache.spark.{SparkStageInfo, SparkJobInfo, SparkContext}
+
+/**
+ * Low-level status reporting APIs for monitoring job and stage progress.
+ *
+ * These APIs intentionally provide very weak consistency semantics; consumers of these APIs should
+ * be prepared to handle empty / missing information.  For example, a job's stage ids may be known
+ * but the status API may not have any information about the details of those stages, so
+ * `getStageInfo` could potentially return `null` for a valid stage id.
+ *
+ * To limit memory usage, these APIs only provide information on recent jobs / stages.  These APIs
+ * will provide information for the last `spark.ui.retainedStages` stages and
+ * `spark.ui.retainedJobs` jobs.
+ */
+class JavaSparkStatusAPI private (sc: SparkContext) {
--- End diff --

There's one subtle difference: some of the Java methods return nullable 
values instead of Options.
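One way the caveats in the Scaladoc quoted above can arise in practice (a lookup that comes back empty, i.e. `null` in Java or `None` in Scala, for an id that is valid, and retention limited to the last `spark.ui.retainedStages` stages) is a bounded store that evicts old entries. A minimal Python sketch of such a store; `RetainedStageStore`, `record`, and `get_stage_info` are hypothetical names, not Spark's API, and "evict the oldest" is an assumed policy:

```python
from collections import OrderedDict


class RetainedStageStore(object):
    """Keeps details for only the most recent `retained` stages.

    Hypothetical stand-in for the `spark.ui.retainedStages` limit;
    not Spark's implementation.
    """

    def __init__(self, retained):
        self.retained = retained
        self._stages = OrderedDict()  # stage id -> info, oldest first

    def record(self, stage_id, info):
        self._stages[stage_id] = info
        self._stages.move_to_end(stage_id)  # mark as most recent
        # Evict the oldest entries once the retention limit is exceeded.
        while len(self._stages) > self.retained:
            self._stages.popitem(last=False)

    def get_stage_info(self, stage_id):
        # None plays the role of the Java API's null: the id may be valid
        # even though its details were never recorded or already evicted.
        return self._stages.get(stage_id)


store = RetainedStageStore(retained=2)
for stage_id in range(3):
    store.record(stage_id, {"stage_id": stage_id, "num_tasks": 10})

for stage_id in range(3):
    info = store.get_stage_info(stage_id)
    # Callers must handle the missing case explicitly.
    print(stage_id, "no details" if info is None else info["num_tasks"])
```

Stage 0 is a perfectly valid id here, yet its details are gone, which is why consumers of such an API have to treat "missing" as a normal result rather than an error.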


[GitHub] spark pull request: Several progress API improvements / refactorin...

2014-11-13 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3197#discussion_r20344165
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -228,6 +229,8 @@ class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging {
   private[spark] val jobProgressListener = new JobProgressListener(conf)
   listenerBus.addListener(jobProgressListener)
 
+  val statusAPI = SparkStatusAPI(this)
--- End diff --

+1, just `status` would be better, I think.


[GitHub] spark pull request: Several progress API improvements / refactorin...

2014-11-13 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3197#discussion_r20344140
  
--- Diff: core/src/main/scala/org/apache/spark/SparkStatusAPI.scala ---
@@ -140,3 +103,10 @@ private[spark] trait SparkStatusAPI { this: SparkContext =>
 }
   }
 }
+
+private[spark] object SparkStatusAPI {
--- End diff --

Why bother having this? We can just call `new` directly.


[GitHub] spark pull request: Several progress API improvements / refactorin...

2014-11-13 Thread rxin

Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3197#discussion_r20344154
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkStatusAPI.scala ---
@@ -0,0 +1,77 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.api.java
+
+import org.apache.spark.{SparkStageInfo, SparkJobInfo, SparkContext}
+
+/**
+ * Low-level status reporting APIs for monitoring job and stage progress.
+ *
+ * These APIs intentionally provide very weak consistency semantics; consumers of these APIs should
+ * be prepared to handle empty / missing information.  For example, a job's stage ids may be known
+ * but the status API may not have any information about the details of those stages, so
+ * `getStageInfo` could potentially return `null` for a valid stage id.
+ *
+ * To limit memory usage, these APIs only provide information on recent jobs / stages.  These APIs
+ * will provide information for the last `spark.ui.retainedStages` stages and
+ * `spark.ui.retainedJobs` jobs.
+ */
+class JavaSparkStatusAPI private (sc: SparkContext) {
--- End diff --

Can we consolidate the Java and the Scala classes? It seems to me you are only
using arrays, so it should be fine.


[GitHub] spark pull request: Several progress API improvements / refactorin...

2014-11-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3197#issuecomment-63013790
  
Okay I talked offline with @kayousterhout and the best name we could come 
up with was the following:

```
class SparkStatusTracker
...
val statusTracker = new SparkStatusTracker(this)
```

IMO this is nicer than the current name since `API` is sort of implicit in
the fact that this is an exposed class (i.e. in some sense everything is an
API). The name "Tracker" implies that this object actively tracks changes, so
this is my favorite option. I also think `SparkStatus` and `val status` are
alright. I prefer both of these to the current naming.


[GitHub] spark pull request: [WIP] Scala 2.11

2014-11-13 Thread ScrapCodes

Github user ScrapCodes closed the pull request at:

https://github.com/apache/spark/pull/3181


[GitHub] spark pull request: SPARK-2624 add datanucleus jars to the contain...

2014-11-13 Thread srowen

Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3238#discussion_r20343870
  
--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -223,6 +224,29 @@ private[spark] trait ClientBase extends Logging {
 }
   }
 }
+
+/**
+ * Do the same for datanucleus jars, if they exist in spark home. Find all datanucleus-* jars,
+ * copy them to the remote fs, and add them to the class path.
+ */
+val sparkHomeOpt = sparkConf.getOption("spark.home").orElse(sys.env.get("SPARK_HOME"))
+for (sparkHome <- sparkHomeOpt) {
+  val libs = sparkHome + Path.SEPARATOR + "lib"
+  val jars = new File(libs).listFiles(new FilenameFilter() {
+    override def accept(dir: File, name: String) = name.startsWith("datanucleus-")
+  })
+  // copy to remote and add to classpath
+  jars.foreach { jar =>
--- End diff --

Isn't it because of licensing? DataNucleus is LGPL or similar?
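The jar-discovery step in the Scala diff above can be sketched in Python as follows. One detail worth checking in the original: `java.io.File.listFiles` returns `null` when the directory does not exist, so `jars.foreach` would likely throw a `NullPointerException` if `$SPARK_HOME/lib` is missing. The function name and the dict-based `conf` are assumptions for illustration, and the copy-to-remote step is left out:

```python
import os
import tempfile


def find_datanucleus_jars(conf, env):
    """Return sorted paths of datanucleus-* jars under <spark home>/lib.

    `conf` is a plain dict standing in for SparkConf (an assumption of this
    sketch). "spark.home" wins over SPARK_HOME, like the
    getOption(...).orElse(...) chain in the diff.
    """
    spark_home = conf.get("spark.home")
    if spark_home is None:
        spark_home = env.get("SPARK_HOME")
    if spark_home is None:
        return []
    lib_dir = os.path.join(spark_home, "lib")
    # File.listFiles returns null for a missing directory; guard the
    # equivalent case here instead of failing while iterating.
    if not os.path.isdir(lib_dir):
        return []
    return sorted(
        os.path.join(lib_dir, name)
        for name in os.listdir(lib_dir)
        if name.startswith("datanucleus-")
    )


# Usage against a throwaway directory with made-up jar names.
with tempfile.TemporaryDirectory() as home:
    os.makedirs(os.path.join(home, "lib"))
    for name in ("datanucleus-core.jar", "datanucleus-rdbms.jar",
                 "spark-assembly.jar"):
        open(os.path.join(home, "lib", name), "w").close()
    found = [os.path.basename(p)
             for p in find_datanucleus_jars({"spark.home": home}, env={})]

print(found)  # ['datanucleus-core.jar', 'datanucleus-rdbms.jar']
```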


[GitHub] spark pull request: SPARK-4214. With dynamic allocation, avoid out...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3204#issuecomment-63010646
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23350/
Test PASSed.


[GitHub] spark pull request: SPARK-4214. With dynamic allocation, avoid out...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3204#issuecomment-63010643
  
  [Test build #23350 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23350/consoleFull) for PR 3204 at commit [`c4ed549`](https://github.com/apache/spark/commit/c4ed549f8ef6cc22dce50be2ad418ee9a9211b19).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...

2014-11-13 Thread davies

Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/3261#discussion_r20342470
  
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -23,6 +23,16 @@
 
 
 class Rating(object):
--- End diff --

I think this can be simplified as 
```
class Rating(namedtuple('Rating', ['user', 'product', 'rating'])):
```
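A minimal, self-contained sketch of the namedtuple approach, assuming the field order (user, product, rating) from the PR description. Note that `namedtuple` takes the type name first and the field names second; `__slots__ = ()` is a common idiom to keep subclass instances as light as plain tuples:

```python
from collections import namedtuple


class Rating(namedtuple("Rating", ["user", "product", "rating"])):
    """A (user, product, rating) record usable by index or by name."""

    # Keep instances as lightweight as the underlying tuple.
    __slots__ = ()


r = Rating(1, 42, 4.5)
print(r[0], r[1], r[2])             # 1 42 4.5  (index lookup)
print(r.user, r.product, r.rating)  # 1 42 4.5  (attribute lookup)
user, product, rating = r           # tuple unpacking also works
```

Because `Rating` is a real tuple subclass, code written against raw `(user, product, rating)` tuples keeps working on `Rating` values unchanged.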


[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3261#issuecomment-63007855
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23349/
Test FAILed.


[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3261#issuecomment-63007848
  
  [Test build #23349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23349/consoleFull) for PR 3261 at commit [`d3bd7d4`](https://github.com/apache/spark/commit/d3bd7d41fa1623e5eb368bc6af3711769d1a27e7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-63007410
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23348/
Test PASSed.


[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-63007406
  
  [Test build #23348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23348/consoleFull) for PR 1567 at commit [`89e37d8`](https://github.com/apache/spark/commit/89e37d82d72ac614af0efcd810ac4b9b034d4253).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class GroupExpression(children: Seq[Expression]) extends Expression `
  * `case class Explosive(`
  * `trait GroupingSets extends UnaryNode `
  * `case class GroupingSet(`
  * `case class Cube(`
  * `case class Rollup(`
  * `case class Explosive(`



[GitHub] spark pull request: SPARK-4214. With dynamic allocation, avoid out...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3204#issuecomment-63005533
  
  [Test build #23350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23350/consoleFull) for PR 3204 at commit [`c4ed549`](https://github.com/apache/spark/commit/c4ed549f8ef6cc22dce50be2ad418ee9a9211b19).
 * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...

2014-11-13 Thread udnay

Github user udnay commented on the pull request:

https://github.com/apache/spark/pull/3255#issuecomment-63005535
  
I believe I took care of your comments/concerns. Could you have another 
look when you get a chance?


[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...

2014-11-13 Thread udnay

Github user udnay commented on a diff in the pull request:

https://github.com/apache/spark/pull/3255#discussion_r20341026
  
--- Diff: python/pyspark/profiler.py ---
@@ -0,0 +1,108 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+"""
+>>> from pyspark.context import SparkContext
+>>> from pyspark.conf import SparkConf
+>>> from pyspark.profiler import BasicProfiler
+>>> class MyCustomProfiler(BasicProfiler):
+...     @staticmethod
+...     def show_profiles(profilers):
+...         print "My custom profiles"
+...
+>>> conf = SparkConf().set("spark.python.profile", "true")
+>>> sc = SparkContext('local', 'test', conf=conf, profiler=MyCustomProfiler)
+>>> sc.parallelize(list(range(1000))).map(lambda x: 2 * x).take(10)
+[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
+>>> sc.show_profiles()
+My custom profiles
+>>> sc.stop()
+"""
+
+
+import cProfile
+import pstats
+import os
+from pyspark.accumulators import PStatsParam
+
+
+class BasicProfiler(object):
+    """
+
+    :: DeveloperApi ::
--- End diff --

@davies Is this how to mark it as a developer API?
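The doctest in the diff shows the extension point under discussion: subclass a profiler and override how collected profiles are shown. A standalone sketch of that shape using only the standard-library `cProfile` and `pstats` modules. `SimpleProfiler`, `profile`, and `show` are hypothetical names, this is not PySpark's `BasicProfiler`, and it profiles only the local process:

```python
import cProfile
import io
import pstats


class SimpleProfiler(object):
    """Accumulates cProfile data across calls (hypothetical, not PySpark's API)."""

    def __init__(self):
        self._prof = cProfile.Profile()

    def profile(self, func, *args, **kwargs):
        # runcall enables the profiler only while func runs; repeated calls
        # accumulate into the same Profile object.
        return self._prof.runcall(func, *args, **kwargs)

    def show(self, limit=5):
        # Render the top entries by cumulative time as a string.
        out = io.StringIO()
        stats = pstats.Stats(self._prof, stream=out)
        stats.sort_stats("cumulative").print_stats(limit)
        return out.getvalue()


class MyCustomProfiler(SimpleProfiler):
    # Subclasses change only how results are shown, mirroring the doctest.
    def show(self, limit=5):
        return "My custom profiles"


def work(n):
    return sum(i * i for i in range(n))


profiler = SimpleProfiler()
total = profiler.profile(work, 10)
report = profiler.show()

custom = MyCustomProfiler()
custom.profile(work, 10)
print(total)          # 285
print(custom.show())  # My custom profiles
```

Keeping collection and presentation in separate methods is what lets a subclass swap the output format without touching how stats are gathered.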


[GitHub] spark pull request: SPARK-4214. With dynamic allocation, avoid out...

2014-11-13 Thread sryza

Github user sryza commented on the pull request:

https://github.com/apache/spark/pull/3204#issuecomment-63005126
  
Updated patch addresses review comments


[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3112#issuecomment-63004489
  
  [Test build #23347 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23347/consoleFull) for PR 3112 at commit [`ed1a25c`](https://github.com/apache/spark/commit/ed1a25c85b2c80802f29700f363b9ef05721b395).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3112#issuecomment-63004493
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23347/
Test PASSed.


[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3261#issuecomment-63004032
  
  [Test build #23349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23349/consoleFull) for PR 3261 at commit [`d3bd7d4`](https://github.com/apache/spark/commit/d3bd7d41fa1623e5eb368bc6af3711769d1a27e7).
 * This patch merges cleanly.


[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3243#issuecomment-63003925
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23346/
Test PASSed.


[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3243#issuecomment-63003920
  
  [Test build #23346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23346/consoleFull) for PR 3243 at commit [`4653378`](https://github.com/apache/spark/commit/4653378fb6addfdf4fb21e4e75570c163d601bfb).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...

2014-11-13 Thread mengxr

GitHub user mengxr opened a pull request:

https://github.com/apache/spark/pull/3261

[SPARK-4396] allow lookup by index in Python's Rating

In PySpark, ALS can take an RDD of (user, product, rating) tuples as input. 
However, model.predict outputs an RDD of Rating. So on the input side, users 
can use r[0], r[1], r[2], while on the output side, users have to use r.user, 
r.product, r.rating. We should allow lookup by index in Rating.

@davies
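If `Rating` stays a plain class rather than becoming a tuple subclass, index lookup can be added by delegating `__getitem__` to a tuple of the fields. A minimal sketch under that assumption; it is not necessarily what this patch does, and `mean_rating` is a hypothetical helper showing that one piece of code can then consume both the tuple input and the `Rating` output:

```python
class Rating(object):
    """Plain-class Rating that also supports r[0], r[1], r[2]."""

    def __init__(self, user, product, rating):
        self.user = user
        self.product = product
        self.rating = rating

    def _fields(self):
        return (self.user, self.product, self.rating)

    def __getitem__(self, index):
        # Delegating to a tuple gives negative indices, slices and
        # IndexError for out-of-range positions for free.
        return self._fields()[index]

    def __len__(self):
        return 3

    def __repr__(self):
        return "Rating(user=%r, product=%r, rating=%r)" % self._fields()


def mean_rating(records):
    # Works unchanged for raw (user, product, rating) tuples and Ratings.
    records = list(records)
    return sum(r[2] for r in records) / len(records)


print(mean_rating([(1, 10, 4.0), (2, 20, 2.0)]))              # 3.0
print(mean_rating([Rating(1, 10, 4.0), Rating(2, 20, 2.0)]))  # 3.0
```

Since `__getitem__` raises `IndexError` past the last field, Python's legacy sequence protocol also makes these objects iterable, so `user, product, rating = r` works too.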

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mengxr/spark SPARK-4396

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3261.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3261


commit d3bd7d41fa1623e5eb368bc6af3711769d1a27e7
Author: Xiangrui Meng 
Date:   2014-11-14T02:51:20Z

allow lookup by index in Python's Rating




[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2991#issuecomment-63002982
  
  [Test build #23345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23345/consoleFull) for PR 2991 at commit [`5461f1c`](https://github.com/apache/spark/commit/5461f1c43b0e98aa7b583f14569eefd833b19df0).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2991#issuecomment-63002988
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23345/
Test PASSed.


[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1567#issuecomment-63002602
  
  [Test build #23348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23348/consoleFull) for PR 1567 at commit [`89e37d8`](https://github.com/apache/spark/commit/89e37d82d72ac614af0efcd810ac4b9b034d4253).
 * This patch merges cleanly.


[GitHub] spark pull request: [SPARK-4313][WebUI][Yarn] Fix link issue of th...

2014-11-13 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3183#issuecomment-63001688
  
@JoshRosen Not only the executor id: any string that will appear in a URL
needs care with `%`. However, I don't know a proper place to add such general
docs. Any other suggestions for this PR, or is it fine to merge?
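The underlying issue is that `%` introduces a percent-escape in a URL, so a raw `%` (or `/`, `?`, `&`, `#`) inside an id changes how the link is parsed. A minimal Python sketch of encoding such a value with the standard library before building a link. The Spark UI itself is Scala, and the path and `executorId` parameter name below are hypothetical:

```python
from urllib.parse import quote, unquote


def executor_page_url(base, executor_id):
    """Build a link whose query value is safe for any executor id.

    The "/executors/threadDump" path and the "executorId" parameter used
    below are hypothetical, chosen only for illustration.
    """
    # safe="" also encodes "/", so nothing in the id is read as URL syntax.
    return "%s?executorId=%s" % (base, quote(executor_id, safe=""))


url = executor_page_url("/executors/threadDump", "exec%1 a/b")
print(url)  # /executors/threadDump?executorId=exec%251%20a%2Fb
print(unquote(url.split("=", 1)[1]))  # exec%1 a/b
```

The literal `%` becomes `%25`, so decoding on the server side recovers the original id exactly instead of misreading `%1 ` as a broken escape.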


[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3259#issuecomment-63001443
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23342/
Test PASSed.


[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3259#issuecomment-63001434
  
  [Test build #23342 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23342/consoleFull)
 for   PR 3259 at commit 
[`3200c33`](https://github.com/apache/spark/commit/3200c33363d8daed187ecd10f7b5fc370d44f349).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.

[GitHub] spark pull request: [SPARK-4394][SQL] Data Sources API Improvement...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3260#issuecomment-63000880
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23343/
Test PASSed.

[GitHub] spark pull request: [SPARK-4394][SQL] Data Sources API Improvement...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3260#issuecomment-63000877
  
  [Test build #23343 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23343/consoleFull)
 for   PR 3260 at commit 
[`9a5e171`](https://github.com/apache/spark/commit/9a5e17166c5f8c75f067846ec5f515db0857f1ea).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class InSet(value: Expression, hset: Set[Any])`
  * `case class In(attribute: String, values: Array[Any]) extends Filter`

[GitHub] spark pull request: [SPARK-4379][Core] Change Exception to SparkEx...

2014-11-13 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3241#issuecomment-63000329
  
Sorry, I should have been clearer when I mentioned the breaking change. 
I'm worried about the following case:
```scala
try {
  rdd.checkpoint()
} catch {
  case e: SparkException => // do work A
  case e: Exception => // do work B
}
```
This change breaks such code: work A would now run where work B used to. 
However, I think few people write code like this.

So, does Spark consider a change like this a breaking change?
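
A self-contained sketch of the concern, assuming a local stand-in class `SparkException` (hypothetical here; the real one lives in `org.apache.spark`) that extends `Exception`. The same handler takes a different branch once the thrown type becomes more specific:

```scala
object CatchOrderSketch {
  // Stand-in for org.apache.spark.SparkException so the sketch compiles on its own.
  class SparkException(message: String) extends Exception(message)

  // Old behaviour: a plain Exception is thrown.
  def checkpointOld(): Unit = throw new Exception("checkpoint failed")
  // New behaviour: a SparkException is thrown instead.
  def checkpointNew(): Unit = throw new SparkException("checkpoint failed")

  // Mirrors the try/catch from the comment and reports which branch ran.
  def handle(action: () => Unit): String =
    try { action(); "no error" }
    catch {
      case _: SparkException => "work A"
      case _: Exception      => "work B"
    }
}
```

Code that only catches `Exception` behaves the same either way, because `SparkException` is a subclass; only handlers that distinguish the two types observe a difference.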

[GitHub] spark pull request: [SPARK-4379][Core] Change Exception to SparkEx...

2014-11-13 Thread zsxwing

Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/3241#issuecomment-6373
  
The exception won't be part of the method signature.

```scala
scala> import java.io.IOException
import java.io.IOException

scala> class A {
     |   @throws[IOException]
     |   def foo() {
     |     throw new IOException("error!")
     |   }
     |
     |   def bar(): Unit = {
     |     foo()
     |   }
     | }
defined class A

scala> :javap -private -c A
Compiled from ""
public class A extends java.lang.Object{
public void foo()   throws java.io.IOException;
  Code:
   0:   new #9; //class java/io/IOException
   3:   dup
   4:   ldc #11; //String error!
   6:   invokespecial   #15; //Method java/io/IOException."<init>":(Ljava/lang/String;)V
   9:   athrow

public void bar();
  Code:
   0:   aload_0
   1:   invokevirtual   #20; //Method foo:()V
   4:   return

public A();
  Code:
   0:   aload_0
   1:   invokespecial   #22; //Method java/lang/Object."<init>":()V
   4:   return

}
```

In `bar()`, the call instruction is `1: invokevirtual #20; //Method foo:()V`. 
The method descriptor is still `foo:()V`: the `throws` clause is not encoded 
in it, so existing callers still link.
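
The same point can be checked from the JVM side with a small sketch (the class names `WithThrows`/`WithoutThrows` are hypothetical). The name and descriptor that `invokevirtual` resolves against are identical with or without `@throws`; the annotation only adds an `Exceptions` attribute, which Java reflection exposes through `getExceptionTypes`:

```scala
import java.io.IOException

// Two classes whose foo() differs only in the @throws annotation.
class WithThrows {
  @throws[IOException]
  def foo(): Unit = throw new IOException("error!")
}

class WithoutThrows {
  def foo(): Unit = throw new IOException("error!")
}

object ThrowsSketch {
  // Name, parameter types and return type together correspond to the descriptor foo:()V.
  def descriptorParts(cls: Class[_]): (String, List[Class[_]], Class[_]) = {
    val m = cls.getMethod("foo")
    (m.getName, m.getParameterTypes.toList, m.getReturnType)
  }

  // Declared exceptions come from the separate Exceptions attribute.
  def declaredExceptions(cls: Class[_]): List[Class[_]] =
    cls.getMethod("foo").getExceptionTypes.toList
}
```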

[GitHub] spark pull request: SPARK-4375. no longer require -Pscala-2.10 and...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3239#issuecomment-62999337
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23339/
Test PASSed.

[GitHub] spark pull request: SPARK-4375. no longer require -Pscala-2.10 and...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3239#issuecomment-62999331
  
  [Test build #23339 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23339/consoleFull)
 for   PR 3239 at commit 
[`587f671`](https://github.com/apache/spark/commit/587f6713d0a4a0e6727807ff432e334bf08eeb4a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.

[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...

2014-11-13 Thread WangTaoTheTonic

Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/3112#issuecomment-62998569
  
@JoshRosen I have reverted the dot, which I think was introduced while 
modifying comments. The space between `!` and `args.isEmpty` in 
`ApplicationMasterArguments` is unnecessary, so I kept that change.

[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-11-13 Thread jerryshao

Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/2991#issuecomment-62998448
  
Hi TD, this test is quite flaky; it failed several times in my local runs:

```
- block addition, block to batch allocation and cleanup with write ahead log *** FAILED *** (21 milliseconds)
[info]   java.io.FileNotFoundException: File /tmp/1415929501402-0/receivedBlockMetadata/log-0-1000 does not exist.
[info]   at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
[info]   at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:324)
[info]   at org.apache.spark.streaming.util.WriteAheadLogSuite$.getLogFilesInDirectory(WriteAheadLogSuite.scala:344)
[info]   at org.apache.spark.streaming.ReceivedBlockTrackerSuite.getWriteAheadLogFiles(ReceivedBlockTrackerSuite.scala:226)
[info]   at org.apache.spark.streaming.ReceivedBlockTrackerSuite$$anonfun$4.apply$mcV$sp(ReceivedBlockTrackerSuite.scala:171)
[info]   at org.apache.spark.streaming.ReceivedBlockTrackerSuite$$anonfun$4.apply(ReceivedBlockTrackerSuite.scala:96)
[info]   at org.apache.spark.streaming.ReceivedBlockTrackerSuite$$anonfun$4.apply(ReceivedBlockTrackerSuite.scala:96)
[info]   at
```
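
One possible reading of this trace (not confirmed in the thread): `RawLocalFileSystem.listStatus` lists the directory and then stats each entry, so a log file deleted by concurrent cleanup between those two steps surfaces as `FileNotFoundException`. A minimal sketch of a race-tolerant listing in plain JDK I/O, assuming a hypothetical helper `listLogFiles` with a test hook that simulates the concurrent deletion:

```scala
import java.io.File
import java.nio.file.{Files, NoSuchFileException}
import java.nio.file.attribute.BasicFileAttributes

object LogListingSketch {
  // Hypothetical helper: list regular files with their sizes, skipping any file that
  // disappears between the directory listing and the per-file stat call.
  def listLogFiles(dir: File, beforeStat: File => Unit = _ => ()): Seq[(String, Long)] = {
    val entries = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    entries.sortBy(_.getName).flatMap { f =>
      beforeStat(f) // test hook: lets a caller delete the file to simulate the race
      try {
        val attrs = Files.readAttributes(f.toPath, classOf[BasicFileAttributes])
        if (attrs.isRegularFile) Some(f.getName -> attrs.size) else None
      } catch {
        case _: NoSuchFileException => None // deleted concurrently: skip instead of failing
      }
    }
  }
}
```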

[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3112#issuecomment-62998382
  
  [Test build #23347 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23347/consoleFull)
 for   PR 3112 at commit 
[`ed1a25c`](https://github.com/apache/spark/commit/ed1a25c85b2c80802f29700f363b9ef05721b395).
 * This patch merges cleanly.

[GitHub] spark pull request: [SPARK-1812] Scala 2.11 support.

2014-11-13 Thread ScrapCodes

Github user ScrapCodes closed the pull request at:

https://github.com/apache/spark/pull/3111

[GitHub] spark pull request: [SPARK-4391][SQL] Configure parquet filters us...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3258#issuecomment-62997903
  
  [Test build #23340 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23340/consoleFull)
 for   PR 3258 at commit 
[`15e9a98`](https://github.com/apache/spark/commit/15e9a98e75928915b9a2c2c1c02d88bba3756485).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.

[GitHub] spark pull request: [SPARK-4391][SQL] Configure parquet filters us...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3258#issuecomment-62997907
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23340/
Test PASSed.

[GitHub] spark pull request: [SPARK-4390][SQL] Handle NaN cast to decimal c...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3256#issuecomment-62997844
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23341/
Test PASSed.

[GitHub] spark pull request: [SPARK-4390][SQL] Handle NaN cast to decimal c...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3256#issuecomment-62997840
  
  [Test build #23341 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23341/consoleFull)
 for   PR 3256 at commit 
[`4c3ba46`](https://github.com/apache/spark/commit/4c3ba4617716bbc3bee95ff258ace2661b60f136).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.

[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3243#issuecomment-62997771
  
  [Test build #23346 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23346/consoleFull)
 for   PR 3243 at commit 
[`4653378`](https://github.com/apache/spark/commit/4653378fb6addfdf4fb21e4e75570c163d601bfb).
 * This patch merges cleanly.

[GitHub] spark pull request: [SPARK-3722][Docs]minor improvement and fix in...

2014-11-13 Thread WangTaoTheTonic

Github user WangTaoTheTonic commented on the pull request:

https://github.com/apache/spark/pull/2579#issuecomment-62997806
  
@tgravescs Is it ok to go?

[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-11-13 Thread tdas

Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/2991#issuecomment-62997454
  
Let's see if this passes Jenkins; I hadn't tried that yet.

[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2991#issuecomment-62996965
  
  [Test build #23345 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23345/consoleFull)
 for   PR 2991 at commit 
[`5461f1c`](https://github.com/apache/spark/commit/5461f1c43b0e98aa7b583f14569eefd833b19df0).
 * This patch merges cleanly.

[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3243#issuecomment-62996707
  
  [Test build #23344 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23344/consoleFull)
 for   PR 3243 at commit 
[`e9145e8`](https://github.com/apache/spark/commit/e9145e8ac6798bb9e2587e2eb67da6209456840f).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class NettyBlockTransferService(conf: SparkConf, securityManager: SecurityManager, numCores: Int)`
  * `public class JavaSimpleTextClassificationPipeline `
  * `case class LabeledDocument(id: Long, text: String, label: Double)`
  * `case class Document(id: Long, text: String)`
  * `abstract class EdgeRDD[ED, VD](`
  * `abstract class VertexRDD[VD](`
  * `abstract class Estimator[M <: Model[M]] extends PipelineStage with Params `
  * `abstract class Evaluator extends Identifiable `
  * `abstract class Model[M <: Model[M]] extends Transformer `
  * `abstract class PipelineStage extends Serializable with Logging `
  * `class Pipeline extends Estimator[PipelineModel] `
  * `abstract class Transformer extends PipelineStage with Params `
  * `class LogisticRegression extends Estimator[LogisticRegressionModel] with LogisticRegressionParams `
  * `class HashingTF extends UnaryTransformer[Iterable[_], Vector, HashingTF] `
  * `class StandardScaler extends Estimator[StandardScalerModel] with StandardScalerParams `
  * `class Tokenizer extends UnaryTransformer[String, Seq[String], Tokenizer] `
  * `class Param[T] (`
  * `class DoubleParam(parent: Params, name: String, doc: String, defaultValue: Option[Double] = None)`
  * `class IntParam(parent: Params, name: String, doc: String, defaultValue: Option[Int] = None)`
  * `class FloatParam(parent: Params, name: String, doc: String, defaultValue: Option[Float] = None)`
  * `class LongParam(parent: Params, name: String, doc: String, defaultValue: Option[Long] = None)`
  * `class BooleanParam(parent: Params, name: String, doc: String, defaultValue: Option[Boolean] = None)`
  * `case class ParamPair[T](param: Param[T], value: T)`
  * `trait Params extends Identifiable with Serializable `
  * `class CrossValidator extends Estimator[CrossValidatorModel] with CrossValidatorParams with Logging `
  * `class ParamGridBuilder `

[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3243#issuecomment-62996710
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23344/
Test FAILed.

[GitHub] spark pull request: [SPARK-4092] [CORE] Fix InputMetrics for coale...

2014-11-13 Thread ash211

Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/3120#issuecomment-62996573
  
@ksakellis it looks like this has a merge conflict now -- would you mind 
updating this PR?

[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3243#issuecomment-62996516
  
  [Test build #23344 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23344/consoleFull)
 for   PR 3243 at commit 
[`e9145e8`](https://github.com/apache/spark/commit/e9145e8ac6798bb9e2587e2eb67da6209456840f).
 * This patch merges cleanly.

[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...

2014-11-13 Thread shenh062326

Github user shenh062326 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3243#discussion_r20337096
  
--- Diff: core/src/main/scala/org/apache/spark/util/collection/Spillable.scala ---
@@ -105,7 +105,7 @@ private[spark] trait Spillable[C] {
*/
   @inline private def logSpillage(size: Long) {
 val threadId = Thread.currentThread().getId
-logInfo("Thread %d spilling in-memory map of %d MB to disk (%d time%s so far)"
-.format(threadId, size / (1024 * 1024), _spillCount, if (_spillCount > 1) "s" else ""))
+logInfo("Thread %d spilling in-memory map of %d B to disk (%d time%s so far)"
--- End diff --

Thanks Srowen, changed to `Utils.bytesToString`.
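
Why the unit matters: with the old `size / (1024 * 1024)` integer division, any spill under 1 MB is logged as `0 MB`. A minimal standalone sketch of human-readable formatting in the same spirit as `Utils.bytesToString` (the helper `formatBytes` and its thresholds are assumptions, not Spark's code), applied to the log message from the diff:

```scala
import java.util.Locale

object SpillLogSketch {
  private val KB = 1L << 10
  private val MB = 1L << 20
  private val GB = 1L << 30

  // Hypothetical formatter: switch to a larger unit once the value reaches 2 of that unit.
  def formatBytes(size: Long): String = {
    val (value, unit) =
      if (size >= 2 * GB) (size.toDouble / GB, "GB")
      else if (size >= 2 * MB) (size.toDouble / MB, "MB")
      else if (size >= 2 * KB) (size.toDouble / KB, "KB")
      else (size.toDouble, "B")
    "%.1f %s".formatLocal(Locale.US, value, unit)
  }

  // Same message shape as logSpillage, including the "time"/"times" pluralisation.
  def spillMessage(threadId: Long, size: Long, spillCount: Int): String =
    "Thread %d spilling in-memory map of %s to disk (%d time%s so far)".format(
      threadId, formatBytes(size), spillCount, if (spillCount > 1) "s" else "")
}
```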

[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...

2014-11-13 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3259#discussion_r20336546
  
--- Diff: core/src/main/scala/org/apache/spark/network/nio/ConnectionManager.scala ---
@@ -913,8 +918,10 @@ private[nio] class ConnectionManager(
   }
 }
 
+val timoutTaskHandle = ackTimeoutMonitor.newTimeout(timeoutTask, ackTimeout, TimeUnit.SECONDS)
--- End diff --

timout?
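
Keeping the handle returned by `newTimeout` presumably lets the pending timeout be cancelled once the ack arrives, instead of leaving the task (and whatever it captures) queued until it fires. A minimal sketch of that pattern, assuming the JDK's `ScheduledThreadPoolExecutor` as a stand-in for the timer in the diff (all names here are hypothetical):

```scala
import java.util.concurrent.{ScheduledFuture, ScheduledThreadPoolExecutor, ThreadFactory, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

object AckTimeoutSketch {
  // Single daemon timer thread standing in for the ack timeout monitor.
  private val monitor = new ScheduledThreadPoolExecutor(1, new ThreadFactory {
    def newThread(r: Runnable): Thread = {
      val t = new Thread(r, "ack-timeout-sketch")
      t.setDaemon(true)
      t
    }
  })
  // Remove cancelled tasks from the queue immediately rather than at their deadline.
  monitor.setRemoveOnCancelPolicy(true)

  final case class PendingAck(handle: ScheduledFuture[_], timedOut: AtomicBoolean)

  // Schedule a timeout for a sent message and keep the handle.
  def sendWithTimeout(timeoutSeconds: Long): PendingAck = {
    val timedOut = new AtomicBoolean(false)
    val task = new Runnable { def run(): Unit = timedOut.set(true) }
    PendingAck(monitor.schedule(task, timeoutSeconds, TimeUnit.SECONDS), timedOut)
  }

  // On ack, cancel the timeout so the task is released.
  def onAckReceived(p: PendingAck): Boolean = p.handle.cancel(false)

  def pendingTimeouts: Int = monitor.getQueue.size

  def shutdown(): Unit = monitor.shutdownNow()
}
```

Without `setRemoveOnCancelPolicy(true)`, a cancelled task stays in the executor's queue until its delay elapses, which can itself look like a leak under heavy traffic.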

[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...

2014-11-13 Thread vanzin

Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3259#issuecomment-62995861
  
LGTM aside from typo.

[GitHub] spark pull request: [SPARK-4079] [CORE] Consolidates Errors if a C...

2014-11-13 Thread ksakellis

Github user ksakellis commented on the pull request:

https://github.com/apache/spark/pull/3119#issuecomment-62995656
  
@pwendell Could you please trigger the Jenkins tests for this PR?

[GitHub] spark pull request: [SPARK-4394][SQL] Data Sources API Improvement...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3260#issuecomment-62995573
  
  [Test build #23343 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23343/consoleFull)
 for   PR 3260 at commit 
[`9a5e171`](https://github.com/apache/spark/commit/9a5e17166c5f8c75f067846ec5f515db0857f1ea).
 * This patch merges cleanly.

[GitHub] spark pull request: [SPARK-4394][SQL] Data Sources API Improvement...

2014-11-13 Thread marmbrus

GitHub user marmbrus opened a pull request:

https://github.com/apache/spark/pull/3260

[SPARK-4394][SQL] Data Sources API Improvements

This PR adds two features to the data sources API:
 - Support for pushing down `IN` filters
 - The ability for relations to optionally provide information about their `sizeInBytes`.
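
To make the two features concrete, here is a toy model, not Spark's actual data sources API: the `Filter`, `EqualTo`, `In` and `ToyRelation` definitions below are hypothetical and only echo the shape of an `In(attribute, values)` filter. A relation that receives pushed-down filters can drop non-matching rows itself, and an optional `sizeInBytes` estimate gives the planner something to reason about (e.g. whether a table is small enough to broadcast):

```scala
object DataSourceSketch {
  // Hypothetical miniature filter hierarchy; NOT Spark's org.apache.spark.sql.sources classes.
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class In(attribute: String, values: Array[Any]) extends Filter

  final case class ToyRelation(rows: Seq[Map[String, Any]]) {
    // Optional size estimate; assumption: a flat 64 bytes per row.
    def sizeInBytes: Long = rows.size.toLong * 64L

    // Apply every pushed-down filter inside the source, so only matching rows are returned.
    def buildScan(filters: Array[Filter]): Seq[Map[String, Any]] =
      rows.filter(row => filters.forall(f => matches(row, f)))

    private def matches(row: Map[String, Any], f: Filter): Boolean = f match {
      case EqualTo(a, v) => row.get(a).contains(v)
      case In(a, vs) =>
        val wanted = vs.toSet // hash-set membership test
        row.get(a).exists(wanted.contains)
    }
  }
}
```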

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/marmbrus/spark sourcesImprovements

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3260.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3260


commit 2a04ab3deaa989738ef77b9e70dd00bba6ae4d1e
Author: Michael Armbrust 
Date:   2014-11-14T00:59:46Z

Simplify implementation of InSet.

commit 416f167cb58edc088c449ea65f327fe4f8ed9e74
Author: Michael Armbrust 
Date:   2014-11-14T01:00:36Z

Support for IN in data sources API.

commit 99c0e6b1672ed8ec6fb40d9f90f887592b7eac46
Author: Michael Armbrust 
Date:   2014-11-14T01:01:02Z

Add support for sizeInBytes.

commit 9a5e17166c5f8c75f067846ec5f515db0857f1ea
Author: Michael Armbrust 
Date:   2014-11-14T01:03:23Z

Use method instead of configuration directly

[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3193#issuecomment-62994962
  
  [Test build #23337 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23337/consoleFull)
 for   PR 3193 at commit 
[`78bf997`](https://github.com/apache/spark/commit/78bf997f13c6f08129671a9d6a3484620d5b37a2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class RDDRangeSampler(RDDSamplerBase):`
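
The class name `RDDRangeSampler` suggests sampling by ranges. As a hedged sketch of that general technique in Scala (not the PR's Python code; `RangeSplitSketch` is hypothetical): normalize the weights into cumulative bounds over [0, 1), give every element one uniform draw from the same seed in every split, and keep the element in the split whose range contains the draw. Because all splits see the same draws and the ranges partition [0, 1), each element lands in exactly one split:

```scala
import scala.util.Random

object RangeSplitSketch {
  // Normalized cumulative bounds: weights (1, 2, 1) -> [0, 0.25), [0.25, 0.75), [0.75, 1.0).
  def bounds(weights: Seq[Double]): Seq[(Double, Double)] = {
    val total = weights.sum
    val cumulative = weights.scanLeft(0.0)(_ + _).map(_ / total) // last entry is exactly 1.0
    cumulative.zip(cumulative.tail)
  }

  // Each split re-creates the RNG from the same seed, so element i gets the same draw
  // in every split and is kept only by the split whose range contains that draw.
  def split[T](data: Seq[T], weights: Seq[Double], seed: Long): Seq[Seq[T]] =
    bounds(weights).map { case (lower, upper) =>
      val rng = new Random(seed)
      data.filter { _ =>
        val x = rng.nextDouble()
        x >= lower && x < upper
      }
    }
}
```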

[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3193#issuecomment-62994968
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23337/
Test PASSed.



[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3193#issuecomment-62994910
  
  [Test build #519 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/519/consoleFull)
 for   PR 3193 at commit 
[`657de2d`](https://github.com/apache/spark/commit/657de2d8a536459157dfc535116428d7ce268297).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3259#issuecomment-62994235
  
  [Test build #23342 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23342/consoleFull)
 for   PR 3259 at commit 
[`3200c33`](https://github.com/apache/spark/commit/3200c33363d8daed187ecd10f7b5fc370d44f349).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...

2014-11-13 Thread JoshRosen

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3259#issuecomment-62994078
  
/cc @andrewor14 and @rxin for review.



[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...

2014-11-13 Thread JoshRosen

GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/3259

[SPARK-4393] Fix memory leak in ConnectionManager ACK timeout TimerTasks; use HashedWheelTimer

This patch is intended to fix a subtle memory leak in ConnectionManager's 
ACK timeout TimerTasks. In the old code, each TimerTask held a reference to the 
message being sent, and a cancelled TimerTask is not necessarily 
garbage-collected until its scheduled run time. As a result, large numbers of 
messages could not be garbage-collected until their timeouts expired, leading 
to OOMs.
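
The retention behaviour described above can be observed with `java.util.Timer` on its own. The following is a minimal sketch under stated assumptions, not code from the patch: `BigPayload` and `CapturingTask` are hypothetical stand-ins for a message and the old ACK-timeout task. It relies on `Timer.purge()`, which returns the number of cancelled tasks it removed from the timer's queue, i.e. how many were still held (along with their payloads) after `cancel()` had been called.

```java
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical illustration of cancelled TimerTasks lingering in a java.util.Timer queue.
final class CancelledTaskRetention {
    // Stand-in for an outgoing message (not Spark's Message class).
    static final class BigPayload {
        final byte[] data = new byte[1024];
    }

    // Like the old ACK-timeout task: it keeps a strong reference to the whole payload.
    static final class CapturingTask extends TimerTask {
        private final BigPayload payload;
        CapturingTask(BigPayload payload) { this.payload = payload; }
        @Override public void run() {
            System.out.println("timed out: " + payload.data.length + " bytes");
        }
    }

    // Schedules n far-future tasks, cancels each one immediately, and returns how many
    // cancelled tasks were still sitting in the timer's queue when purge() ran.
    static int scheduleCancelAndPurge(int n) {
        Timer timer = new Timer(true); // daemon thread, so it never blocks JVM exit
        try {
            // A live task due before all others stays at the head of the queue, so the
            // timer thread keeps waiting on it and never drops the cancelled ones itself.
            timer.schedule(new TimerTask() { @Override public void run() { } }, 30_000L);
            for (int i = 0; i < n; i++) {
                TimerTask task = new CapturingTask(new BigPayload());
                timer.schedule(task, 60_000L);
                task.cancel(); // cancelled, but still queued and still pinning its payload
            }
            return timer.purge(); // number of cancelled tasks removed from the queue
        } finally {
            timer.cancel(); // discard every remaining scheduled task
        }
    }

    public static void main(String[] args) {
        System.out.println("cancelled tasks still queued: " + scheduleCancelAndPurge(1000));
    }
}
```

Run as written, `main` should report all 1000 cancelled tasks as still queued, meaning every payload stayed reachable from the timer until `purge()` was called.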

This patch addresses this problem by capturing only the message ID in the 
TimerTask instead of the whole message.  I've also modified this code to use 
Netty's HashedWheelTimer, whose performance characteristics should be better 
for this use-case.
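
The ID-only approach can be sketched as follows. This is a hypothetical, dependency-free illustration rather than the patch itself: `AckTimeoutTracker`, `send`, `ack`, and the `byte[]` message type are invented for the example, and it keeps `java.util.Timer` instead of Netty's `HashedWheelTimer` so that it runs with only the JDK.

```java
import java.util.List;
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch of ID-only ACK timeouts; not Spark's ConnectionManager API.
final class AckTimeoutTracker {
    private final Timer timer = new Timer("ack-timeout", true);
    // The tracker, not the timer tasks, owns the in-flight messages.
    private final Map<Long, byte[]> pending = new ConcurrentHashMap<>();
    private final Map<Long, TimerTask> timeouts = new ConcurrentHashMap<>();
    private final List<Long> timedOut = new CopyOnWriteArrayList<>();

    void send(long id, byte[] message, long timeoutMs) {
        pending.put(id, message);
        // The task captures only the primitive id (plus this tracker), never the message.
        TimerTask task = new TimerTask() {
            @Override public void run() {
                timeouts.remove(id);
                if (pending.remove(id) != null) {
                    timedOut.add(id); // no ACK arrived in time
                }
            }
        };
        timeouts.put(id, task);
        timer.schedule(task, timeoutMs);
    }

    void ack(long id) {
        pending.remove(id); // the message is no longer reachable from the tracker or timer
        TimerTask task = timeouts.remove(id);
        if (task != null) {
            task.cancel(); // may stay queued until its deadline, but pins only a long
        }
    }

    int pendingCount() { return pending.size(); }

    // Test helper: polls until `id` has timed out or maxWaitMs elapses.
    boolean awaitTimeout(long id, long maxWaitMs) {
        long deadline = System.currentTimeMillis() + maxWaitMs;
        while (!timedOut.contains(id) && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return timedOut.contains(id);
    }

    void shutdown() { timer.cancel(); }
}
```

Because the task refers to the message only by ID, an ACK releases the message immediately even if the cancelled task lingers in the timer queue. Switching the timer implementation to `HashedWheelTimer`, as the patch also does, is a separate change aimed at timer performance and is not shown here.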

Thanks to @cristianopris for narrowing down this issue!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark connection-manager-timeout-bugfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3259.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3259


commit f847dd4a4a8e7f92de879de9d5c9eb31743f8a26
Author: Josh Rosen 
Date:   2014-11-14T00:13:15Z

Don't capture entire message in ACK timeout task.

The old code caused memory leaks.

commit 3200c33363d8daed187ecd10f7b5fc370d44f349
Author: Josh Rosen 
Date:   2014-11-14T00:45:41Z

Use Netty HashedWheelTimer





[GitHub] spark pull request: [SQL] Minor cleanup of comments, errors and ov...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3257#issuecomment-62993447
  
  [Test build #23336 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23336/consoleFull)
 for   PR 3257 at commit 
[`2fdf903`](https://github.com/apache/spark/commit/2fdf903d24d4c7320cbb2b76f592082bac321a0c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SQL] Minor cleanup of comments, errors and ov...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3257#issuecomment-62993455
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23336/
Test PASSed.



[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...

2014-11-13 Thread jerryshao

Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/2991#issuecomment-62993250
  
OK, I will, thanks a lot, greatly appreciated.



[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3193#issuecomment-62992769
  
  [Test build #518 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/518/consoleFull)
 for   PR 3193 at commit 
[`657de2d`](https://github.com/apache/spark/commit/657de2d8a536459157dfc535116428d7ce268297).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class NettyBlockTransferService(conf: SparkConf, securityManager: SecurityManager, numCores: Int)`
  * `abstract class EdgeRDD[ED, VD](`
  * `abstract class VertexRDD[VD](`
  * `class RDDRangeSampler(RDDSamplerBase):`




[GitHub] spark pull request: [SPARK-4391][SQL] Configure parquet filters us...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3258#issuecomment-62992058
  
  [Test build #23340 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23340/consoleFull)
 for   PR 3258 at commit 
[`15e9a98`](https://github.com/apache/spark/commit/15e9a98e75928915b9a2c2c1c02d88bba3756485).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4390][SQL] Handle NaN cast to decimal c...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3256#issuecomment-62991877
  
  [Test build #23341 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23341/consoleFull)
 for   PR 3256 at commit 
[`4c3ba46`](https://github.com/apache/spark/commit/4c3ba4617716bbc3bee95ff258ace2661b60f136).
 * This patch merges cleanly.



[GitHub] spark pull request: [SPARK-4391][SQL] Configure parquet filters us...

2014-11-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3258#issuecomment-62991118
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23338/
Test FAILed.



[GitHub] spark pull request: [SPARK-4391][SQL] Configure parquet filters us...

2014-11-13 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3258#issuecomment-62991114
  
  [Test build #23338 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23338/consoleFull)
 for   PR 3258 at commit 
[`75afd39`](https://github.com/apache/spark/commit/75afd39ba2a034fb67792c2773ba53dd92e92a71).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


