date:20141226

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3808#issuecomment-68131355
  
  [Test build #24832 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24832/consoleFull)
 for   PR 3808 at commit 
[`ec3c989`](https://github.com/apache/spark/commit/ec3c989efd453897e7fe5d4de01b3edefe21eb3e).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...

2014-12-26 Thread tigerquoll

GitHub user tigerquoll opened a pull request:

https://github.com/apache/spark/pull/3809

spark-core - [SPARK-4787] - Stop sparkcontext properly if a DAGScheduler 
init error occurs

[SPARK-4787] Stop SparkContext properly if an exception occurs during 
DAGscheduler initialization.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tigerquoll/spark SPARK-4787

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3809.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3809


commit 217257879fe7c98673caf14b980790498887581e
Author: Dale tigerqu...@outlook.com
Date:   2014-12-26T09:33:05Z

[SPARK-4787] Stop context properly if an exception occurs during 
DAGScheduler initialization.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3809#issuecomment-68133474
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4972][MLlib] Updated the scala doc for ...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3808#issuecomment-68134353
  
  [Test build #24832 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24832/consoleFull)
 for   PR 3808 at commit 
[`ec3c989`](https://github.com/apache/spark/commit/ec3c989efd453897e7fe5d4de01b3edefe21eb3e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4972][MLlib] Updated the scala doc for ...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3808#issuecomment-68134354
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24832/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...

2014-12-26 Thread YanTangZhai

GitHub user YanTangZhai opened a pull request:

https://github.com/apache/spark/pull/3810

[SPARK-4962] [CORE] Put TaskScheduler.start back in SparkContext to shorten 
cluster resources occupation period

When SparkContext object is instantiated, TaskScheduler is started and some 
resources are allocated from cluster. However, these
resources may be not used for the moment. For example, 
DAGScheduler.JobSubmitted is processing and so on. These resources are wasted in
this period. Thus, we want to put TaskScheduler.start back to shorten 
cluster resources occupation period specially for busy cluster.
TaskScheduler could be started just before running stages.
We could analyse and compare the resources occupation period before and 
after optimization.
TaskScheduler.start execution time: [time1__]
DAGScheduler.JobSubmitted (excluding HadoopRDD.getPartitions or 
TaskScheduler.start) execution time: [time2_]
HadoopRDD.getPartitions execution time: [time3___]
Stages execution time: [time4_]
The cluster resources occupation period before optimization is 
[time2_][time3___][time4_].
The cluster resources occupation period after optimization 
is[time3___][time4_].
In summary, the cluster resources
occupation period after optimization is less than before.
If HadoopRDD.getPartitions could be put forward (SPARK-4961), the period 
may be shorten more which is [time4_].
The resources saving is important for busy cluster.

The main purpose of this PR is to decrease resources waste for busy cluster.
For example, a process initializes a SparkContext instance, reads a few 
files from HDFS or many records from PostgreSQL, and then calls RDD's collect 
operation to submit a job.
When SparkContext is initialized, an app is submitted to cluster and some 
resources are hold by this app. 
These resources are not used really until the job is submitted by RDD's 
action.
The resources in the period from initialization to actual use could be 
considered wasteful.
If app is submitted when SparkContext is initialized, all of resources 
needed by the app may be granted before running job. 
Then the job could runs efficiently without resource constraint.
On the contrary, if app is submitted when job is submitted, resources 
needed by the app may be granted at different times. Then the job may run not 
so efficiently since some resources are applying.
Thus I use a configuration parameter spark.scheduler.app.slowstart (default 
false) to let user make tradeoffs between economy and efficiency.
There are 9 kinds of master URL and 6 kinds of SchedulerBackend.
LocalBackend and SimrSchedulerBackend don't need to put starting back since 
there is no difference.
SparkClusterSchedulerBackend (yarn-standalone or yarn-cluster) does not put 
starting back since the app should be submitted in advance by SparkSubmit.
CoarseMesosSchedulerBackend and MesosSchedulerBackend could put starting 
back.
YarnClientSchedulerBackend (yarn-client) could put starting back.
This PR puts TaskScheduler.start back only for yarn-client mode in the 
early.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/YanTangZhai/spark SPARK-4962

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3810.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3810


commit cdef539abc5d2d42d4661373939bdd52ca8ee8e6
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-08-06T13:07:08Z

Merge pull request #1 from apache/master

update

commit cbcba66ad77b96720e58f9d893e87ae5f13b2a95
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-08-20T13:14:08Z

Merge pull request #3 from apache/master

Update

commit 8a0010691b669495b4c327cf83124cabb7da1405
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-09-12T06:54:58Z

Merge pull request #6 from apache/master

Update

commit 03b62b043ab7fd39300677df61c3d93bb9beb9e3
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-09-16T12:03:22Z

Merge pull request #7 from apache/master

Update

commit 76d40277d51f709247df1d3734093bf2c047737d
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-10-20T12:52:22Z

Merge pull request #8 from apache/master

update

commit d26d98248a1a4d0eb15336726b6f44e05dd7a05a
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-11-04T09:00:31Z

Merge pull request #9 from apache/master

Update

commit e249846d9b7967ae52ec3df0fb09e42ffd911a8a
Author: YanTangZhai hakeemz...@tencent.com
Date:   2014-11-11T03:18:24Z

Merge pull request #10 from apache/master

Update

commit 6e643f81555d75ec8ef3eb57bf5ecb6520485588
Author: YanTangZhai hakeemz...@tencent.com
Date:

[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3810#issuecomment-68137843
  
  [Test build #24833 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24833/consoleFull)
 for   PR 3810 at commit 
[`05469de`](https://github.com/apache/spark/commit/05469de9f0482bce54a60161b9cb386a64173826).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...

GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/3811

[SPARK-4973][CORE] Local directory in the driver of client-mode continues 
remaining even if application finished when external shuffle is enabled

When we enables external shuffle service, local directories in the driver 
of client-mode continue remaining even if application has finished.
I think local directories for drivers should be deleted.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark SPARK-4973

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3811.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3811


commit d99718e85c0b97bddb0e7736a392536ede510c47
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2014-12-26T11:59:36Z

Fixed SparkSubmit.scala and DiskBlockManager.scala in order to delete local 
directories of the driver of local-mode when external shuffle service is enabled




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3811#issuecomment-68138582
  
  [Test build #24834 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24834/consoleFull)
 for   PR 3811 at commit 
[`d99718e`](https://github.com/apache/spark/commit/d99718e85c0b97bddb0e7736a392536ede510c47).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/3812

[Minor] Fix the value represented by spark.executor.id for the driver of 
local mode.

When we run application in local mode, the property `spark.executor.id` 
represents `driver` for the driver.
While we run in anything else mode, the property represents `driver` for 
the driver.

It's inconsistent.

This issue is minor so I didn't file this in JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sarutak/spark fix-driver-identifier

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3812


commit 4275663d875840bdc0c0da69707386a8b5eb1d3a
Author: Kousuke Saruta saru...@oss.nttdata.co.jp
Date:   2014-12-26T12:04:56Z

Fixed the value represented by spark.executor.id of local mode




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-68138840
  
  [Test build #24835 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24835/consoleFull)
 for   PR 3812 at commit 
[`4275663`](https://github.com/apache/spark/commit/4275663d875840bdc0c0da69707386a8b5eb1d3a).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3787][BUILD] Assembly jar name is wrong...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3046#issuecomment-68139015
  
  [Test build #24837 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24837/consoleFull)
 for   PR 3046 at commit 
[`41ef90e`](https://github.com/apache/spark/commit/41ef90e8ed25b21f1e5c689c478963c74577d81d).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4576][SQL] Add concatenation operator

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3433#issuecomment-68139017
  
  [Test build #24836 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24836/consoleFull)
 for   PR 3433 at commit 
[`9b94d48`](https://github.com/apache/spark/commit/9b94d4832f670b5dea0e917654fbeb59450ed1d6).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3810#issuecomment-68140334
  
  [Test build #24833 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24833/consoleFull)
 for   PR 3810 at commit 
[`05469de`](https://github.com/apache/spark/commit/05469de9f0482bce54a60161b9cb386a64173826).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4962] [CORE] Put TaskScheduler.start ba...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3810#issuecomment-68140337
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24833/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3811#issuecomment-68140417
  
  [Test build #24834 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24834/consoleFull)
 for   PR 3811 at commit 
[`d99718e`](https://github.com/apache/spark/commit/d99718e85c0b97bddb0e7736a392536ede510c47).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3811#issuecomment-68140420
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24834/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-26 Thread scwf

Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68140510
  
Not only,  actually this PR cover optimizations as follows:
```
And/Or with same condition
a  a = a , a  a  a ... = a  
a || a = a , a || a || a ... = a

one And/Or with conditions that can be merged
a  2  a  2 = false, a  3  a  5 = a  5
a  2 || a = 2 = true, a  3 || a  5 = a  3

two And/Or with conditions that can be merged
(a  3  b  5) || a  2 = b  5 || a  2
(a  3 || b  5) || a  2 = true
(a  2  b  5)  a  3 = flase
(a  2 || b  5)  a  3 = b  5  a  3

more than two And/Or with common conditions
(a  b  c  ...) || (a  b  d  ...) || (a  b  e  ...) ... = 
a  b  ((c  ...) || (d  ...) || (e  ...) || ...)
(a || b || c || ...)  (a || b || d || ...)  (a || b || e || ...) ... = 
(a || b) || ((c || ...)  (f || ...)  (e || ...)  ...)
```
hi @liancheng, do you mind i refactory this and refer to your PR to cover 
all the cases above?  


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4576][SQL] Add concatenation operator

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3433#issuecomment-68141157
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24836/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4576][SQL] Add concatenation operator

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3433#issuecomment-68141153
  
  [Test build #24836 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24836/consoleFull)
 for   PR 3433 at commit 
[`9b94d48`](https://github.com/apache/spark/commit/9b94d4832f670b5dea0e917654fbeb59450ed1d6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Concat(left: Expression, right: Expression) extends 
BinaryExpression `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-68141394
  
  [Test build #24835 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24835/consoleFull)
 for   PR 3812 at commit 
[`4275663`](https://github.com/apache/spark/commit/4275663d875840bdc0c0da69707386a8b5eb1d3a).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-68141397
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24835/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3787][BUILD] Assembly jar name is wrong...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3046#issuecomment-68141516
  
  [Test build #24837 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24837/consoleFull)
 for   PR 3046 at commit 
[`41ef90e`](https://github.com/apache/spark/commit/41ef90e8ed25b21f1e5c689c478963c74577d81d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3787][BUILD] Assembly jar name is wrong...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3046#issuecomment-68141518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24837/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

2014-12-26 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-68142138
  
Is there a functional change here? The value is now driver instead of 
driver. It sounds good to be consistent but I wonder if there is a reason for 
the dfference.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...

2014-12-26 Thread srowen

Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/3811#issuecomment-68142275
  
Does this define a new system property just for deployment mode? This logic 
looks like it is applied even when external shuffle service is not enabled. Why 
is the driver behavior special here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [Minor] Fix the value represented by spark.exe...

Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/3812#issuecomment-68143144
  
There is not a functional change for Spark itself but it's rather than for 
other systems associating with Spark, like monitoring systems. The property is 
used for metrics name so this issue can affects for users monitoring driver's 
metrics.
As you mentioned, this change doesn't affects Spark itself but I think, we 
should consider how features of Spark are used by user.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4973][CORE] Local directory in the driv...

Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/3811#issuecomment-68143249
  
If we run client-mode including local-mode, driver runs on client and 
executors doesn't run on client so  no one shared the local directories of the 
driver.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-26 Thread koeninger

Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/3798#issuecomment-68149432

Hi @jerryshao

I'd politely ask that anyone with questions read at least KafkaRDD.scala
and the example usage linked from the jira ticket (it's only about 50
significant lines of code):

https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/TransactionalExample.scala

I'll try to address your points.

1. Yes, each RDD partition maps directly to a Kafka (topic, partition,
inclusive starting offset, exclusive ending offset)

2. It's a pull model, not a receiver push model. All the InputDStream
implementation is doing is checking the leaders' highest offsets and defining
an RDD based on that. When the RDD is run, its iterator makes a connection to
kafka and pulls the data. This is done because it's simpler, and because using
existing network receiver code would require dedicating 1 core per kafka
partition, which is unacceptable from an ops standpoint.

3. Yes. The fault tolerance model is that it should be safe for any or all
of the spark machines to be completely destroyed at any point in the job, and
the job should be able to be safely restarted. I don't think you can do better
than this. This is achieved because all important state, especially the
storage of offsets, are controlled by client code, not spark. In both the
transactional and idempotent client code approaches, offsets aren't stored
until data is stored, so restart should be safe.

Regarding your approach that you link, the problem there is (a) it's not a
part of the spark distribution so people won't know about it, and (b) it
assumes control of kafka offsets and storage in zookeeper, which makes it
impossible for client code to control exactly once semantics.

Regarding the possible semantic disconnect between spark streaming and
treating kafka as a durable store of data from the past (assuming that's what
you meant)... I agree there is a disconnect there. But it's a fundamental
problem with spark streaming in that it implicitly depends on now rather than
a time embedded in the data stream. I don't think we're fixing that with this
ticket.

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: SPARK-3655 GroupByKeyAndSortValues

2014-12-26 Thread koertkuipers

Github user koertkuipers commented on the pull request:

https://github.com/apache/spark/pull/3632#issuecomment-68150977
  
@markhamstra take a look now. 
i ignored the situation of K and V having same type, since i think it can 
be dealt with by using a simple wrapper (value) class for the Vs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3809#discussion_r22287446
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -329,8 +329,11 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
   try {
 dagScheduler = new DAGScheduler(this)
   } catch {
-case e: Exception = throw
-  new SparkException(DAGScheduler cannot be initialized due to 
%s.format(e.getMessage))
+case e: Exception = {
+  stop()
+  throw
--- End diff --

Style nit: you can use string interpolation instead of String.format, which 
will allow the `new SparkException` to fit on the same line as `throw`:

```scala
throw new SparkException(DAGScheduler cannot be initialized due to 
${e.getMessage})
```

However, I'd prefer to call the two-argument constructor which takes the 
cause as second argument, since this will lead to more informative stacktraces:

```scala
throw new SparkException(Error while constructing DAGScheduler, e)
```





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3809#issuecomment-68155076
  
Jenkins, this is ok to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3809#issuecomment-68155166
  
  [Test build #24838 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24838/consoleFull)
 for   PR 3809 at commit 
[`2172578`](https://github.com/apache/spark/commit/217257879fe7c98673caf14b980790498887581e).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3809#issuecomment-68155357
  
This is a nice fix.

Resource leaks when SparkContext's constructor throws exceptions have been 
a longstanding issue.  I first ran across the issue while adding logic to 
detect whether a SparkContext was already running when attempting to create a 
new one ([SPARK-4180](https://issues.apache.org/jira/browse/SPARK-4180)).  In 
that case, I ran into some issues because I wanted to effectively make the 
entire constructor synchronized on a static object, but this was hard because 
there wasn't an explicit constructor method.  We could have tried to wrap the 
entire implicit constructor in a try-finally block, but this would require us 
to re-organize a huge amount of code and change many `vals` into `vars`.  I had 
an alternative proposal to move the dependency-creation into the SparkContext 
companion object and pass in a SparkContextDependencies object into 
SparkContext's constructors, which would solve this issue more generally (but 
it's a much larger change).  See the PR description at #3121 for more deta
 ils.

Barring a big restructuring of SparkContext's constructor, though, small 
fixes like this are welcome.  Therefore, this looks good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3809#issuecomment-68155681
  
By the way, I left a [comment over on 
JIRA](https://issues.apache.org/jira/browse/SPARK-4787?focusedCommentId=14259202page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14259202)
 about the scope of the SPARK-4787 JIRA.  If we merge this PR as-is, without 
adding more try-catches for other statements that could throw exceptions, then 
I think we should revise that JIRA to describe only the fix implemented here 
(error-catching for DAGScheduler errors) and should convert it into a subtask 
of SPARK-4180.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: SPARK-4971: fix typo in the comment

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3807#issuecomment-68155797
  
LGTM, so I'll merge this.

In the future, I wouldn't bother to file JIRA issues for super-small 
one-word documentation fixes like this, since the JIRA issue is effectively a 
duplicate of the PR itself.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: SPARK-4971: fix typo in the comment

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3807


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3805#issuecomment-68156011
  
Jenkins, this is ok to test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3805#issuecomment-68156052
  
Super-minor process nit, but do you mind moving your comment into the PR 
description itself?  The PR description automatically becomes the commit 
message, so keeping it up-to-date means less work for committers when they 
merge your PRs since they don't have to fix up the message by hand.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3805#issuecomment-68156186
  
  [Test build #24839 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24839/consoleFull)
 for   PR 3805 at commit 
[`41ede0e`](https://github.com/apache/spark/commit/41ede0ee67f77e09f2abe96c981167ed671e0504).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3805#issuecomment-68156238
  
This class of issue could be a more general problem for our test-suites, 
since I think there are a number of places where we call things like `new 
SparkConf()` that might implicitly read defaults from the configuration file.

I wonder if there's a more general fix, such as using `Utils.isTesting` to 
bypass the defaults loading in unit tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3805#issuecomment-68156344
  
Also, the PR / JIRA title is confusing; I can't really guess what this 
patch does based on the title, since fix an implicit bug could mean many 
different things.  A better title would be something like Do not read 
spark.executor.memory from spark-defaults.conf in SparkSubmitSuite.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3803#discussion_r22287877
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala ---
@@ -233,6 +236,47 @@ class InputStreamsSuite extends TestSuiteBase with 
BeforeAndAfter {
 }
   }
 
+  def testBinaryRecordsStream() {
+var ssc: StreamingContext = null
+val testDir: File = null
+try {
+  val testDir = Utils.createTempDir()
+
+  Thread.sleep(1000)
+  // Set up the streaming context and input streams
+  val newConf = conf.clone.set(
+spark.streaming.clock, 
org.apache.spark.streaming.util.SystemClock)
--- End diff --

It looks like this is based on the FileInputStream test, which is known to 
be flaky.  I have a PR open which rewrites that test to not depend on 
SystemClock / Thread.sleep(): #3801.  Therefore, if we want to have this style 
of test, then this PR should block until my PR is merged so that it can use the 
new test utilities that I added.

Here's the relevant change from my PR: 
https://github.com/apache/spark/pull/3801/files#diff-4


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3803#discussion_r22287887
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala ---
@@ -233,6 +236,47 @@ class InputStreamsSuite extends TestSuiteBase with 
BeforeAndAfter {
 }
   }
 
+  def testBinaryRecordsStream() {
--- End diff --

Also, since this is only called from one place, I'd just inline this code 
in the `test(binary records stream)` function rather than defining a whole 
new function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3803#discussion_r22287941
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -373,6 +393,25 @@ class StreamingContext private[streaming] (
   }
 
   /**
+   * Create an input stream that monitors a Hadoop-compatible filesystem
+   * for new files and reads them as flat binary files, assuming a fixed 
length per record,
+   * generating one byte array per record. Files must be written to the 
monitored directory
+   * by moving them from another location within the same file system. 
File names
+   * starting with . are ignored.
+   * @param directory HDFS directory to monitor for new file
+   * @param recordLength length of each record in bytes
+   */
+  def binaryRecordsStream(
+  directory: String,
+  recordLength: Int): DStream[Array[Byte]] = {
+val conf = sc_.hadoopConfiguration
+conf.setInt(FixedLengthBinaryInputFormat.RECORD_LENGTH_PROPERTY, 
recordLength)
+val br = fileStream[LongWritable, BytesWritable, 
FixedLengthBinaryInputFormat](directory, conf)
+val data = br.map{ case (k, v) = v.getBytes}
--- End diff --

This is an subtly-incorrect usage of `getBytes`, since `getBytes` returns a 
padded byte array; you need to copy / slice out the subarray with the data 
using `v.getLength`.  see [HADOOP-6298: BytesWritable#getBytes is a bad name 
that leads to programming 
mistakes](https://issues.apache.org/jira/browse/HADOOP-6298) for more details.

We've hit this problem before in other parts of Spark:

- https://issues.apache.org/jira/browse/SPARK-3121
- https://issues.apache.org/jira/browse/SPARK-4901

Here's a PR which shows the correct usage: #2712


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3803#discussion_r22287961
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -373,6 +393,25 @@ class StreamingContext private[streaming] (
   }
 
   /**
+   * Create an input stream that monitors a Hadoop-compatible filesystem
+   * for new files and reads them as flat binary files, assuming a fixed 
length per record,
+   * generating one byte array per record. Files must be written to the 
monitored directory
+   * by moving them from another location within the same file system. 
File names
+   * starting with . are ignored.
+   * @param directory HDFS directory to monitor for new file
+   * @param recordLength length of each record in bytes
+   */
+  def binaryRecordsStream(
+  directory: String,
+  recordLength: Int): DStream[Array[Byte]] = {
+val conf = sc_.hadoopConfiguration
+conf.setInt(FixedLengthBinaryInputFormat.RECORD_LENGTH_PROPERTY, 
recordLength)
+val br = fileStream[LongWritable, BytesWritable, 
FixedLengthBinaryInputFormat](directory, conf)
+val data = br.map{ case (k, v) = v.getBytes}
--- End diff --

Actually, it looks like the same bug is present in the new 
`binaryRecords()` method in Spark core.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4969][STREAMING][PYTHON] Add binaryReco...

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3803#discussion_r22287997
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala ---
@@ -373,6 +393,25 @@ class StreamingContext private[streaming] (
   }
 
   /**
+   * Create an input stream that monitors a Hadoop-compatible filesystem
+   * for new files and reads them as flat binary files, assuming a fixed 
length per record,
+   * generating one byte array per record. Files must be written to the 
monitored directory
+   * by moving them from another location within the same file system. 
File names
+   * starting with . are ignored.
+   * @param directory HDFS directory to monitor for new file
+   * @param recordLength length of each record in bytes
+   */
+  def binaryRecordsStream(
+  directory: String,
+  recordLength: Int): DStream[Array[Byte]] = {
+val conf = sc_.hadoopConfiguration
+conf.setInt(FixedLengthBinaryInputFormat.RECORD_LENGTH_PROPERTY, 
recordLength)
+val br = fileStream[LongWritable, BytesWritable, 
FixedLengthBinaryInputFormat](directory, conf)
+val data = br.map{ case (k, v) = v.getBytes}
--- End diff --

Maybe it's not an issue since we're using FixedLengthBinaryInputFormat, but 
even if it isn't we should have a comment explaining why it's correct or a 
defensive check that `getBytes` returns an array of the expected length.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2759][CORE] Generic Binary File Support...

Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/1658#discussion_r22288031
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -510,6 +510,52 @@ class SparkContext(config: SparkConf) extends Logging {
   minPartitions).setName(path)
   }
 
+
+  /**
+   * Get an RDD for a Hadoop-readable dataset as PortableDataStream for 
each file
+   * (useful for binary data)
+   *
+   *
+   * @param minPartitions A suggestion value of the minimal splitting 
number for input data.
+   *
+   * @note Small files are preferred, large file is also allowable, but 
may cause bad performance.
+   */
+  @DeveloperApi
+  def binaryFiles(path: String, minPartitions: Int = defaultMinPartitions):
+  RDD[(String, PortableDataStream)] = {
+val job = new NewHadoopJob(hadoopConfiguration)
+NewFileInputFormat.addInputPath(job, new Path(path))
+val updateConf = job.getConfiguration
+new BinaryFileRDD(
+  this,
+  classOf[StreamInputFormat],
+  classOf[String],
+  classOf[PortableDataStream],
+  updateConf,
+  minPartitions).setName(path)
+  }
+
+  /**
+   * Load data from a flat binary file, assuming each record is a set of 
numbers
+   * with the specified numerical format (see ByteBuffer), and the number 
of
+   * bytes per record is constant (see FixedLengthBinaryInputFormat)
+   *
+   * @param path Directory to the input data files
+   * @param recordLength The length at which to split the records
+   * @return An RDD of data with values, RDD[(Array[Byte])]
+   */
+  def binaryRecords(path: String, recordLength: Int,
+conf: Configuration = hadoopConfiguration): 
RDD[Array[Byte]] = {
+conf.setInt(recordLength,recordLength)
+val br = newAPIHadoopFile[LongWritable, BytesWritable, 
FixedLengthBinaryInputFormat](path,
+  classOf[FixedLengthBinaryInputFormat],
+  classOf[LongWritable],
+  classOf[BytesWritable],
+  conf=conf)
+val data = br.map{ case (k, v) = v.getBytes}
--- End diff --

It turns out that `getBytes` returns a padded byte array, so I think  you 
may need to copy / slice out the subarray with the data using `v.getLength`; 
see [HADOOP-6298: BytesWritable#getBytes is a bad name that leads to 
programming mistakes](https://issues.apache.org/jira/browse/HADOOP-6298) for 
more details.

Using `getBytes` without `getLength` has caused bugs in Spark in the past: 
#2712.

Is the use of `getBytes` in this patch a bug?  Or is it somehow safe due to 
our use of FixedLengthBinaryInputFormat?  If it is somehow safe, we should have 
a comment which explains this so that readers who know about the `getBytes` 
issue aren't confused (or better yet, an `assert` that `getBytes` returns an 
array of the expected length).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3707#issuecomment-68158154
  
Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3809#issuecomment-68158247
  
  [Test build #24838 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24838/consoleFull)
 for   PR 3809 at commit 
[`2172578`](https://github.com/apache/spark/commit/217257879fe7c98673caf14b980790498887581e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: spark-core - [SPARK-4787] - Stop sparkcontext ...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3809#issuecomment-68158250
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24838/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3707#issuecomment-68158356
  
  [Test build #24840 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24840/consoleFull)
 for   PR 3707 at commit 
[`d2d41b6`](https://github.com/apache/spark/commit/d2d41b6f74aa8620e7937e6c039e11542a73698c).
 * This patch **does not merge cleanly**.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3707#issuecomment-68158618
  
One small correctness question (around quoting) but looks good to me. I can 
merge this later today and fix it manually if @brennonyork doesn't get around 
to it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

2014-12-26 Thread nchammas

Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/3707#issuecomment-68158971
  
@brennonyork Does this handle relative paths passed to Maven correctly (if 
that's a valid potential use case)? We had this problem with the `spark-ec2` 
script which was caused by the script [changing the working directory on the 
user](https://github.com/apache/spark/pull/2988).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3805#issuecomment-68159278
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24839/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3805#issuecomment-68159276
  
  [Test build #24839 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24839/consoleFull)
 for   PR 3805 at commit 
[`41ede0e`](https://github.com/apache/spark/commit/41ede0ee67f77e09f2abe96c981167ed671e0504).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [STREAMING] Add redis pub/sub streaming suppor...

2014-12-26 Thread nchammas

Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/2348#issuecomment-68159339
  
Clickable link for the lazy: [Spark Packages](http://spark-packages.org/)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/3707#issuecomment-68161145
  
This will handle relative directories just fine. The last portion of this 
script changes the directory back to the `cwd` where the user was calling from 
so this isn't an issue :)



-Original Message-
From: Nicholas Chammas 
[notificati...@github.commailto:notificati...@github.com]
Sent: Friday, December 26, 2014 04:30 PM Eastern Standard Time
To: apache/spark
Cc: York, Brennon
Subject: Re: [spark] [SPARK-4501][Core] - Create build/mvn to automatically 
download maven/zinc/scalac (#3707)


@brennonyorkhttps://github.com/brennonyork Does this handle relative 
paths passed to Maven correctly (if that's a valid potential use case)? We had 
this problem with the spark-ec2 script which was caused by the script changing 
the working directory on the userhttps://github.com/apache/spark/pull/2988.

â
Reply to this email directly or view it on 
GitHubhttps://github.com/apache/spark/pull/3707#issuecomment-68158971.


The information contained in this e-mail is confidential and/or proprietary 
to Capital One and/or its affiliates. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed.  If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

Github user brennonyork commented on a diff in the pull request:

https://github.com/apache/spark/pull/3707#discussion_r22289411
  
--- Diff: sbt/sbt ---
@@ -1,111 +1,9 @@
-#!/usr/bin/env bash
+#!/bin/bash
 
-# When creating new tests for Spark SQL Hive, the HADOOP_CLASSPATH must 
contain the hive jars so
-# that we can run Hive to generate the golden answer.  This is not 
required for normal development
-# or testing.
-for i in $HIVE_HOME/lib/*
-do HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$i
-done
-export HADOOP_CLASSPATH
+# Determine the current working directory
+_DIR=$( cd $( dirname ${BASH_SOURCE[0]} )  pwd )
 
-realpath () {
-(
-  TARGET_FILE=$1
+echo WARNING: The sbt/sbt script has been deprecated in place of 
build/sbt. 2
+echo  Please change all references to point to the new location. 
2
 
-  cd $(dirname $TARGET_FILE)
-  TARGET_FILE=$(basename $TARGET_FILE)
-
-  COUNT=0
-  while [ -L $TARGET_FILE -a $COUNT -lt 100 ]
-  do
-  TARGET_FILE=$(readlink $TARGET_FILE)
-  cd $(dirname $TARGET_FILE)
-  TARGET_FILE=$(basename $TARGET_FILE)
-  COUNT=$(($COUNT + 1))
-  done
-
-  echo $(pwd -P)/$TARGET_FILE
-)
-}
-
-. $(dirname $(realpath $0))/sbt-launch-lib.bash
-
-
-declare -r noshare_opts=-Dsbt.global.base=project/.sbtboot 
-Dsbt.boot.directory=project/.boot -Dsbt.ivy.home=project/.ivy
-declare -r sbt_opts_file=.sbtopts
-declare -r etc_sbt_opts_file=/etc/sbt/sbtopts
-
-usage() {
- cat EOM
-Usage: $script_name [options]
-
-  -h | -help print this message
-  -v | -verbose  this runner is chattier
-  -d | -debugset sbt log level to debug
-  -no-colors disable ANSI color codes
-  -sbt-createstart sbt even if current directory contains no sbt 
project
-  -sbt-dir   path  path to global settings/plugins directory (default: 
~/.sbt)
-  -sbt-boot  path  path to shared boot directory (default: ~/.sbt/boot 
in 0.11 series)
-  -ivy   path  path to local Ivy repository (default: ~/.ivy2)
-  -meminteger  set memory options (default: $sbt_mem, which is 
$(get_mem_opts $sbt_mem))
-  -no-share  use all local caches; no sharing
-  -no-global uses global caches, but does not use global ~/.sbt 
directory.
-  -jvm-debug port  Turn on JVM debugging, open at the given port.
-  -batch Disable interactive mode
-
-  # sbt version (default: from project/build.properties if present, else 
latest release)
-  -sbt-version  version   use the specified version of sbt
-  -sbt-jar  path  use the specified jar as the sbt launcher
-  -sbt-rc   use an RC version of sbt
-  -sbt-snapshot use a snapshot version of sbt
-
-  # java version (default: java from PATH, currently $(java -version 21 
| grep version))
-  -java-home path alternate JAVA_HOME
-
-  # jvm options and output control
-  JAVA_OPTS  environment variable, if unset uses $java_opts
-  SBT_OPTS   environment variable, if unset uses 
$default_sbt_opts
-  .sbtopts   if this file exists in the current directory, it is
- prepended to the runner args
-  /etc/sbt/sbtopts   if this file exists, it is prepended to the runner 
args
-  -Dkey=val  pass -Dkey=val directly to the java runtime
-  -J-X   pass option -X directly to the java runtime
- (-J is stripped)
-  -S-X   add -X to sbt's scalacOptions (-S is stripped)
-  -PmavenProfilesEnable a maven profile for the build.
-
-In the case of duplicated or conflicting options, the order above
-shows precedence: JAVA_OPTS lowest, command line options highest.
-EOM
-}
-
-process_my_args () {
-  while [[ $# -gt 0 ]]; do
-case $1 in
- -no-colors) addJava -Dsbt.log.noformat=true  shift ;;
-  -no-share) addJava $noshare_opts  shift ;;
- -no-global) addJava -Dsbt.global.base=$(pwd)/project/.sbtboot  
shift ;;
-  -sbt-boot) require_arg path $1 $2  addJava 
-Dsbt.boot.directory=$2  shift 2 ;;
-   -sbt-dir) require_arg path $1 $2  addJava 
-Dsbt.global.base=$2  shift 2 ;;
- -debug-inc) addJava -Dxsbt.inc.debug=true  shift ;;
- -batch) exec /dev/null  shift ;;
-
--sbt-create) sbt_create=true  shift ;;
-
-  *) addResidual $1  shift ;;
-esac
-  done
-
-  # Now, ensure sbt version is used.
-  [[ ${sbt_version}XXX != XXX ]]  addJava 
-Dsbt.version=$sbt_version
-}
-
-loadConfigFile() {
-  cat $1 | sed '/^\#/d'
-}
-
-# if sbtopts files exist, prepend their contents to $@ so it can be 
processed by this runner
-[[ -f

[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3707#issuecomment-68161775
  
  [Test build #24840 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24840/consoleFull)
 for   PR 3707 at commit 
[`d2d41b6`](https://github.com/apache/spark/commit/d2d41b6f74aa8620e7937e6c039e11542a73698c).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3707#issuecomment-68161776
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24840/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3916] [Streaming] discover new appended...

Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/2806#issuecomment-68163636
  
There has been significant refactoring done in the FileInputStream. Can you 
update the PR accordingly?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3916] [Streaming] discover new appended...

Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/2806#issuecomment-68163696
  
Also, I took a quick look at the PR. Its seems a little complicated to 
understand just by looking at the code, so could you write a short design doc 
(or update the PR description) on the high-level technique used to implement 
this. It does not have to be very detailed, just enough for any one understand 
the logic and then verify it in the code. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Spark 3754 spark streaming file system api cal...

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2703#discussion_r22290108
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala
 ---
@@ -250,19 +250,19 @@ class JavaStreamingContext(val ssc: StreamingContext) 
extends Closeable {
* Files must be written to the monitored directory by moving them 
from another
* location within the same file system. File names starting with . are 
ignored.
* @param directory HDFS directory to monitor for new file
-   * @tparam K Key type for reading HDFS file
-   * @tparam V Value type for reading HDFS file
-   * @tparam F Input format for reading HDFS file
+   * @param inputFormatClass Input format for reading HDFS file
+   * @param keyClass Key type for reading HDFS file
+   * @param valueClass Value type for reading HDFS file
*/
   def fileStream[K, V, F : NewInputFormat[K, V]](
-  directory: String): JavaPairInputDStream[K, V] = {
-implicit val cmk: ClassTag[K] =
-  implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[K]]
-implicit val cmv: ClassTag[V] =
-  implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[V]]
-implicit val cmf: ClassTag[F] =
-  implicitly[ClassTag[AnyRef]].asInstanceOf[ClassTag[F]]
-ssc.fileStream[K, V, F](directory)
+directory: String,
+inputFormatClass: Class[F],
+keyClass: Class[K],
+valueClass: Class[V], newFilesOnly: Boolean = true): 
JavaPairInputDStream[K, V] = {
--- End diff --

Correction on this comment. newFilesOnly should be exposed as it is exposed 
in the Scala api. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Spark 3754 spark streaming file system api cal...

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2703#discussion_r22290121
  
--- Diff: 
streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java ---
@@ -1703,6 +1710,65 @@ public void testTextFileStream() {
 JavaDStreamString test = ssc.textFileStream(/tmp/foo);
   }
 
+
+  @Test
+  public void testFileStream() throws Exception {
+// Disable manual clock as FileInputDStream does not work with manual 
clock
+System.setProperty(spark.streaming.clock, 
org.apache.spark.streaming.util.SystemClock);
+ssc = new JavaStreamingContext(local[2], test, new Duration(1000));
+ssc.checkpoint(checkpoint);
+// Set up some sequence files for streaming to read in
+ListTuple2Long, Integer test_input = new ArrayListTuple2Long, 
Integer ();
+test_input.add(new Tuple2(1L, 123456));
+test_input.add(new Tuple2(2L, 123456));
+JavaPairRDDLong, Integer rdd = ssc.sc().parallelizePairs(test_input);
+File tempDir = Files.createTempDir();
+JavaPairRDDLongWritable, IntWritable saveable = rdd.mapToPair(
+  new PairFunctionTuple2Long, Integer, LongWritable, IntWritable() 
{
+public Tuple2LongWritable, IntWritable call(Tuple2Long, 
Integer record) {
+  return new Tuple2(new LongWritable(record._1), new 
IntWritable(record._2));
+}});
+saveable.saveAsNewAPIHadoopFile(tempDir.getAbsolutePath()+/1/,
+LongWritable.class, IntWritable.class,
+SequenceFileOutputFormat.class);
+saveable.saveAsNewAPIHadoopFile(tempDir.getAbsolutePath()+/2/,
+LongWritable.class, IntWritable.class,
+SequenceFileOutputFormat.class);
+
+// Construct a file stream from the above saved data
+JavaPairDStreamLongWritable, IntWritable testRaw = ssc.fileStream(
+  tempDir.getAbsolutePath() + / , SequenceFileInputFormat.class, 
LongWritable.class,
+  IntWritable.class, false);
+JavaPairDStreamLong, Integer test = testRaw.mapToPair(
+  new PairFunctionTuple2LongWritable, IntWritable, Long, Integer() 
{
+public Tuple2Long, Integer call(Tuple2LongWritable, 
IntWritable input) {
+  return new Tuple2(input._1().get(), input._2().get());
+}
+  });
+final AccumulatorInteger elem = ssc.sc().intAccumulator(0);
--- End diff --

Why is it not possible to just call rdd.count() and add up the counts in a 
global counter?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Spark 3754 spark streaming file system api cal...

Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/2703#discussion_r22290126
  
--- Diff: 
streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java ---
@@ -1703,6 +1710,65 @@ public void testTextFileStream() {
 JavaDStreamString test = ssc.textFileStream(/tmp/foo);
   }
 
+
+  @Test
+  public void testFileStream() throws Exception {
+// Disable manual clock as FileInputDStream does not work with manual 
clock
+System.setProperty(spark.streaming.clock, 
org.apache.spark.streaming.util.SystemClock);
+ssc = new JavaStreamingContext(local[2], test, new Duration(1000));
+ssc.checkpoint(checkpoint);
+// Set up some sequence files for streaming to read in
+ListTuple2Long, Integer test_input = new ArrayListTuple2Long, 
Integer ();
+test_input.add(new Tuple2(1L, 123456));
+test_input.add(new Tuple2(2L, 123456));
+JavaPairRDDLong, Integer rdd = ssc.sc().parallelizePairs(test_input);
+File tempDir = Files.createTempDir();
+JavaPairRDDLongWritable, IntWritable saveable = rdd.mapToPair(
+  new PairFunctionTuple2Long, Integer, LongWritable, IntWritable() 
{
+public Tuple2LongWritable, IntWritable call(Tuple2Long, 
Integer record) {
+  return new Tuple2(new LongWritable(record._1), new 
IntWritable(record._2));
+}});
+saveable.saveAsNewAPIHadoopFile(tempDir.getAbsolutePath()+/1/,
+LongWritable.class, IntWritable.class,
+SequenceFileOutputFormat.class);
+saveable.saveAsNewAPIHadoopFile(tempDir.getAbsolutePath()+/2/,
+LongWritable.class, IntWritable.class,
+SequenceFileOutputFormat.class);
+
+// Construct a file stream from the above saved data
+JavaPairDStreamLongWritable, IntWritable testRaw = ssc.fileStream(
+  tempDir.getAbsolutePath() + / , SequenceFileInputFormat.class, 
LongWritable.class,
+  IntWritable.class, false);
+JavaPairDStreamLong, Integer test = testRaw.mapToPair(
+  new PairFunctionTuple2LongWritable, IntWritable, Long, Integer() 
{
+public Tuple2Long, Integer call(Tuple2LongWritable, 
IntWritable input) {
+  return new Tuple2(input._1().get(), input._2().get());
+}
+  });
+final AccumulatorInteger elem = ssc.sc().intAccumulator(0);
+final AccumulatorInteger total = ssc.sc().intAccumulator(0);
+final AccumulatorInteger calls = ssc.sc().intAccumulator(0);
+test.foreachRDD(new FunctionJavaPairRDDLong, Integer, Void() {
+public Void call(JavaPairRDDLong, Integer rdd) {
+  rdd.foreach(new VoidFunctionTuple2Long, Integer() {
+  public void call(Tuple2Long, Integer e) {
+if (e._1() == 1l) {
+  elem.add(1);
+}
+total.add(1);
+  }
+});
+  calls.add(1);
+  return null;
+}
+  });
+ssc.start();
+Thread.sleep(5000);
--- End diff --

Could you make this something like a 
[`eventually`](http://doc.scalatest.org/1.8/org/scalatest/concurrent/Eventually.html)
 block in ScalaTest 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4974]: Prevent Circular dependency.

2014-12-26 Thread matt2000

GitHub user matt2000 opened a pull request:

https://github.com/apache/spark/pull/3813

[SPARK-4974]: Prevent Circular dependency.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/matt2000/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/3813.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3813


commit 9e58c96e0904aa214ac5669172b475cfedb65159
Author: Matt Chapman m...@ninjitsuweb.com
Date:   2014-12-27T00:50:25Z

[SPARK-4974]: Prevent Circular dependency.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4974]: Prevent Circular dependency.

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3813#issuecomment-68164558
  
Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/3707#issuecomment-68164728
  
@nchammas I spoke too soon earlier regarding it correctly handling relative 
paths. I fixed it and is now `pwd`-preserving. @pwendell I also fixed the 
improper quoting issue in `sbt/sbt`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4616][Core] - SPARK_CONF_DIR is not eff...

Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/3559#issuecomment-68165117
  
@andrewor14 @JoshRosen wondering what should be done with this issue, 
thoughts on my comments above??


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4974]: Prevent Circular dependency.

2014-12-26 Thread matt2000

Github user matt2000 commented on the pull request:

https://github.com/apache/spark/pull/3813#issuecomment-68165298
  
This is not the right fix. Still working on it...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: SPARK-4567. Make SparkJobInfo and SparkStageIn...

2014-12-26 Thread tigerquoll

Github user tigerquoll commented on the pull request:

https://github.com/apache/spark/pull/3426#issuecomment-68165724
  
Heh @JoshRosen @sryza  , should this patch include a serialVersionUID 
attribute on the classes to be serialized to make sure compiler quirks don't 
cause different UIDs to be generated for the classes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: SPARK-4567. Make SparkJobInfo and SparkStageIn...

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3426#issuecomment-68166190
  
Aren't default serialVersionUIDs generated in a consistent way across all 
JVMs because the algorithm for generating them is part of the Java spec?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: SPARK-4567. Make SparkJobInfo and SparkStageIn...

Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3426#issuecomment-68166194
  
Err, across all compilers?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4694]Fix HiveThriftServer2 cann't stop ...

2014-12-26 Thread SaintBacchus

Github user SaintBacchus commented on the pull request:

https://github.com/apache/spark/pull/3576#issuecomment-68166365
  
Hi, @marmbrus @vanzin this problem also had influence to branch-1.2. Can we 
need to fix it in branch-1.2?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: SPARK-4567. Make SparkJobInfo and SparkStageIn...

2014-12-26 Thread tigerquoll

Github user tigerquoll commented on the pull request:

https://github.com/apache/spark/pull/3426#issuecomment-68166415
  

http://stackoverflow.com/questions/285793/what-is-a-serialversionuid-and-why-should-i-use-it
 seems to be a good summary of the pros and cons of this approach


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-26 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68169507
  
Actually I'd highly suggest you breaking this PR into at least two self 
contained PRs, which can be much easier to review and merge. Rule sets 1 and 4 
can be merged into one PR, rule sets 2 and 3 into another. Maybe we can remove 
rules 2 and 3 from this PR after your refactoring and get rule sets 1 and 4 
merged first (I realized #3784 doesn't cover all rules in set 4, because the 
second rule in set 4 doesn't help optimizing cartesian products).

The reason why I'm hesitant to include rule sets 2 and 3 is that, for now I 
don't see a sound yet concise implementation without introducing extra 
dependencies. Although I proposed the Spark `Interval` solution, I'd rather not 
introduce Spire. On the other hand, rule sets 1 and 4 have been proven to be 
both useful and easy to implement.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...

2014-12-26 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/3784#issuecomment-68169549
  
Hey @scwf, I've posted my reply in #3778, so lets discuss these rules there 
to prevent distraction.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...

2014-12-26 Thread liancheng

Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/3778#issuecomment-68169633
  
Would like to add that the solution based on Spire `Interval` I posted 
above may suffer from floating point precision issue. Thus we might want to 
cast all integral comparisons to `Interval[Long]` and all fractional 
comparisons to `Interval[Double]` to fix it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-68169801
  
  [Test build #24841 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24841/consoleFull)
 for   PR 1290 at commit 
[`9fb76ba`](https://github.com/apache/spark/commit/9fb76badb0222fbfec6886152477bef76dc2eef8).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3707#issuecomment-68170254
  
Jenkins, test this please. LGTM pending tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4501][Core] - Create build/mvn to autom...

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3707#issuecomment-68170272
  
  [Test build #24842 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24842/consoleFull)
 for   PR 3707 at commit 
[`0e5a0e4`](https://github.com/apache/spark/commit/0e5a0e4345c6d1fe466ac574c961e690de2e9744).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4925][SQL] Publish Spark SQL hive-thrif...

Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/3766#discussion_r22291358
  
--- Diff: pom.xml ---
@@ -97,6 +97,7 @@
 modulesql/catalyst/module
 modulesql/core/module
 modulesql/hive/module
+modulesql/hive-thriftserver/module
--- End diff --

This should be removed - we only want this enabled with the 
`-Phive-thriftserver` profile. We always enable that profile when publishing 
artifacts.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4925][SQL] Publish Spark SQL hive-thrif...

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3766#issuecomment-68170290
  
Okay - makes sense. There is one incorrect change in here, but once that's 
removed we can merge this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4598] use pagination to show tasktable

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3456#issuecomment-68170532
  
Let's close this issue. This breaks global pagination which means it can't 
be merged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3148] Update global variables of HttpBr...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2059


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-4598] use pagination to show tasktable

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3456


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: add some shuffle configurations in doc

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2031


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: SPARK-2803: add Kafka stream feature in accord...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1602


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [STREAMING] Add redis pub/sub streaming suppor...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2348


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: SPARK-4817[STREAMING]Print the specified numbe...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3662


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [https://issues.apache.org/jira/browse/SPARK-4...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2633


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: Added support for accessing secured HDFS

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/265


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3787][BUILD] Assembly jar name is wrong...

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3046#issuecomment-68170894
  
Looks good - I'm going to merge this with a slight modification (adding a 
comment to explain whats going on).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3787][BUILD] Assembly jar name is wrong...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3046


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3955] Different versions between jackso...

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3716#issuecomment-68171007
  
Yeah this looks good - thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-3955] Different versions between jackso...

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3716


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-68171181
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24841/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...