Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/820
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/828#issuecomment-43581291
@tdas CheckpointRDD is not properly cleaned.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/828#issuecomment-43581755
@mateiz Why must the checkpoint data be written to the file system?
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/828#issuecomment-43583620
@mateiz It is not necessary to write it to the file system. After all, no other
RDD reads it. I think the checkpoint data should be put into the blockManager, so
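To illustrate the idea in the (truncated) comment above, here is a minimal sketch at the user-API level, assuming an active SparkContext `sc` as in the Spark shell: keep a replicated, serialized copy of the data in the block manager rather than writing a checkpoint directory. This is only an approximation of the proposal; unlike `rdd.checkpoint()`, persisting does not truncate the lineage.
```scala
import org.apache.spark.storage.StorageLevel

// Sketch only: approximate "checkpoint data in the blockManager" by persisting a
// serialized, 2x-replicated copy instead of writing to HDFS. The input path and
// transformation are placeholders.
val cleaned = sc.textFile("hdfs:///input").map(_.trim)
cleaned.persist(StorageLevel.MEMORY_AND_DISK_SER_2) // replicated blocks in the block manager
cleaned.count()                                      // materialize the blocks
```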
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/828#issuecomment-43589674
[The
code](https://github.com/witgo/spark/commit/6d7f2408a40bf4bb2889bf66fa61bced782cdefc#diff-2b593e0b4bd6eddab37f04968baa826c)
will make the checkpoint directory larger
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/811
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/828#issuecomment-43608181
@mateiz @mengxr
I added a new RDD operation, `cachePoint`.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/828#issuecomment-43656940
Another [solution](https://github.com/witgo/spark/compare/cachePoint).
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/828#issuecomment-43790944
@mateiz, @mengxr
I am using [the code](https://github.com/witgo/spark/compare/cachePoint) to
test ALS.
A brief description of the test:
| Item
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/828#issuecomment-43840745
@tdas
You're right. The code breaks the fault-tolerance properties of RDDs.
The perfect solution is automatic cleanup and rebuilding of shuffle data.
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/855
Automatically clean up checkpoint data
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark cleanup_checkpoint_date
Alternatively you can
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/855#issuecomment-43969051
@tdas
Optional? Default is off?
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/855#issuecomment-43971489
@mridulm @tdas
The code has been updated.
Now, automatic cleanup of checkpoint data is optional
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/828#issuecomment-44122991
I am using [the
code](https://github.com/witgo/spark/compare/cleanup_checkpoint_date_als) to
test ALS.
A brief description of the test:
| Item | Description
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/884
Fix scalastyle warnings in yarn alpha
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark scalastyle
Alternatively you can review and
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/889#issuecomment-44235434
spark-hive => commons-codec 1.4
spark-sql => commons-codec 1.5
```
[INFO]
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/894
[SPARK-1930] Containers running beyond memory limits were killed
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark SPARK-1930
Alternatively you
Github user witgo commented on a diff in the pull request:
https://github.com/apache/spark/pull/894#discussion_r13131123
--- Diff:
yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
---
@@ -90,6 +90,12 @@ private[yarn] class YarnAllocationHandler
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/894#issuecomment-44424182
I agree with @sryza. It is better for Spark to handle these automatically. Of
course, we can allow users to manually specify a special value.
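For context, a minimal sketch of the manual knob being discussed, assuming the YARN-era setting `spark.yarn.executor.memoryOverhead` (megabytes reserved on top of the executor heap when sizing the container); the PR itself is about Spark choosing a safer value automatically. Values are illustrative.
```scala
import org.apache.spark.SparkConf

// Sketch: manually reserving extra container memory beyond the JVM heap so YARN
// does not kill the executor for exceeding its memory limit.
val conf = new SparkConf()
  .set("spark.executor.memory", "4g")
  .set("spark.yarn.executor.memoryOverhead", "768") // MB for off-heap/native use
```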
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/907#issuecomment-44487759
@colorant
This is a big change. Can you explain the reason for it?
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/786#issuecomment-44499013
@pwendell
Do you have time to review this PR?
Github user witgo commented on a diff in the pull request:
https://github.com/apache/spark/pull/655#discussion_r13227671
--- Diff:
yarn/stable/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala
---
@@ -105,278 +96,222 @@ private[yarn] class
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/921
In some cases, yarn does not automatically restart the container
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark allocateExecutors
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/921#issuecomment-44719589
@sryza
When the return value of `yarnAllocator.getNumExecutorsFailed` is greater than zero,
`yarnAllocator.getNumExecutorsRunning < args.numExecutors` is true fore
Github user witgo commented on a diff in the pull request:
https://github.com/apache/spark/pull/894#discussion_r13259895
--- Diff:
yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ExecutorLauncher.scala
---
@@ -92,21 +92,22 @@ class ExecutorLauncher(args
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/929
Improve ALS algorithm resource usage
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark improve_als
Alternatively you can review and
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/828
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/828#issuecomment-44742037
This solution is not perfect; temporarily closing this. See the new #929.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/894#issuecomment-4410
@mridulm
Is the following code in line with your thoughts?
https://github.com/witgo/spark/compare/SPARK-1930_different
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/940
Update breeze to version 0.8.1
`breeze 0.8.1` depends on `scala-logging-slf4j 2.1.1`. The relevant code is in #332.
You can merge this pull request into a Git repository by running:
$ git pull
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/940#issuecomment-44911857
@markhamstra, `breeze 0.7` does not support `scala 2.11`.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/5528#issuecomment-96947031
@srowen This PR seems to have a bug in yarn-client:
```
HTTP ERROR 405
Problem accessing /proxy/application_1429108701044_0316/stages/stage/kill
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/5528#issuecomment-97126406
The Hadoop version of my test cluster is `2.3.0-cdh5.0.1`. I'm not sure; I'll
test what you said tomorrow.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/5528#issuecomment-97298740
`curl -d "id=3&terminate=true" "http://host:4040/stages/stage/kill/"` does
not work. Is there a better way?
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/5528#issuecomment-97619733
No, I get a blank response, but
`curl -d "id=3&terminate=true"
"http://host:9082/proxy/application_1429108701044_0377/stages/stage/kill/"` gets
a 405
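For reference, the same kill can be attempted from the driver instead of through the HTTP endpoint; a minimal sketch, assuming an active SparkContext `sc` and a known stage id (the id below is illustrative).
```scala
// Sketch: cancel a running stage programmatically rather than POSTing to
// /stages/stage/kill; the stage id comes from the UI or a SparkListener.
sc.cancelStage(3)
```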
Github user witgo commented on a diff in the pull request:
https://github.com/apache/spark/pull/1482#discussion_r29440041
--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -280,13 +280,18 @@ private[spark] class Executor(
m
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/5837#issuecomment-98146620
The kill link works in yarn-client. LGTM.
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/14662
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/14751
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/14664
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/14647
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/14658
Github user witgo commented on the issue:
https://github.com/apache/spark/pull/14311
@rxin @ericl
This PR may cause the following code to throw an exception
```scala
private def getRemoteValues(blockId: BlockId): Option[BlockResult] = {
getRemoteBytes
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/14977
[Test Only][Not ready for review][SPARK-6235][CORE] Address various 2G limits
## What changes were proposed in this pull request?
### Design
Setup for eliminating the various 2G
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/14977
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/14995
[Test Only][Not ready for review][SPARK-6235][CORE] Address various 2G limits
## What changes were proposed in this pull request?
### Motivation
The various 2G limits in Spark
Github user witgo commented on the issue:
https://github.com/apache/spark/pull/14995
retest please.
Github user witgo commented on the issue:
https://github.com/apache/spark/pull/14995
Jenkins, retest this please
Github user witgo commented on the issue:
https://github.com/apache/spark/pull/14995
Jenkins, retest this please
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/1482
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/11041#issuecomment-184034549
@srowen 0.8.0 is the latest.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/11041#issuecomment-184175637
@srowen I've run some simple Spark SQL cases, and it doesn't seem to have
any issues.
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/8520
[SPARK-10350] [Minor] [Doc] Fix SQL Programming Guide
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark SPARK-10350
Alternatively you
Github user witgo commented on a diff in the pull request:
https://github.com/apache/spark/pull/8467#discussion_r38283165
--- Diff: docs/sql-programming-guide.md ---
@@ -1371,6 +1380,26 @@ Configuration of Parquet can be done using the
`setConf` method on `SQLContext
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1482#issuecomment-136971893
I think it is necessary to merge the PR into master.
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/1208
GitHub user witgo reopened a pull request:
https://github.com/apache/spark/pull/1208
SPARK-1470: Use the scala-logging wrapper instead of the slf4j API directly
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark SPARK
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/1208
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/1369
Use the scala-logging wrapper instead of the slf4j API directly
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark SPARK-1470_new
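To show what the switch buys at a call site, a minimal sketch assuming the scala-logging wrapper (the import shown is the newer scala-logging package; the `scala-logging-slf4j 2.1.x` artifact referenced elsewhere in this thread nests it under `com.typesafe.scalalogging.slf4j`). The class and message are hypothetical.
```scala
import com.typesafe.scalalogging.LazyLogging

// Hypothetical class, for illustration only.
class BlockFetcher extends LazyLogging {
  def fetch(id: String): Unit = {
    // logger.debug is a macro: the message is only built if DEBUG is enabled,
    // so no explicit `if (log.isDebugEnabled)` guard is required.
    logger.debug(s"Fetching block $id")
  }
}
```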
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1369#issuecomment-48696680
Jenkins, test this please.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1369#issuecomment-48708268
#332 can't be tested automatically.
#1208 got messed up and I do not know how to fix it. :sweat:
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/332#issuecomment-48708675
It can't be tested automatically. I submitted a new PR, #1369.
Github user witgo commented on a diff in the pull request:
https://github.com/apache/spark/pull/1256#discussion_r14850885
--- Diff:
core/src/main/scala/org/apache/spark/deploy/master/MasterArguments.scala ---
@@ -38,19 +39,24 @@ private[spark] class MasterArguments(args:
Array
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/1387
[WIP] When an executor throws an OutOfMemoryError exception, the driver runs
garbage collection
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-48841019
Currently, `SparkContext.cleaner` does not take executor memory usage into account.
This will cause Spark to fail when memory runs short.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-48841151
@srowen
[Executor.scala#L253](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L253)
handles exceptions. But the
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1393#issuecomment-48843123
How much does this increase memory usage overall? Do you have a detailed comparison?
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/1403
[WIP][SQL] Do not run Hive compatibility tests by default
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark hive_compatibility
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/1404
Remove "NOTE: SPARK_YARN is deprecated, please use -Pyarn flag"
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1404#issuecomment-48981721
Jenkins, test this please.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-48985219
I agree with your point.
But when an out-of-memory exception is thrown, the error Spark gives is:
```
org.apache.spark.SparkException: Job aborted due to stage
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-48985713
```
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p"
# Executing /bin/sh -c "kill 44942"...
14/
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1112#issuecomment-49045150
@tgravescs The code has been submitted. Because I don't have a Hadoop 0.23.x
cluster, the code has not been rigorously tested.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1404#issuecomment-49048633
Jenkins, test this please.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-49061792
`SparkContext.cleaner` will clean up unreferenced RDD, shuffle, and broadcast data.
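A minimal sketch of how that cleanup is triggered in practice: once the last strong reference to a persisted RDD is dropped, the cleaner's weak-reference machinery eventually removes its blocks. The master URL, app name, and sizes are illustrative.
```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CleanerSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("cleaner-sketch"))
    var rdd = sc.parallelize(1 to 1000).persist(StorageLevel.MEMORY_ONLY)
    rdd.count()   // materialize the cached blocks
    rdd = null    // drop the only strong reference
    System.gc()   // a hint only; ContextCleaner reacts once the weak reference is enqueued
    sc.stop()
  }
}
```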
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-49064468
Explicitly clearing things means keeping references to all the objects; for Java
programmers, that is very unfriendly.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-49067472
Yes, `System.gc()` is just a hint and may not actually free resources. But RDD
has no close method and can only be cleared by `ContextCleaner`.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-49069644
This involves a bug: https://issues.apache.org/jira/browse/SPARK-2491.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-49072476
Yes, this solution is not perfect. I have been thinking about this problem.
BTW, the `runGC` method runs GC and makes sure it has actually run.
Reference:
https
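A minimal sketch of a `runGC`-style helper of the kind referred to above (Spark's test utilities contain a similar pattern); the timeout, sleep interval, and error message are assumptions, not the PR's exact code.
```scala
import java.lang.ref.WeakReference

// Run System.gc() and verify a collection actually happened by waiting for a
// freshly allocated object's weak reference to be cleared.
def runGC(timeoutMillis: Long = 10000L): Unit = {
  val weakRef = new WeakReference(new Object)
  val deadline = System.currentTimeMillis() + timeoutMillis
  System.gc()
  while (weakRef.get != null) {
    if (System.currentTimeMillis() > deadline) {
      throw new IllegalStateException("GC did not observably run within the timeout")
    }
    System.gc()
    Thread.sleep(200)
  }
}
```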
Github user witgo commented on a diff in the pull request:
https://github.com/apache/spark/pull/1387#discussion_r14953652
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskEventListener.scala ---
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-49074742
I'm sorry, my English is poor. The problem now is that we do not have a reliable
way to ensure the RDD is cleared. Shall we close this first?
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-49075444
The main problem with the `runGC` method is that it may run for a long time and
still not work.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-49076040
In my tests, the `runGC` method works normally on JDK 7u45.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1387#issuecomment-49077845
OK, tomorrow or the day after I will try it the way you said. I have only tested
the default GC configuration and will test the others.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1330#issuecomment-49117596
As a result of #772, master has fixed this problem. But we should remove the
`-language:postfixOps` line in
[pom.xml#L807](https://github.com/apache/spark/blob
Github user witgo closed the pull request at:
https://github.com/apache/spark/pull/1022
GitHub user witgo reopened a pull request:
https://github.com/apache/spark/pull/1022
SPARK-1719: spark.*.extraLibraryPath isn't applied on yarn
Fix: spark.executor.extraLibraryPath isn't applied on yarn
You can merge this pull request into a Git repository by running:
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/1482
SPARK-2491: Fix the executor not stopping properly when an OOM is thrown.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark SPARK-2491
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1409#issuecomment-49513420
@aarondav @pwendell
In my tests, it seems there is still a deadlock.
A possible cause is here: [Executor.scala#L189](https://github.com
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1501#issuecomment-49546564
cc @tgravescs
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/1501
[YARN] In some cases, pages display incorrectly in the WebUI
The issue is caused by #1112 .
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1501#issuecomment-49549682
Jenkins, test this please.
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/1511
Fix NPE for JsonProtocol
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark JsonProtocol
Alternatively you can review and apply these
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1180#issuecomment-49828369
@tgravescs Done.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1546#issuecomment-49852813
[HiveFromSpark](https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/hive/HiveFromSpark.scala#L22)
class depends on the
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1180#issuecomment-49874821
If we do not add this, then when Spark fails on YARN, the SparkContext's process
will hang.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1180#issuecomment-49881861
A small error; fixing it right away.
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1180#issuecomment-49886617
Done
Github user witgo commented on the pull request:
https://github.com/apache/spark/pull/1511#issuecomment-49959000
@mateiz Done.
GitHub user witgo opened a pull request:
https://github.com/apache/spark/pull/1565
Build should not run hive tests by default.
cc @pwendell @ScrapCodes
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/witgo/spark SPARK-2484