Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/21320
Backported to 2.3.2, in case somebody needs it:
https://github.com/Gauravshah/spark/tree/branch-2.3_SPARK-4502 Thanks @mallman
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/21320
@mallman is there any way I can help pull in the rest of the changes from
your original PR (https://github.com/apache/spark/pull/16578) for the next
release?
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/21320
Or @ajacques can open a PR to Mallman's branch and he can merge it. That
makes it less work for him.
---
-
To unsubscribe
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
We have backported it to 2.2. In production it has cut our run time by at
least 2x on average.
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
@marmbrus can we target it for 2.4? We need help with reviews. It has been
in a waiting state for a very long time.
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
@mallman do you foresee any issues? I am planning to backport it to Spark 2.2
on a personal fork and will probably make a JitPack release.
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
@marmbrus can we start the review process, so that it can make it into the
next release?
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
@DaimonPl branch 2.3 is already cut, so at the very least it is not making it
into 2.3 :(
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
Thanks @mallman for rebasing each time. @gatorsmile can you take a look at
it?
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
just in case someone wants to try:
```
resolvers += "jitpack" at "https://jitpack.io"
libraryDependencies += "com.github.VideoAmp" % &quo
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
@mallman I am not sure how I can help push this forward.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/17006
PR https://github.com/apache/spark/pull/16578 should solve this.
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
Time in seconds
| Test | w/ Patch | w/o Patch |
| - | - | - |
| count
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
I cannot do it. @marmbrus can you help with this pull request?
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
Thanks @mallman, this helps a lot with performance. For 100 million rows in a
partition, we are able to go from 4.5 minutes to 35 seconds with the patch. I
will share more results by the end of the day.
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
@mallman the code looks good. I can test things out with my dataset: I have
deeply nested data (20-30 levels of nesting) and millions of rows in a
partition to test performance. I will do it on Monday.
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/14957
@saulshanabrook it looks like #16578 is a superset, so I am trying to invest
in that pull request.
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16578
Can I do something to help this pull request?
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16842
thanks @srowen & @brkyvz
Github user Gauravshah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16842#discussion_r104282408
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
---
@@ -193,9 +201,10 @@ class
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16842
@brkyvz Thank you for taking this forward. We have a batch interval of 2
minutes, and each batch takes ~1.1 minutes to process. With the older code it
takes 10-12 minutes to recover, and with the limit fix it reco
Github user Gauravshah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16842#discussion_r103502545
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
---
@@ -212,7 +214,7 @@ class
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16842
@srowen I assumed that you cannot update the code if you want to recover from
a checkpoint.
Github user Gauravshah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16842#discussion_r102366702
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
---
@@ -204,10 +208,11 @@ class
Github user Gauravshah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16842#discussion_r102366680
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
---
@@ -36,7 +36,8 @@ import
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16842
@brkyvz can I do something to take it forward?
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16842
Jenkins, retest this please
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16842
I will work on test cases today.
Github user Gauravshah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16842#discussion_r99929314
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
---
@@ -36,7 +36,8 @@ import
Github user Gauravshah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16842#discussion_r99928115
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
---
@@ -36,7 +36,8 @@ import
Github user Gauravshah commented on a diff in the pull request:
https://github.com/apache/spark/pull/16842#discussion_r99927422
--- Diff:
external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisBackedBlockRDD.scala
---
@@ -36,7 +36,8 @@ import
GitHub user Gauravshah opened a pull request:
https://github.com/apache/spark/pull/16842
SPARK-19304 fix kinesis slow checkpoint recovery
## What changes were proposed in this pull request?
Added a limit to the getRecords API call in KinesisBackedBlockRDD. This
helps reduce
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16213
We are facing the same issue randomly in prod. Can I help in some way to push
it forward?
Github user Gauravshah commented on the issue:
https://github.com/apache/spark/pull/16339
This should help us save 20 minutes of directory scanning on each iteration.