Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19082
Basically, cutting means deciding the boundaries of the `blocking loop`.
@kiszk and @rednaxelafx can explain what I said above better. This is
related to how JVM works and how whole-stage
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19082
> The regression caused by spark.sql.codegen.hugeMethodLimit shows the
potential regression caused by horizontal cuts, although
spark.sql.codegen.hugeMethodLimit does nothing.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19082
Btw, I'd like to know what you meant by horizontal/vertical cuts. Can you
give a simple example?
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19082
The regression caused by `spark.sql.codegen.hugeMethodLimit` shows the
potential regression caused by horizontal cuts, although
`spark.sql.codegen.hugeMethodLimit` does nothing.
---
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/18664
I'm sorry for the delay.
I agree with @HyukjinKwon's suggestion to keep the behavior of current
`toPandas` without Arrow for now.
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19082
I don't think `spark.sql.codegen.hugeMethodLimit` is on the same level
as #18931 or this PR.
`hugeMethodLimit` didn't do anything to affect how the generated code is
split.
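For illustration only (not taken from the PR or from Spark's sources): a minimal Python sketch of the distinction drawn above, assuming the limit acts purely as a size threshold. The function name, the sample sizes, and the limit value are hypothetical, and what the threshold actually gates is not modeled here. The point is only that a threshold check inspects method sizes without changing how code is split.

```python
from typing import List


def exceeds_huge_method_limit(method_sizes: List[int], limit: int) -> bool:
    """Return True if any generated method's bytecode size is above `limit`.

    Hypothetical helper: it only reads the sizes and never splits or
    rewrites the methods themselves.
    """
    return any(size > limit for size in method_sizes)


# Hypothetical bytecode sizes (in bytes) of three generated methods.
sizes = [1200, 9000, 300]
too_big = exceeds_huge_method_limit(sizes, limit=8000)

# The list of methods is unchanged by the check: no cut has happened.
unchanged = sizes == [1200, 9000, 300]
```

A cut, by contrast, would have to produce a different set of methods, which is what #18931 and this PR are about.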
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19082
The current `spark.sql.codegen.hugeMethodLimit` shows an extreme case.
Just imagine we have two nodes and we want to do a horizontal/ring cut.
Basically, in this scenario, horizontal/ring
Github user heary-cao commented on the issue:
https://github.com/apache/spark/pull/19251
Leave a comment
---
Github user heary-cao closed the pull request at:
https://github.com/apache/spark/pull/19251
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19082
@gatorsmile hmm, I don't know how you reached that conclusion. Is
`spark.sql.codegen.hugeMethodLimit` related to the codegen cut at all? I think it is
just a threshold used to determine whether to enable
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19444#discussion_r143386555
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
---
@@ -405,6 +405,11 @@ object CatalogTypes {
*
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19082
The latest regression (introduced by `spark.sql.codegen.hugeMethodLimit`)
clearly shows the ring/onion/horizontal cut
(https://github.com/apache/spark/pull/18931) could introduce a performance
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19449#discussion_r143385878
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
---
@@ -929,7 +929,7 @@ class
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19251#discussion_r143385665
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala
---
@@ -32,12
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/18270
@cenyuhai Could you also address this comment:
https://github.com/apache/spark/pull/18270/files#r136121931?
---
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19077#discussion_r143380706
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
---
@@ -116,9 +116,10 @@ private [sql] object
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19287
**[Test build #82546 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82546/testReport)**
for PR 19287 at commit
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19287
Jenkins, retest this please.
---
Github user guoxiaolongzte closed the pull request at:
https://github.com/apache/spark/pull/19360
---
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/19447
I feel it is a bit annoying to add a parameter for each Constant Pool
issue, and we had better look for solutions so that fewer parameters (e.g., other
metrics as @kiszk suggested) can almost solve the
Github user discipleforteen commented on the issue:
https://github.com/apache/spark/pull/19218
LGTM
---
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/19364
cc: @gatorsmile @cloud-fan
---
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19419#discussion_r143377794
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -79,6 +79,9 @@ private[spark] object JettyUtils extends Logging {
val
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19419#discussion_r143377976
--- Diff: core/src/main/scala/org/apache/spark/ui/JettyUtils.scala ---
@@ -89,6 +92,9 @@ private[spark] object JettyUtils extends Logging {
Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19419#discussion_r143377740
--- Diff: conf/spark-defaults.conf.template ---
@@ -25,3 +25,10 @@
# spark.serializer
org.apache.spark.serializer.KryoSerializer
Github user liutang123 commented on the issue:
https://github.com/apache/spark/pull/19364
@maropu Any other suggestions? Can this PR be merged?
---
Github user heary-cao commented on a diff in the pull request:
https://github.com/apache/spark/pull/19251#discussion_r143377744
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/SizeInBytesOnlyStatsPlanVisitor.scala
---
@@ -32,12
Github user guoxiaolongzte commented on the issue:
https://github.com/apache/spark/pull/19360
@HyukjinKwon The problem of the PR you follow, I do not care, I will close
this PR.
---
Github user cenyuhai commented on the issue:
https://github.com/apache/spark/pull/18270
@gatorsmile
---
Github user rustagi commented on the issue:
https://github.com/apache/spark/pull/11205
Sorry, I haven't been able to confirm this patch because I have not seen the
issue in production for quite some time.
It was much more persistent with 2.0 than with 2.1.
Not sure of the cause.
---
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/11205
I guess the issue still exists. Let me verify it again; if it still
exists, I will bring the PR up to date. Thanks!
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19082
@maropu Thanks. Then it looks like there isn't any significant regression brought
by this or #18931. We need to be careful, but these numbers give more confidence.
---
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/19082
ok, done (welcome any re-run requests);
```
OpenJDK 64-Bit Server VM 1.8.0_141-b16 on Linux 4.9.38-16.35.amzn1.x86_64
Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
TPCDS Snappy:
```
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/19082
just a sec, I'll re-run `q94` (sometimes, numbers fluctuate).
---
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19082
Thanks @maropu.
After counting the bytecode size accurately, there seems to be a bottleneck in
the generated code in aggregation, so this can improve q66 a lot.
Overall, the numbers look great,
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19061
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19061
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82545/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19061
**[Test build #82545 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82545/testReport)**
for PR 19061 at commit
Github user rik-coenders closed the pull request at:
https://github.com/apache/spark/pull/18817
---
Github user rik-coenders commented on the issue:
https://github.com/apache/spark/pull/18817
Unfortunately I do not have time to work on this issue at the moment, so I
will close this PR for now.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18460
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18460
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82544/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18460
**[Test build #82544 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82544/testReport)**
for PR 18460 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19061
**[Test build #82545 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82545/testReport)**
for PR 19061 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19061
Retest this please.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/19061
Hi, @vanzin and @jerryshao .
Could you review this again when you have a chance? Thank you!
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18460
**[Test build #82544 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82544/testReport)**
for PR 18460 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18460
Retest this please.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/18460
When you have a chance, could you review this please, @gatorsmile ?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19456
Can one of the admins verify this patch?
---
GitHub user blyncsy-david-lewis opened a pull request:
https://github.com/apache/spark/pull/19456
[SPARK] [Scheduler] Configurable default scheduling mode
Pulling default values for scheduling mode from spark conf.
You can merge this pull request into a Git repository by running:
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19443
I could consider going ahead if the small fix makes all the things in
`functions.py` consistent, but I guess it does not. I think I am less sure
because, IIUC, we are not even clear on what to do
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18270
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18270
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82543/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18270
**[Test build #82543 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82543/testReport)**
for PR 18270 at commit
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/18747
@cloud-fan could you please review this one first among my PRs?
---
Github user deeppark commented on the issue:
https://github.com/apache/spark/pull/19455
Hi All,
Apologies, I did it by mistake. I'll try to close it.
Regards,
Deepak
On 8 Oct 2017 4:23 pm, "UCB AMPLab" wrote:
> Can
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/19082#discussion_r143359416
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala
---
@@ -797,26 +904,44 @@ case class HashAggregateExec(
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/19082
I checked the three patterns on `q66`;
```
q66
master 15960
master + pr18931 14226
master + pr19082 +
```
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19380
I'd close this for now. Optionally, we can raise this case and discuss it on the
mailing list if it is important.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18270
**[Test build #82543 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82543/testReport)**
for PR 18270 at commit
Github user jsnowacki commented on the issue:
https://github.com/apache/spark/pull/19443
This PR fixes only the functions created using `_create_function`, which, from
what I found, were the only ones affected by the issue. The rest of the functions
either have different assumptions or
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19370#discussion_r143354349
--- Diff: bin/find-spark-home.cmd ---
@@ -0,0 +1,44 @@
+@echo off
+
+rem
+rem Licensed to the Apache Software Foundation (ASF) under one
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/19370#discussion_r143354306
--- Diff: bin/find-spark-home.cmd ---
@@ -0,0 +1,44 @@
+@echo off
+
+rem
+rem Licensed to the Apache Software Foundation (ASF) under one
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19454
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19454
**[Test build #82542 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82542/testReport)**
for PR 19454 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19454
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82542/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19454
**[Test build #82542 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82542/testReport)**
for PR 19454 at commit
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19369
---
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19369
Merged to master
---
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/19454
This is missing from Python and Java. It also doesn't bother to implement
this more efficiently than flatMap(identity). I am not sure this is worthwhile?
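To make the `flatMap(identity)` point concrete, here is a minimal, Spark-free Python sketch on plain lists. The helpers `flat_map`, `identity`, and `flatten` are hypothetical names used only for illustration and are not the API proposed in the PR; the sketch just shows that flattening one level of nesting is the same operation as flat-mapping with the identity function.

```python
from itertools import chain
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def flat_map(f: Callable[[T], Iterable[U]], xs: Iterable[T]) -> List[U]:
    """Apply `f` to every element and concatenate the resulting iterables."""
    return list(chain.from_iterable(f(x) for x in xs))


def identity(x: T) -> T:
    """Return the argument unchanged."""
    return x


def flatten(xss: Iterable[Iterable[T]]) -> List[T]:
    """Flatten one level of nesting; defined purely as flat_map(identity)."""
    return flat_map(identity, xss)


nested = [[1, 2], [3], []]
flat = flatten(nested)
```

Since `flatten` here adds nothing beyond the existing `flat_map`, a dedicated method would mainly be a naming convenience unless it came with a more efficient implementation.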
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19454
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82541/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19454
**[Test build #82541 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82541/testReport)**
for PR 19454 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19454
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19454
**[Test build #82541 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82541/testReport)**
for PR 19454 at commit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19454
ok to test
---
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19455
@deeppark could you please close this if this is a PR that you did not
intend?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19455
Can one of the admins verify this patch?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18270
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82540/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/18270
Merged build finished. Test PASSed.
---
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19449#discussion_r143351534
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
---
@@ -929,7 +929,7 @@ class CodegenContext {
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18270
**[Test build #82540 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82540/testReport)**
for PR 18270 at commit
GitHub user deeppark opened a pull request:
https://github.com/apache/spark/pull/19455
Branch 2.0
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/19454#discussion_r143351442
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2543,6 +2543,11 @@ class Dataset[T] private[sql](
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/19454
Could you please add test cases?
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19438
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82539/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19438
Merged build finished. Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19438
**[Test build #82539 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82539/testReport)**
for PR 19438 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19454
Can one of the admins verify this patch?
---
GitHub user sohum2002 opened a pull request:
https://github.com/apache/spark/pull/19454
Added flatten functions for RDD and Dataset
## What changes were proposed in this pull request?
This PR creates a _flatten_ function in two places: RDD and Dataset
classes. This PR resolves
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19389
ping?
---
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/19419#discussion_r143349235
--- Diff: conf/spark-defaults.conf.template ---
@@ -25,3 +25,10 @@
# spark.serializer
org.apache.spark.serializer.KryoSerializer
#
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/18270
**[Test build #82540 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82540/testReport)**
for PR 18270 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19438
**[Test build #82539 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82539/testReport)**
for PR 19438 at commit
Github user wzhfy commented on the issue:
https://github.com/apache/spark/pull/19438
retest this please
---
Github user wzhfy commented on a diff in the pull request:
https://github.com/apache/spark/pull/19438#discussion_r143348208
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2738,7 +2738,7 @@ test_that("sampleBy() on a DataFrame", {
})
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19438
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19438
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82538/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19438
**[Test build #82538 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82538/testReport)**
for PR 19438 at commit
Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/19438#discussion_r143347310
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -2738,7 +2738,7 @@ test_that("sampleBy() on a DataFrame", {
})